reproducibility versus replicability

bigscience-workshop / t-zero

Reproduce results and replicate training for T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)

Apache License 2.0

457 stars, 53 forks

Great to see this implemented. Definitions of reproducibility and replication differ across domains, so it would probably be helpful to clarify them in a few places. Happy to add these. Under what is probably the most widely accepted definition today, reproducibility means that independent researchers use the same datasets, techniques, scripts, and framework to obtain the same results. Replication in this setting is a bit tricky, though: would the only difference be the framework used, TensorFlow versus PyTorch? Or are there other underlying differences between the two frameworks that may also affect how the model is trained or the results obtained, depending on whether it is the training or the results that are being replicated?

Thanks for raising that point, @valdanchev!

> Replication in this setting is a bit tricky, though: would the only difference be the framework used, TensorFlow versus PyTorch? Or are there other underlying differences between the two frameworks that may also affect how the model is trained or the results obtained, depending on whether it is the training or the results that are being replicated?

The main differences for the replication of the training will be:

framework: TensorFlow vs. PyTorch
optimizer: Adafactor vs. Adam
data processing: example packing vs. no example packing

(In the evaluation folder I used "reproduce", but as you noted, "replicate" would be the better term.)
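
To make the data-processing difference listed above concrete, here is a minimal sketch of what "example packing" versus "no example packing" could look like on already-tokenized inputs. It is an illustration only, not the code used in this repository or in the original training: the function names `pack_examples` and `pad_examples`, the `eos_id` and `pad_id` values, and the greedy packing strategy are all assumptions.

```python
from typing import List


def pack_examples(examples: List[List[int]], max_len: int, eos_id: int = 1) -> List[List[int]]:
    """Greedily concatenate tokenized examples into sequences of at most max_len tokens.

    Hypothetical helper: eos_id is an assumed end-of-sequence token id.
    """
    packed: List[List[int]] = []
    current: List[int] = []
    for tokens in examples:
        # Truncate so that the example plus its EOS marker fits in one sequence.
        segment = tokens[: max_len - 1] + [eos_id]
        # Start a new sequence if this example would overflow the current one.
        if current and len(current) + len(segment) > max_len:
            packed.append(current)
            current = []
        current = current + segment
    if current:
        packed.append(current)
    return packed


def pad_examples(examples: List[List[int]], max_len: int, eos_id: int = 1, pad_id: int = 0) -> List[List[int]]:
    """No packing: one example per sequence, right-padded to max_len.

    Hypothetical helper: pad_id is an assumed padding token id.
    """
    rows: List[List[int]] = []
    for tokens in examples:
        segment = tokens[: max_len - 1] + [eos_id]
        rows.append(segment + [pad_id] * (max_len - len(segment)))
    return rows


if __name__ == "__main__":
    toy = [[5, 6, 7], [8, 9], [10, 11, 12, 13], [14]]
    print(pack_examples(toy, max_len=8))  # fewer, denser sequences
    print(pad_examples(toy, max_len=8))   # one padded sequence per example
```

With packing, the same four toy examples fit into two sequences instead of four, so the number of examples seen per batch and the amount of padding both change. Real packing implementations typically also track segment boundaries so that examples in the same sequence do not attend to each other; that part is omitted here.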

reproducibility versus replicability #7