bigscience-workshop / t-zero

Reproduce results and replicate training of T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)
Apache License 2.0

Training duration #26

Closed kevinpl07 closed 2 years ago

kevinpl07 commented 2 years ago

Hello,

in the example training script, the number of training steps is given as 1112200.

Is this number what has been used in training? And is it possible to give an estimate about the complete training duration or the processed steps per second?

Thanks in advance!

VictorSanh commented 2 years ago

hi @kevinpl07

The model was trained for 12'200 steps on top of T5 LM (which was trained for 100'000 steps with an LM loss on top of T5's 1'000'000 pre-training steps).

How long these 12'200 steps take depends very much on how beefy a TPU setup we used. It could take less than a day (for a large number of TPUs), or a few days (for a small number of TPUs).
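A rough back-of-the-envelope estimate can translate a steps-per-second figure into a duration. The throughput values below are hypothetical placeholders for illustration, not measured numbers from the actual T0 run:

```python
# Back-of-the-envelope estimate of training wall-clock time from a
# steps-per-second throughput figure. The throughputs are assumed
# placeholders, not measured values from the T0 training.

def training_days(total_steps: int, steps_per_second: float) -> float:
    """Convert a step count and throughput into days of training."""
    seconds = total_steps / steps_per_second
    return seconds / (60 * 60 * 24)

# 12'200 fine-tuning steps at two assumed throughputs:
fast = training_days(12_200, steps_per_second=0.5)   # large TPU pod (assumed)
slow = training_days(12_200, steps_per_second=0.05)  # small TPU slice (assumed)
print(f"fast: {fast:.2f} days, slow: {slow:.2f} days")
```

Under these assumed throughputs the estimate lands in the same ballpark as the "less than a day" to "a few days" range described above.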