kevinpl07 closed this issue 2 years ago.
Hi @kevinpl07,
The model was trained for 12,200 steps on top of T5 LM, which itself was trained for 100,000 steps with an LM loss on top of T5's 1,000,000 pre-training steps (1,000,000 + 100,000 + 12,200 = 1,112,200, the step count in the example script).
How long those roughly 12,000 steps take depends heavily on how large a TPU setup is used: it could take less than a day with a large number of TPUs, or a few days with a small number.
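A minimal Python sketch that checks the cumulative step arithmetic from the reply against the script's value and converts an assumed throughput into wall-clock time. The helper names (`cumulative_steps`, `wall_clock_hours`) are hypothetical and not part of any training script, and the throughputs in the usage loop are placeholders, not measured values.

```python
# Step counts as stated in the reply (training phases stacked on one another).
T5_PRETRAIN_STEPS = 1_000_000  # original T5 pre-training
LM_ADAPT_STEPS = 100_000       # T5 LM: extra training with an LM loss
FINETUNE_STEPS = 12_200        # training on top of T5 LM


def cumulative_steps() -> int:
    """Global step counter value once the final training phase finishes."""
    return T5_PRETRAIN_STEPS + LM_ADAPT_STEPS + FINETUNE_STEPS


def wall_clock_hours(steps: int, steps_per_second: float) -> float:
    """Estimated hours to run `steps` at an assumed throughput (hypothetical input)."""
    if steps_per_second <= 0:
        raise ValueError("steps_per_second must be positive")
    return steps / steps_per_second / 3600.0


if __name__ == "__main__":
    print(cumulative_steps())  # 1112200
    # Placeholder throughputs, NOT measurements; real values depend on the TPU setup.
    for sps in (0.5, 0.1):
        print(sps, round(wall_clock_hours(FINETUNE_STEPS, sps), 1))
```

Only the steps taken on top of T5 LM count toward the new training run's duration, which is why `wall_clock_hours` is called with `FINETUNE_STEPS` rather than the cumulative total.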
Hello,
In the example training script, the number of training steps is given as 1112200.
Is this the number that was actually used in training? And could you give an estimate of the total training duration, or of the number of steps processed per second?
Thanks in advance!