How to choose the optimal number of training steps for a custom training dataset

bertin-project / bertin-t5x

BERTIN Project T5X training files

Apache License 2.0


@versae I have a question about how many training steps to train the model for, given a custom training dataset.

Currently I am training T5_1_1 on Hindi, and I have a dataset of 20 GB with 60M+ samples. But when training for 500k steps with a batch_size of 64, the trainer says it is training for 250 epochs.

(I am not sure how the trainer estimates the number of epochs. With 60M+ samples and a batch_size of 64, how could it reach 250 epochs in 500k steps?)
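To make the mismatch concrete, here is a rough sketch of the naive arithmetic. It assumes that each step consumes exactly batch_size examples and that each of the 60M samples counts as one example; the trainer may count examples differently, so this is only a sanity check, not a description of how the trainer actually computes epochs. The function names are made up for illustration.

```python
def epochs_seen(num_steps: int, batch_size: int, num_examples: int) -> float:
    """Naive epoch count: total examples consumed divided by dataset size."""
    return num_steps * batch_size / num_examples


def implied_dataset_size(num_steps: int, batch_size: int, reported_epochs: float) -> float:
    """Dataset size that would make the naive formula match a reported epoch count."""
    return num_steps * batch_size / reported_epochs


# Numbers from this issue (60M is a lower bound, the dataset has 60M+ samples).
steps = 500_000
batch_size = 64
samples = 60_000_000

# Under the naive assumption this is well below one epoch (about 0.53).
print(epochs_seen(steps, batch_size, samples))

# For the trainer's 250 epochs to hold, the naive formula would need
# a dataset of only 128,000 examples.
print(implied_dataset_size(steps, batch_size, 250))
```

So by this estimate 500k steps should cover roughly half an epoch, which is why the reported 250 epochs looks off to me.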

Could you please tell me how many epochs are ideal and recommended for training the model? (I have seen people in the t5x repo report bad downstream task performance after the model was trained for too long.)

How to choose the optimal number of training steps for a custom training dataset #2