google-research / electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

`num_train_steps` for further pretraining #44

Closed DayuanJiang closed 4 years ago

DayuanJiang commented 4 years ago

Hello, I am trying to further pre-train the base and large models using a domain-specific corpus. The documentation says that when continuing pre-training from the released small ELECTRA checkpoints, we should:

> Setting `num_train_steps` by (for example) adding `"num_train_steps": 4010000` to the `--hparams`. This will continue training the small model for 10000 more steps (it has already been trained for 4e6 steps).
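For concreteness, here is a minimal sketch of what that override could look like. The `model_size` key and the `run_pretraining.py` flag names are assumptions based on the repository README, not something stated in this issue:

```python
# Sketch: build an hparams JSON for continuing pre-training of ELECTRA-Small.
# Assumes --hparams accepts either an inline JSON string or a path to a .json
# file, as described in the repository README.
import json

hparams = {
    "model_size": "small",       # assumed key; matches the released checkpoint
    "num_train_steps": 4010000,  # 4e6 already-trained steps + 10,000 new steps
}

with open("continue_pretraining.json", "w") as f:
    json.dump(hparams, f, indent=2)

# Hypothetical invocation (flag names taken from the README):
#   python3 run_pretraining.py --data-dir $DATA_DIR \
#       --model-name electra_small --hparams continue_pretraining.json
```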

But Table 6 of the paper shows that the small ELECTRA model is trained for 1M steps. Which value should we use?

If 4e6 is correct, for how many steps have the base and large models been trained?

pnhuy commented 4 years ago

For ELECTRA-Small, the number of pre-training steps appears to be 4e6.

When I tested with `num_train_steps <= 4e6`, the model was not trained any further (it had already been trained for that many steps); training only resumed with `num_train_steps >= 4000001`.
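That behavior can be checked directly, and the same check answers the question about the base and large checkpoints: read the step counter stored in the released checkpoint. A minimal sketch, assuming a TF1-style checkpoint with the counter saved under the conventional `global_step` name (the path below is hypothetical):

```python
# Sketch: read the training-step counter from a released ELECTRA checkpoint.
import tensorflow.compat.v1 as tf

ckpt_dir = "electra_small"  # hypothetical path to an unpacked released checkpoint
ckpt_path = tf.train.latest_checkpoint(ckpt_dir)

reader = tf.train.load_checkpoint(ckpt_path)
print("global_step:", reader.get_tensor("global_step"))

# If this prints 4000000, further training only happens when num_train_steps
# is set above that value (e.g. 4010000), matching the behavior described above.
```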