Closed DayuanJiang closed 4 years ago
For ELECTRA Small, the number of trained steps appears to be 4e6.
When I tested with num_train_steps <= 4e6, the model was not trained any further (because the released checkpoint had already been trained for that many steps). Training only resumed once I set num_train_steps >= 4000001.
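The behavior above follows from how estimator-style training loops resume from a checkpoint: they only run the difference between the target step count and the checkpoint's restored global_step. A minimal sketch (not the actual ELECTRA code, and assuming the Small checkpoint's global_step really is 4e6 as observed above):

```python
def steps_to_run(checkpoint_global_step: int, num_train_steps: int) -> int:
    """Additional training steps an estimator-style loop will perform
    after restoring a checkpoint at `checkpoint_global_step`."""
    return max(0, num_train_steps - checkpoint_global_step)

# With the restored global_step at 4,000,000, any num_train_steps <= 4e6
# yields zero further steps, so the run exits immediately:
print(steps_to_run(4_000_000, 1_000_000))  # 0 -> appears "not trained"
print(steps_to_run(4_000_000, 4_000_001))  # 1 -> training resumes
```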
Hello, I am trying to further pretrain the base and large models using a domain-specific corpus. But the documentation says that when continuing pre-training from the released small ELECTRA checkpoints, we should:
But Table 6 of the paper shows that the small ELECTRA model is trained for 1M steps. Which value should we set?
If 4e6 is correct, how many steps have the base and large models been trained for?