Question about expected results

google-research / electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Apache License 2.0

2.31k stars 351 forks source link

Hi @clarkkev ,

How long did you train ELECTRA-Small OWT In the expected result section of READEME.md, you have mentioned "OWT is the OpenWebText-trained model from above (it performs a bit worse than ELECTRA-Small due to being trained for less time and on a smaller dataset)". How may steps have you trained ? And AFAIK openwebtext should be larger than wikibook, is that mean you use only part of the data ?
How come the scores in expected results You have also mentioned "The below scores show median performance over a large number of random seeds.", is that mean the scores listed in that section is the scores of models pretrained from scractch with random seeds and each model was finetuned for 10 runs with random seeds, or is one pretrained model and finetuned for 10 runs with many random seeds ?
Did you use double_unordered in training models for expected results ?

google-research / electra