google-research / electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Typo in BERT Large flops computation #115

Open lucadiliello opened 3 years ago

lucadiliello commented 3 years ago

In the paper you state that BERT-Large is trained with a batch size of 2048 for 464K steps, but in the compute_flops.py script you use the same train args as BERT-Base. Is this a mistake?
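
For context, here is a minimal sketch of why the train args matter: total pre-training FLOPs scale linearly with batch size and step count, so evaluating BERT-Large under BERT-Base's schedule changes the reported total. The function `total_train_flops` and the 256 × 1M BERT-Base schedule (the commonly cited figures from the original BERT paper) are illustrative assumptions here, not the actual API or constants of compute_flops.py.

```python
def total_train_flops(flops_per_example: float, batch_size: int, train_steps: int) -> float:
    """Total training FLOPs = per-example (forward + backward) FLOPs
    times the number of examples processed over all steps."""
    return flops_per_example * batch_size * train_steps

# Placeholder per-example cost; the real script derives this from the model
# config. The ratio below is independent of its actual value.
per_example = 1.0

# Same per-example cost evaluated under two sets of train args
# (256 x 1M assumed for BERT-Base; 2048 x 464K as reported for BERT-Large).
base_args = total_train_flops(per_example, batch_size=256, train_steps=1_000_000)
large_args = total_train_flops(per_example, batch_size=2048, train_steps=464_000)

print(large_args / base_args)  # ~3.7x more examples seen under the Large args
```

Under these assumed numbers, reusing the BERT-Base args would understate BERT-Large's training compute by roughly 3.7x, which is why the choice of train args in the script matters.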