Open lucadiliello opened 3 years ago
In the paper you state that BERT-large is trained with a batch size of `2048` for 464K steps, but the `compute_flops.py` script uses the same training arguments for BERT-large as for BERT-base. Is this a mistake?
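To make the size of the gap concrete, here is a rough sketch. It assumes that, for a fixed model and sequence length, training FLOPs scale linearly with batch size times number of steps. The helper `sequences_seen` and the second set of arguments are hypothetical placeholders, not values copied from the script.

```python
def sequences_seen(batch_size: int, steps: int) -> int:
    """Total training sequences processed: batch size times number of steps."""
    return batch_size * steps


# Schedule stated in the paper for BERT-large.
paper_large = sequences_seen(batch_size=2048, steps=464_000)

# Placeholder arguments standing in for whatever the script actually passes;
# substitute the real values from compute_flops.py before drawing conclusions.
script_args = sequences_seen(batch_size=256, steps=1_000_000)

print(paper_large)                # 950272000
print(paper_large / script_args)  # ratio by which the FLOPs estimate would change
```

If the two schedules differ, the reported BERT-large training FLOPs would be off by roughly this ratio under the linear-scaling assumption.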