Closed ekurtulus closed 1 year ago
This depends, of course, on the variant that is running. For c5-o3, a run on our setup with the A4000 takes on average ~245,000 (micro-batch) steps.
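For rough benchmarking arithmetic, ~245,000 micro-batch steps in a 24-hour budget works out to roughly 2.8 steps per second; a quick back-of-the-envelope check (the exact rate on any given machine will differ):

```python
# Rough throughput estimate from the numbers above:
# ~245,000 micro-batch steps in a 24-hour wall-clock budget.
steps_taken = 245_000
budget_seconds = 24 * 60 * 60  # 86,400 s

steps_per_second = steps_taken / budget_seconds
print(f"~{steps_per_second:.2f} micro-batch steps/s")  # roughly 2.8 steps/s
```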
Because all systems are built a bit differently, I don't think this number is very meaningful by itself. For benchmarking, my suggestion would always be to rerun the baselines on your system with your target budget, and to compare any changes you make against that baseline running on the same system.
Let me know if other questions come up!
Thank you very much for your answer! I have another question: the tokenizer used for the trained BERT models is bert-x-cased, where x is either base or large, right?
Hi, do you mean the baseline, pretrained BERT models?
The baseline comparison (e.g. in Table 3, row 1) is to bert-base-uncased. If I remember correctly, this was a tiny bit better than bert-base-cased in this evaluation.
Closing this for now.
I am asking for benchmarking purposes. In the config files, it is stated that training lasts 600_000 micro-batch steps and is terminated after 1 day if that count is not reached. How many training steps are actually taken on an RTX A4000 in a day?
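The dual budget described here (stop at a step count or at a wall-clock limit, whichever comes first) can be sketched as a simple loop. This is a minimal illustration only; `run_training` and `step_fn` are hypothetical names, not the repo's actual API:

```python
import time

MAX_STEPS = 600_000          # micro-batch step budget from the config
MAX_SECONDS = 24 * 60 * 60   # 1-day wall-clock budget

def run_training(step_fn, max_steps=MAX_STEPS, max_seconds=MAX_SECONDS):
    """Run step_fn until either the step budget or the time budget is exhausted.

    Returns the number of micro-batch steps actually taken, which on a
    slower GPU can be well below max_steps when the time budget hits first.
    """
    start = time.time()
    steps = 0
    while steps < max_steps and (time.time() - start) < max_seconds:
        step_fn()  # one micro-batch training step (hypothetical)
        steps += 1
    return steps
```

With this structure, the answer to "how many steps in a day" is simply whatever the loop reaches before the time budget expires, which is why it varies by hardware.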