State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
[BERT/TF2] Global batch size not matching with the description #1378
Hi, first of all, this is great work and very thorough documentation.
I just would like to ask a simple question. The BERT/TF2 documentation says the global batch size is set to 61k (I assume it's rounded) for phase 1 and 30k for phase 2 training. However, if my understanding is correct,

global_batch_size = batch_size * num_gpu * num_accumulation_steps

Using the documented default parameters, this gives 60 * 64 * 8 = 30720 for phase 1 and 10 * 192 * 8 = 15360 for phase 2, which is exactly half of the stated global batch size in each case. Did I miss something here, or is there really a mistake?
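For concreteness, here is a minimal sketch of the arithmetic I am using (the helper function is just illustrative, not something from the repo's scripts):

```python
def global_batch_size(per_gpu_batch, num_gpus, num_accumulation_steps):
    """Effective global batch size with gradient accumulation:
    each optimizer step consumes per_gpu_batch * num_gpus samples,
    accumulated over num_accumulation_steps forward/backward passes."""
    return per_gpu_batch * num_gpus * num_accumulation_steps

# Phase 1 defaults from the docs: 60 * 64 * 8
print(global_batch_size(60, 64, 8))   # 30720, half of the documented ~61k
# Phase 2 defaults from the docs: 10 * 192 * 8
print(global_batch_size(10, 192, 8))  # 15360, half of the documented ~30k
```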
Thanks in advance.