What is the batch size for the three datasets? Appendix H shows that the best batch size for all three datasets is 128. Training the model on UT-Zappos with a batch size of 128 almost fills two RTX 3090s. For MIT-States, I can train the model with a batch size of 64 on two RTX 3090s (about 17 GB of memory used per card). For C-GQA, I have to set the batch size to 32.
We achieve the effective batch size of 128 by using --gradient_accumulation_steps in train.py. For example, you can use --train_batch_size=64 --gradient_accumulation_steps=2 to get an effective batch size of train_batch_size * gradient_accumulation_steps, i.e., 128.
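In case it helps, here is a minimal sketch of how gradient accumulation produces that effective batch size. The model, optimizer, and dummy data are placeholders, not the actual training loop from train.py:

```python
# Minimal sketch of gradient accumulation (placeholder model/data, not the
# actual training loop in train.py).
import torch
import torch.nn.functional as F

train_batch_size = 64            # --train_batch_size
gradient_accumulation_steps = 2  # --gradient_accumulation_steps
# effective batch size = 64 * 2 = 128

model = torch.nn.Linear(512, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Dummy batches standing in for the real data loader.
loader = [(torch.randn(train_batch_size, 512),
           torch.randint(0, 10, (train_batch_size,)))
          for _ in range(4)]

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = F.cross_entropy(model(x), y)
    # Scale so the accumulated gradient averages over the effective batch.
    (loss / gradient_accumulation_steps).backward()
    if (step + 1) % gradient_accumulation_steps == 0:
        optimizer.step()       # one parameter update per 128 effective samples
        optimizer.zero_grad()
```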
Hope this helps!
Thanks a lot! So, the --gradient_accumulation_steps used when training the model on C-GQA is bigger than for the other datasets?
Yes! We used more gradient accumulation steps for C-GQA.
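For illustration, the combinations below would all reach the effective batch size of 128. The per-dataset numbers are only an assumption based on this thread (the reply above confirms only that C-GQA uses more accumulation steps), not values confirmed by the authors:

```python
# Hypothetical flag combinations (assumed for illustration, not confirmed
# values) that all yield an effective batch size of 128.
settings = {
    "mit-states": {"train_batch_size": 64, "gradient_accumulation_steps": 2},
    "ut-zappos":  {"train_batch_size": 64, "gradient_accumulation_steps": 2},
    "cgqa":       {"train_batch_size": 32, "gradient_accumulation_steps": 4},
}
for dataset, cfg in settings.items():
    effective = cfg["train_batch_size"] * cfg["gradient_accumulation_steps"]
    print(f"{dataset}: {cfg['train_batch_size']} x {cfg['gradient_accumulation_steps']} = {effective}")
```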
Excellent work! Thank you again!