BatsResearch / csp

Learning to compose soft prompts for compositional zero-shot learning.
BSD 3-Clause "New" or "Revised" License

Batch Size #15

Closed Chefzz closed 1 year ago

Chefzz commented 1 year ago

What is the batch size for the three datasets? Appendix H shows that the best batch size for all three datasets is 128, but training the model on UT-Zappos with a batch size of 128 almost requires two RTX 3090s. For MIT-States, I can train the model with a batch size of 64 on two RTX 3090s (17 GB of memory per card is used). For C-GQA, I have to set the batch size to 32.

nihalnayak commented 1 year ago

We achieve this effective batch size of 128 by using --gradient_accumulation_steps in train.py. For example, you can use --train_batch_size=64 --gradient_accumulation_steps=2 to get an effective batch size of train_batch_size * gradient_accumulation_steps, i.e., 128.
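
A minimal PyTorch sketch of the idea (illustrative only; the model, optimizer, and data below are placeholders, not the actual objects built in train.py):

```python
import torch
from torch import nn

# Hypothetical stand-ins for the real model, optimizer, and data loader.
model = nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

train_batch_size = 64
gradient_accumulation_steps = 2  # effective batch = 64 * 2 = 128

optimizer.zero_grad()
for step in range(8):
    # One mini-batch of `train_batch_size` synthetic examples.
    x = torch.randn(train_batch_size, 16)
    y = torch.randint(0, 4, (train_batch_size,))
    loss = loss_fn(model(x), y)
    # Scale the loss so the accumulated gradient matches a true 128-sample batch.
    (loss / gradient_accumulation_steps).backward()
    if (step + 1) % gradient_accumulation_steps == 0:
        optimizer.step()       # one parameter update per effective batch of 128
        optimizer.zero_grad()
```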

Hope this helps!

Chefzz commented 1 year ago

Thanks a lot! So the --gradient_accumulation_steps used when training the model on C-GQA is larger than for the other datasets?

nihalnayak commented 1 year ago

Yes! We used more gradient accumulation steps for C-GQA.
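
As a worked example (the exact values for C-GQA are not stated in this thread): if only a per-step batch of 32 fits in memory, --train_batch_size=32 --gradient_accumulation_steps=4 still gives an effective batch size of 32 * 4 = 128.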

Chefzz commented 1 year ago

Excellent work! Thank you again!