Closed — linhaojia13 closed this issue 1 year ago.
We just provide a sample of the code to show how it works.
Specifically, train_it.sh sets `micro_batch_size=4`, `nproc_per_node=8`, `nnodes=1`, and `gradient_accumulation_steps=1`, which results in a `global_batch_size` of 32, rather than the 256 mentioned in the paper. To replicate the results from the paper, should I adjust `gradient_accumulation_steps` and `micro_batch_size` to align with the `global_batch_size` mentioned in the paper, or should I directly use the train_it.sh script that you have released?
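For reference, here is a minimal sketch of the arithmetic, assuming the standard data-parallel formula (global batch = micro batch × processes per node × nodes × gradient accumulation steps). The variable names mirror those in train_it.sh, but setting `gradient_accumulation_steps=8` is just one illustrative way to reach 256 on the same 8 GPUs, not a confirmed setting from the authors:

```bash
# Global batch size under standard data parallelism:
#   global_batch_size = micro_batch_size * nproc_per_node * nnodes * gradient_accumulation_steps
micro_batch_size=4
nproc_per_node=8
nnodes=1
gradient_accumulation_steps=1
echo $(( micro_batch_size * nproc_per_node * nnodes * gradient_accumulation_steps ))  # 32, as shipped

# One hypothetical way to reach the paper's 256 without changing hardware:
gradient_accumulation_steps=8
echo $(( micro_batch_size * nproc_per_node * nnodes * gradient_accumulation_steps ))  # 256
```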
Sorry to bother you, my good friend. Did you figure out how it works? I am now facing the same problem 😢.