HPDL-Group / Merak


`per_device_train_batch_size` argument causes misunderstanding #5

Closed: lin88lin8850 closed this 1 year ago

lin88lin8850 commented 1 year ago

As we know, `per_device_train_batch_size = micro_batch_size * gradient_accumulation_steps`,

but in the code (https://github.com/HPDL-Group/Merak/blob/main/Merak/merak_trainer.py#L316) you set `micro_batch_size = per_device_train_batch_size`, which could cause misunderstanding.

Would you mind revising the parameter name?
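
To make the two conflicting readings concrete, here is a toy calculation (our own illustration, not code from the repository; all variable names are hypothetical):

```python
# Reading A (this issue's assumption): per_device_train_batch_size already
# includes accumulation, i.e.
#   per_device_train_batch_size = micro_batch_size * gradient_accumulation_steps
per_device_train_batch_size = 32
gradient_accumulation_steps = 4
micro_batch_size_a = per_device_train_batch_size // gradient_accumulation_steps  # 8

# Reading B (what the linked Merak code does): the argument *is* the
# micro-batch size.
micro_batch_size_b = per_device_train_batch_size  # 32

# The same flag yields a different per-forward-pass batch (8 vs 32),
# which is the misunderstanding this issue describes.
assert micro_batch_size_a != micro_batch_size_b
```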

lucasleesw commented 1 year ago

Hi, thanks for pointing this out. The `per_device_train_batch_size` and `gradient_accumulation_steps` arguments come from the transformers trainer arguments. We think gradient accumulation in DP is similar to micro-batch training in PP, so we use `per_device_train_batch_size` as the micro-batch size to ensure the global batch size remains the same when switching from a DP training script.
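
A minimal sketch of the invariance being described here (plain Python arithmetic as our own illustration, not Merak code):

```python
# In a DP-only transformers script, the effective global batch size is:
#   per_device_train_batch_size * gradient_accumulation_steps * dp_world_size
# Merak reuses per_device_train_batch_size as the pipeline micro-batch size
# and gradient_accumulation_steps as the number of micro-batches, so the
# product, and hence the global batch size, is unchanged.

per_device_train_batch_size = 8
gradient_accumulation_steps = 4
dp_world_size = 2

# DP training: 4 accumulation steps of 8 samples on each of 2 replicas.
global_batch_dp = (per_device_train_batch_size
                   * gradient_accumulation_steps
                   * dp_world_size)

# PP training (Merak's reading): micro-batch size 8, 4 micro-batches per step.
micro_batch_size = per_device_train_batch_size
num_micro_batches = gradient_accumulation_steps
global_batch_pp = micro_batch_size * num_micro_batches * dp_world_size

assert global_batch_dp == global_batch_pp == 64
```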

lin88lin8850 commented 1 year ago

Got it! Adding some related information to the docs would be helpful. Closing the issue.