facebookresearch / deit

Official DeiT repository
Apache License 2.0

batch_size flag #220

Open tsengalb99 opened 1 year ago

tsengalb99 commented 1 year ago

Is the batch_size flag the batch size per GPU or the total batch size across all GPUs? In the example training command, 4 GPUs are used with a batch size of 256. Does that mean the effective batch size is 1024, or is it 256 in total, i.e. 64 per GPU? I am unable to reproduce the DeiT-Ti results (I'm at ~62.5% after 250 epochs, and I highly doubt the run will reach 72% by 300 epochs) with either 8 GPUs and batch_size=128 or 4 GPUs and batch_size=256. I assumed both setups would give identical results, equivalent to a total batch size of 1024, but something seems to be off.
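For concreteness, here is a minimal sketch of the two interpretations in question, assuming the flag is either per-GPU (as is typical for torch.distributed launchers) or global, and assuming a linear learning-rate scaling rule with a reference batch size of 512. The variable names and numbers below are illustrative only, not taken from the repo's main.py:

```python
# Sketch of the two possible readings of --batch-size.
# All names and values here are illustrative, not from the repo.

num_gpus = 4
batch_size_flag = 256  # value passed on the command line

# Reading A: the flag is per-GPU (common in torch.distributed setups).
effective_batch_size_a = batch_size_flag * num_gpus   # 256 * 4 = 1024

# Reading B: the flag is the global batch size, split across GPUs.
per_gpu_batch_size_b = batch_size_flag // num_gpus    # 256 // 4 = 64

# Under reading A, linear LR scaling against a reference batch size of 512
# would turn a base LR of 5e-4 into:
base_lr = 5e-4
scaled_lr = base_lr * effective_batch_size_a / 512.0  # 1e-3

print(effective_batch_size_a, per_gpu_batch_size_b, scaled_lr)
```

Under reading A, both 8 GPUs with batch_size=128 and 4 GPUs with batch_size=256 give the same effective batch size of 1024, which is why the two runs were expected to match.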

tsengalb99 commented 1 year ago

@TouvronHugo

roymiles commented 11 months ago

Were you able to solve this problem?