Closed imhgchoi closed 1 year ago
Hi and thank you for your interest.
My apologies for noticing this issue so late. Yes, that is correct: CIFAR and most other small datasets were trained on a single GPU without DDP, so the batch size in the config file reflects the final (total) batch size. ImageNet and most of the fine-tuned experiments were distributed, so the config batch size is multiplied by 8 (the number of GPUs).
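To make the arithmetic concrete, here is a minimal sketch (the function name is hypothetical, not from the repo's config): under DDP each process loads its own `batch_size` samples per step, so the effective global batch is the per-GPU batch multiplied by the number of processes; without DDP the config value is already the total.

```python
def effective_batch_size(per_gpu_batch: int, world_size: int) -> int:
    """Global batch size when every DDP process uses the same
    per-process DataLoader batch size. world_size=1 means no DDP."""
    return per_gpu_batch * world_size

# CIFAR: single GPU, no DDP -> the config batch size is the total.
print(effective_batch_size(128, 1))  # 128

# ImageNet: config batch size of 128 distributed over 8 GPUs.
print(effective_batch_size(128, 8))  # 1024
```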
No worries :) Thank you very much for your reply.
Hi, this work is awesome. I just have one little question. The paper says the total batch size is 128 for the CIFAR datasets and that 4 GPUs were used in parallel. That doesn't mean the total batch size is 128 * 4 = 512, does it? DDP is for ImageNet, and non-distributed training is for CIFAR, am I correct?
Thanks a ton :)