Philip-Bachman / amdim-public

Public repo for Augmented Multiscale Deep InfoMax representation learning
MIT License

Training self-supervised learning on multi-GPU #16

Open bhat-prashant opened 3 years ago

bhat-prashant commented 3 years ago

Hi, I am really new to self-supervised learning. I would like to know how you adjust the number of epochs when training on multiple GPUs.

I intend to train on ImageNet. Unfortunately, I cannot fit a mini-batch of 256 on one GPU. As far as I understand, when I train on 4 GPUs, even if I train for 100 epochs, the number of epochs trained will effectively be 100/4 = 25, since the number of gradient updates is also divided by 4. Please correct me if I am wrong.
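For concreteness, this is roughly the setup I have in mind (a minimal sketch using PyTorch's `DistributedDataParallel`, not code from this repo; batch sizes, the optimizer, and the loss are illustrative):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def train(model, dataset, epochs=100, per_gpu_batch=64):
    # Assumes the process group was already initialized by a launcher
    # such as torchrun, with one process per GPU (4 processes here).
    rank = dist.get_rank()
    model = DDP(model.cuda(rank), device_ids=[rank])

    # DistributedSampler gives each of the 4 processes a disjoint quarter
    # of the dataset, so one pass over `loader` below still covers the
    # full dataset exactly once per epoch.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=per_gpu_batch, sampler=sampler)
    opt = torch.optim.Adam(model.parameters(), lr=2e-4)  # illustrative

    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # reshuffle the per-process shards
        for images, _ in loader:
            # One optimizer step consumes a global batch of 4 * 64 = 256
            # samples (gradients are averaged across the 4 processes), so
            # there are len(dataset) // 256 updates per epoch -- this is
            # the quantity I am unsure about w.r.t. effective epoch count.
            loss = model(images.cuda(rank)).mean()  # placeholder objective
            opt.zero_grad()
            loss.backward()
            opt.step()
```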

Please let me know how you have accounted for this in your implementation.

Thanks