facebookresearch / moco

PyTorch implementation of MoCo: https://arxiv.org/abs/1911.05722
MIT License

Different model parameters initialized in each GPU worker #8

Closed · TengdaHan closed this issue 4 years ago

TengdaHan commented 4 years ago

Hi, following the instructions for both 'Unsupervised Training' and 'Linear Classification', I find that different model parameters are initialized in each GPU worker, because the random seed is not set inside the `main_worker` function. With PyTorch DistributedDataParallel, do you think initializing the same set of model parameters across all GPU workers would give more accurate gradients and better performance? Thanks!
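
For concreteness, a minimal sketch of the kind of seeding I mean (hypothetical, not code from this repo), called at the top of `main_worker` so every GPU worker would construct identical initial weights:

```python
# Hypothetical seeding helper (not in the repo): call it at the start of
# main_worker so every GPU worker builds the same random initialization.
import random

import torch


def seed_everything(seed: int) -> None:
    random.seed(seed)                 # Python RNG
    torch.manual_seed(seed)           # CPU RNG (also seeds CUDA RNGs)
    torch.cuda.manual_seed_all(seed)  # seed every CUDA device explicitly
```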

ppwwyyxx commented 4 years ago

DistributedDataParallel already makes sure that parameters across all GPUs are initialized to the same values: at construction time it broadcasts the parameters and buffers from rank 0 to all other ranks.
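
For illustration, here is a minimal, self-contained sketch (not from this repo) that checks this behavior on CPU with the `gloo` backend: each rank deliberately seeds its model differently, wraps it in DistributedDataParallel, and then verifies that every rank ends up holding rank 0's weights. The module, port, and world size are arbitrary choices for the example.

```python
# Sketch: verify that DDP's constructor synchronizes parameters across ranks.
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Deliberately different initialization on each rank.
    torch.manual_seed(rank)
    model = nn.Linear(8, 8)
    ddp_model = DDP(model)  # constructor broadcasts rank 0's parameters

    # Compare a simple parameter checksum across ranks.
    checksum = ddp_model.module.weight.detach().sum()
    gathered = [torch.zeros_like(checksum) for _ in range(world_size)]
    dist.all_gather(gathered, checksum)
    if rank == 0:
        same = all(torch.allclose(g, gathered[0]) for g in gathered)
        print("parameters identical across ranks:", same)

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```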

TengdaHan commented 4 years ago

Thanks, you are right.