Thanks for the great repo.
I notice that the README only gives instructions for training on a single node with 8 GPUs.
So I wonder how the code supports multi-node training, for example:
For node 0
python main_moco.py \
-a resnet50 \
--lr 0.03 \
--batch-size 256 \
--dist-url 'tcp://[node 0 address]:10001' --multiprocessing-distributed --world-size 2 --rank 0 \
[your imagenet-folder with train and val folders]
For node 1
python main_moco.py \
-a resnet50 \
--lr 0.03 \
--batch-size 256 \
--dist-url 'tcp://[node 0 address]:10001' --multiprocessing-distributed --world-size 2 --rank 1 \
[your imagenet-folder with train and val folders]
Hi, have you successfully implemented a multi-node version of MoCo? I run into some problems when training on multiple nodes:
the queue buffer does not seem to stay synchronized across processes, and the variable ptr is not 0 at the beginning.