Closed DianCh closed 3 years ago
It seems I posted this on the deprecated Detectron repository. I have closed it and created another issue under detectron2.
Hi @DianCh, I am facing a similar issue. Could you please share the link to the issue you created under detectron2?
Sure. https://github.com/facebookresearch/detectron2/issues/2792
Thanks a lot @DianCh
Expected results
I expect that after running the two launch commands on the two machines respectively, the two nodes will start training and communicating with each other. I also expect to see 4 GPUs taken on each machine, as specified by --num-gpus 4 in both commands.
Actual results
--num-gpus 4 in the command.
Terminal output of machine (node) 0:
Terminal output of machine (node) 1:
GPU usage of machine (node) 0 by nvidia-smi (seems correct):
GPU usage of machine (node) 1 by nvidia-smi (TAKES DOUBLED GPUS):
Detailed steps to reproduce
pip install -e detectron2
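The two launch commands did not survive in this post. As a rough sketch of what the expected setup implies, the snippet below assumes the usual convention that each machine spawns one process per GPU and that global ranks are assigned machine by machine. The function name `expected_ranks` is hypothetical and is not part of detectron2.

```python
from typing import Dict, List, Union


def expected_ranks(
    num_machines: int, num_gpus: int, machine_rank: int
) -> Dict[str, Union[int, List[int]]]:
    """Return the world size and the global ranks one machine should own.

    Assumption: one process per GPU, ranks numbered machine by machine.
    """
    if not 0 <= machine_rank < num_machines:
        raise ValueError("machine_rank must be in [0, num_machines)")
    world_size = num_machines * num_gpus
    first = machine_rank * num_gpus
    return {
        "world_size": world_size,
        "global_ranks": list(range(first, first + num_gpus)),
    }
```

Under this assumption, two machines launched with --num-gpus 4 should each own exactly 4 ranks out of a world size of 8, so a node occupying 8 GPUs does not match the expected layout.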
System information
Please see the output of machine 0, which has detectron2's diagnosis.
NOTE: Both machines are actually containers on clusters; I am not sure whether this affects the network or any multi-machine behavior.
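To compare GPU usage on the two nodes without reading the nvidia-smi table by eye, a small helper along these lines could count how many distinct GPUs have compute processes. This is only a sketch: the function names are hypothetical, and inside a container nvidia-smi may not list processes at all, in which case the count would be 0 regardless of actual usage.

```python
import subprocess
from typing import Set


def busy_gpu_uuids(csv_text: str) -> Set[str]:
    """Parse 'gpu_uuid, pid' CSV rows and return the UUIDs that have a process."""
    uuids = set()
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each row looks like "<gpu_uuid>, <pid>"; only the UUID is needed.
        uuid = line.split(",")[0].strip()
        if uuid:
            uuids.add(uuid)
    return uuids


def query_compute_apps() -> str:
    """Run nvidia-smi and return its raw CSV output (requires an NVIDIA driver)."""
    return subprocess.run(
        ["nvidia-smi", "--query-compute-apps=gpu_uuid,pid", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout


# Usage on each node (not executed here):
#   print(len(busy_gpu_uuids(query_compute_apps())))
```

Running this on both nodes and comparing the counts against the --num-gpus value gives a quick check of whether a node is holding more GPUs than requested.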