Open AlexNmSED opened 1 year ago
When I use 4 GPUs on a single machine, I get this error: RuntimeError: [/pytorch/third_party/gloo/gloo/transport/tcp/pair.cc:575] Connection closed by peer [172.16.173.129]:23211
Can someone help me?
Thank you.
Try this: python -m torch.distributed.launch --nproc_per_node=4 main_pretrain.py
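If it helps with debugging, here is a minimal sketch for printing the rendezvous settings each worker sees. As far as I know, the launcher exports MASTER_ADDR, MASTER_PORT, WORLD_SIZE and RANK to every process it starts. The helper names read_dist_env and describe are hypothetical, and the fallback values are only assumed to mirror the launcher's defaults. I do not know whether main_pretrain.py reads these variables itself.

```python
import os


def read_dist_env(env=None):
    """Collect the rendezvous settings a launched worker would see.

    Assumption: the launcher exports MASTER_ADDR, MASTER_PORT, WORLD_SIZE
    and RANK. The fallbacks below are illustrative defaults only.
    """
    env = os.environ if env is None else env
    return {
        "master_addr": env.get("MASTER_ADDR", "127.0.0.1"),
        "master_port": int(env.get("MASTER_PORT", "29500")),
        "world_size": int(env.get("WORLD_SIZE", "1")),
        "rank": int(env.get("RANK", "0")),
    }


def describe(cfg):
    """Format the settings as one line that is easy to compare across ranks."""
    return "rank {rank}/{world_size} -> tcp://{master_addr}:{master_port}".format(**cfg)


if __name__ == "__main__":
    # Run this under the same launcher command; every worker should print
    # the same address, port and world size, with a distinct rank.
    print(describe(read_dist_env()))
```

If the workers print different addresses or ports, or fewer than 4 lines appear, that would suggest the processes are not agreeing on the rendezvous point, which is one possible cause of a connection being closed by a peer.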
Thank you, but that is the command I am already using.