facebookresearch / mae

PyTorch implementation of MAE https://arxiv.org/abs/2111.06377

Single machine multi-GPU training #159

Open AlexNmSED opened 1 year ago

AlexNmSED commented 1 year ago

When I use 4 GPUs on a single machine, I get this error: RuntimeError: [/pytorch/third_party/gloo/gloo/transport/tcp/pair.cc:575] Connection closed by peer [172.16.173.129]:23211

Can someone help me?

Thank you.

zengjunjie1026 commented 1 year ago

Try this: python -m torch.distributed.launch --nproc_per_node=4 main_pretrain.py
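
The peer address in the error (172.16.173.129) is not the loopback address, so one thing that may be worth trying on a single machine is pinning the rendezvous to 127.0.0.1 with an unused port. The sketch below only builds and prints such a launch command; it does not run training, and it is not confirmed to fix this error. The helper names `find_free_port` and `build_launch_command` are hypothetical and not part of the MAE repo. `--master_addr` and `--master_port` are existing options of `torch.distributed.launch`.

```python
import socket
import sys


def find_free_port() -> int:
    """Ask the OS for a currently unused TCP port on the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        return s.getsockname()[1]


def build_launch_command(nproc_per_node: int, script: str, port: int) -> list:
    """Build a single-node launch command that rendezvous over loopback.

    Hypothetical helper: it only assembles the argument list and does not
    start any process.
    """
    return [
        sys.executable, "-m", "torch.distributed.launch",
        "--nproc_per_node={}".format(nproc_per_node),
        "--master_addr=127.0.0.1",  # assumption: all ranks are on this machine
        "--master_port={}".format(port),
        script,
    ]


if __name__ == "__main__":
    cmd = build_launch_command(4, "main_pretrain.py", find_free_port())
    print(" ".join(cmd))
```

Running the script prints a command that can be copied into the shell. The port is only checked at the moment it is chosen, so another process could in principle take it before the launch starts.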

AlexNmSED commented 1 year ago

Thank you, but that is what I am already doing.