facebookresearch / mae

PyTorch implementation of MAE: https://arxiv.org/abs/2111.06377

Running multi-gpu on one node #120

Open kaushikb258 opened 1 year ago

kaushikb258 commented 1 year ago

I am using 2 GPUs on one node with the command below, but the code gets stuck inside the init_distributed_mode() function in util/misc.py. Specifically, it hangs at the torch.distributed.barrier() call with no further progress. I am using the distributed sampler. How do I fix this issue?

CUDA_VISIBLE_DEVICES=2,3 OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=2 main_pretrain.py \
    --batch_size 64 \
    --model mae_vit_base_patch16 \
    --norm_pix_loss \
    --mask_ratio 0.75 \
    --epochs 200 \
    --warmup_epochs 0 \
    --blr 1.5e-4 --weight_decay 0.05 \
    --data_path ${IMAGENET_DIR}
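[Editor's note, not from the thread: a hang at torch.distributed.barrier() with the NCCL backend is often caused by broken GPU peer-to-peer communication. Before changing backends, NCCL's standard diagnostic environment variables can help narrow it down; a sketch:]

```shell
# Diagnostic settings (assumptions for debugging, not part of the MAE repo):
export NCCL_DEBUG=INFO      # print NCCL init and transport selection logs
export NCCL_P2P_DISABLE=1   # rule out a broken GPU peer-to-peer path
```

If the run proceeds with NCCL_P2P_DISABLE=1, the hang points at the P2P transport rather than the training code.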

kaushikb258 commented 1 year ago

Actually, I was able to solve the problem: I changed the backend in util/misc.py from nccl to gloo and it worked. Maybe someone will find this helpful.

args.dist_backend = 'gloo'  # was 'nccl'
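[Editor's note: switching to gloo unblocks the run but gives up NCCL's GPU-optimized collectives, so it is best treated as a fallback. A minimal sketch of that idea, assuming a hypothetical helper name pick_dist_backend not present in the repo:]

```python
def pick_dist_backend(prefer_nccl=True):
    """Return a torch.distributed backend name.

    'nccl' requires CUDA and working GPU-to-GPU communication; 'gloo'
    runs over CPU sockets and is the usual fallback when NCCL hangs at
    barrier(), as reported in this issue.
    """
    try:
        import torch
        cuda_ok = torch.cuda.is_available()
    except ImportError:  # torch not installed; CPU-only fallback
        cuda_ok = False
    return "nccl" if (prefer_nccl and cuda_ok) else "gloo"
```

The chosen name would then be passed as the backend argument to torch.distributed.init_process_group().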