Closed anas-zafar closed 2 years ago
On Windows, the following solution works:
import os

cuda_num = os.environ['CUDA_VISIBLE_DEVICES']
cuda_num_list = cuda_num.split(",")
if len(cuda_num_list) == 1:
    import torch.distributed as dist
    dist.init_process_group(backend='nccl', init_method='tcp://localhost:23456', rank=0, world_size=1)
Instead of nccl, use gloo, because nccl is not supported on Windows:
dist.init_process_group(backend='gloo', init_method='tcp://localhost:23456', rank=0, world_size=1)
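To avoid editing the backend string by hand on each machine, the choice can be made from the platform. This is only a rough sketch, not part of AdelaiDet: the helper names `pick_backend`, `visible_gpu_count`, and `init_single_process_group` are made up for illustration, and it assumes that anything other than Windows is a Linux machine with NCCL available.

```python
import os
import sys


def pick_backend(platform=sys.platform):
    """Return 'gloo' on Windows (no NCCL there), otherwise 'nccl'.

    Assumption: every non-Windows platform is treated as NCCL-capable.
    """
    return "gloo" if platform.startswith("win") else "nccl"


def visible_gpu_count(env=os.environ):
    """Count the device ids in CUDA_VISIBLE_DEVICES (0 if unset or empty)."""
    value = env.get("CUDA_VISIBLE_DEVICES", "")
    return len([d for d in value.split(",") if d.strip()])


def init_single_process_group(port=23456):
    """Create a one-process group so distributed-only code can run on one GPU."""
    import torch.distributed as dist  # imported lazily; requires PyTorch

    if not dist.is_initialized():
        dist.init_process_group(
            backend=pick_backend(),
            init_method=f"tcp://localhost:{port}",
            rank=0,
            world_size=1,
        )
```

Calling `init_single_process_group()` only when `visible_gpu_count() == 1` mirrors the check in the snippet above. Reading the variable with `.get` also avoids a `KeyError` when `CUDA_VISIBLE_DEVICES` is not set.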
When I train FCPose on a single GPU, I get the error below:
I tried the solution from https://github.com/aim-uofa/AdelaiDet/issues/503 (#503), but unfortunately it does not work for me.