aim-uofa / AdelaiDet

AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
https://git.io/AdelaiDet

RuntimeError: Default process group has not been initialized, please make sure to call init_process_group. (Fcpose) #524

Closed anas-zafar closed 2 years ago

anas-zafar commented 2 years ago

When I train FcPose on a single GPU, I get the error below:

  File "C:\ProgramData\Anaconda3\envs\new\lib\site-packages\torch\distributed\distributed_c10d.py", line 711, in get_world_size
    return _get_group_size(group)
  File "C:\ProgramData\Anaconda3\envs\new\lib\site-packages\torch\distributed\distributed_c10d.py", line 263, in _get_group_size
    default_pg = _get_default_group()
  File "C:\ProgramData\Anaconda3\envs\new\lib\site-packages\torch\distributed\distributed_c10d.py", line 347, in _get_default_group
    raise RuntimeError("Default process group has not been initialized, "
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

I tried the solution from https://github.com/aim-uofa/AdelaiDet/issues/503 (#503), but unfortunately it does not work for me.

anas-zafar commented 2 years ago

On Windows, the following solution works:

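import os
import torch.distributed as dist

# Comma-separated GPU ids, e.g. "0" for a single GPU.
cuda_num = os.environ['CUDA_VISIBLE_DEVICES']
cuda_num_list = cuda_num.split(",")
if len(cuda_num_list) == 1:
    # Single GPU: create a one-process group so get_world_size() succeeds.
    dist.init_process_group(backend='nccl', init_method='tcp://localhost:23456', rank=0, world_size=1)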

Instead of nccl, use gloo, as nccl is not supported on Windows:

dist.init_process_group(backend='gloo', init_method='tcp://localhost:23456', rank=0, world_size=1)
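
For anyone who wants one snippet that covers both Windows and Linux, here is a minimal sketch of the same workaround. The helper names `pick_backend`, `visible_gpu_count`, and `init_single_process_group` are hypothetical (they are not part of AdelaiDet or PyTorch), and the port 23456 is simply the one used above. It assumes the group only needs to be created when no process group exists yet, and it treats an unset `CUDA_VISIBLE_DEVICES` as zero visible GPUs instead of raising a `KeyError`.

```python
import os
import sys


def pick_backend(platform: str = sys.platform, cuda_available: bool = True) -> str:
    """Return 'gloo' on Windows (NCCL is not supported there) or without CUDA, else 'nccl'."""
    if platform.startswith("win") or not cuda_available:
        return "gloo"
    return "nccl"


def visible_gpu_count(env=os.environ) -> int:
    """Count the ids listed in CUDA_VISIBLE_DEVICES; 0 if it is unset or empty."""
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return len([d for d in raw.split(",") if d.strip()])


def init_single_process_group(port: int = 23456) -> None:
    """Create a one-process group (rank 0, world size 1) if none exists yet."""
    # Imported lazily so the helpers above can be used without PyTorch installed.
    import torch
    import torch.distributed as dist

    if dist.is_available() and not dist.is_initialized():
        dist.init_process_group(
            backend=pick_backend(cuda_available=torch.cuda.is_available()),
            init_method=f"tcp://localhost:{port}",
            rank=0,
            world_size=1,
        )
```

To mirror the check above, call it before the trainer is built, for example `if visible_gpu_count() == 1: init_single_process_group()`.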