facebookresearch / detr

End-to-End Object Detection with Transformers
Apache License 2.0

AttributeError: module 'torch.distributed' has no attribute 'init_process_group' #43

Closed ewong18 closed 4 years ago

ewong18 commented 4 years ago

I'm trying to run the example as-is, and I'm running into this issue. I did have to adjust the number of GPUs because the VM I'm working on only has one. I'm working on a Windows 10 machine with PyTorch version 1.5.0, CUDA version 10.1, and CUDA compiler driver v10.0.130.


```
| distributed init (rank 0): env://
Traceback (most recent call last):
  File "main.py", line 248, in <module>
    main(args)
  File "main.py", line 106, in main
    utils.init_distributed_mode(args)
  File "C:\Users\-user-\Documents\Projects\detr\util\misc.py", line 374, in init_distributed_mode
    torch.distributed.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
AttributeError: module 'torch.distributed' has no attribute 'init_process_group'
Traceback (most recent call last):
  File "C:\Anaconda\envs\detr\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Anaconda\envs\detr\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Anaconda\envs\detr\lib\site-packages\torch\distributed\launch.py", line 263, in <module>
    main()
  File "C:\Anaconda\envs\detr\lib\site-packages\torch\distributed\launch.py", line 258, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['C:\\Anaconda\\envs\\detr\\python.exe', '-u', 'main.py', '--coco_path', 'F:/coco-data']' returned non-zero exit status 1.
```

alcinos commented 4 years ago

Hi, if you have only one GPU, you should remove the distributed launching entirely and run the script directly: `python main.py --coco_path /path/to/coco`

This will train with a total batch size of 2, which is currently untested/unsupported (we recommend at least bs=16, which is unlikely to fit on a single GPU).
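
For readers hitting the same traceback: as far as I know, the PyTorch 1.5.0 Windows builds ship without the distributed package, so `torch.distributed.is_available()` returns `False` there and `init_process_group` is never defined. Below is a minimal sketch of a guard that skips distributed setup in that case. The helper name `should_init_distributed` is hypothetical and not part of DETR or PyTorch; the `RANK` and `WORLD_SIZE` environment variables are the ones set by `torch.distributed.launch`.

```python
import os


def should_init_distributed(dist_module, env=None):
    """Return True only if a distributed launch was requested AND is supported.

    dist_module: the imported `torch.distributed` module (or any stand-in
                 object exposing the same attributes, e.g. for testing).
    env:         mapping of environment variables; defaults to os.environ.
    """
    env = os.environ if env is None else env

    # torch.distributed.launch exports RANK and WORLD_SIZE to each worker.
    launched = "RANK" in env and "WORLD_SIZE" in env

    # On builds without distributed support, is_available() returns False
    # and init_process_group does not exist on the module.
    is_available = getattr(dist_module, "is_available", None)
    supported = (
        callable(is_available)
        and is_available()
        and hasattr(dist_module, "init_process_group")
    )
    return bool(launched and supported)
```

A caller could pass `torch.distributed` as `dist_module` and fall back to plain single-process training when the function returns `False`, rather than crashing with the `AttributeError` shown above.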

fmassa commented 4 years ago

Closing following @alcinos's answer, but let us know if you have further issues or questions.