Open divastar opened 3 years ago
I have the same question
I have the same problem on Windows 10.
python -m torch.distributed.launch xxx.py
Use the above command to run the .py file in a cmd window.
The Windows system does not support DDP; just comment out accelerator="ddp" in train.py.
Also, I found that an incomplete configuration file, or an input_size that does not match, can lead to this situation.
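The workaround above can be sketched as a small guard instead of a hand-edited line. This is only a sketch: `choose_accelerator` is a hypothetical helper, and passing its result as `accelerator=` assumes train.py builds an older PyTorch Lightning `Trainer` in which `accelerator="ddp"` is valid and `None` means no distributed training.

```python
import sys


def choose_accelerator(platform=None):
    """Return "ddp" everywhere except Windows, where None disables DDP.

    `platform` defaults to sys.platform; it is a parameter only so the
    choice can be checked without running on each operating system.
    """
    platform = sys.platform if platform is None else platform
    # sys.platform is "win32" on Windows, including 64-bit installs.
    return None if platform.startswith("win") else "ddp"
```

In train.py this would replace the hard-coded value, for example `accelerator=choose_accelerator()`, so the same script still uses DDP on Linux.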
Hi. I am on Windows 10. How can I solve the "RuntimeError: No rendezvous handler for env://" problem?
Traceback (most recent call last):
  File "train.py", line 299, in <module>
    torch.distributed.init_process_group(backend='nccl',
  File "C:\Users\korin\anaconda3\envs\myenv\lib\site-packages\torch\distributed\distributed_c10d.py", line 433, in init_process_group
    rendezvous_iterator = rendezvous(
  File "C:\Users\korin\anaconda3\envs\myenv\lib\site-packages\torch\distributed\rendezvous.py", line 82, in rendezvous
    raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for env://
Traceback (most recent call last):
  File "train.py", line 298, in <module>
    torch.cuda.set_device(args.local_rank)
  File "C:\Users\korin\anaconda3\envs\myenv\lib\site-packages\torch\cuda\__init__.py", line 263, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
Traceback (most recent call last):
  File "C:\Users\korin\anaconda3\envs\myenv\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\korin\anaconda3\envs\myenv\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\korin\anaconda3\envs\myenv\lib\site-packages\torch\distributed\launch.py", line 260, in <module>
    main()
  File "C:\Users\korin\anaconda3\envs\myenv\lib\site-packages\torch\distributed\launch.py", line 255, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['C:\Users\korin\anaconda3\envs\myenv\python.exe', '-u', 'train.py', '--local_rank=1', '--dataset', 'cityscapes', '--cv', '0', '--syncbn', '--apex', '--fp16', '--bs_val', '1', '--eval', 'folder', '--eval_folder', './imgs/test_imgs', '--dump_assets', '--dump_all_images', '--n_scales', '0.5,1.0,2.0', '--snapshot', 'ASSETS_PATH/seg_weights/cityscapes_ocrnet.HRNet_Mscale_outstanding-turtle.pth', '--arch', 'ocrnet.HRNet_Mscale', '--result_dir', 'logs\dump_folder\singing-earwig_2021.02.21_07.56']' returned non-zero exit status 1.
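The log above shows two separate failures: the env:// rendezvous is rejected, and `--local_rank=1` is not a valid CUDA device on this machine. A minimal sketch of guards for both follows. It assumes a PyTorch build where `torch.distributed` on Windows works only with the gloo backend (NCCL is not available on Windows) and a file:// rendezvous; whether that holds for your installed version is an assumption to verify. `pick_init_args`, `validate_local_rank`, and the init-file path are hypothetical names, not part of train.py.

```python
import sys


def pick_init_args(init_file, platform=None):
    """Choose backend/init_method kwargs for init_process_group.

    Assumption: on Windows use gloo with a file:// rendezvous; elsewhere
    keep the nccl + env:// combination seen in the traceback.
    """
    platform = sys.platform if platform is None else platform
    if platform.startswith("win"):
        # Every process must be given the same init_file path.
        url = "file:///" + init_file.replace("\\", "/")
        return {"backend": "gloo", "init_method": url}
    return {"backend": "nccl", "init_method": "env://"}


def validate_local_rank(local_rank, device_count):
    """Fail early, with a readable message, before torch.cuda.set_device."""
    if not 0 <= local_rank < device_count:
        raise ValueError(
            f"local_rank={local_rank} but only {device_count} CUDA device(s) "
            "are visible; launch with fewer processes per node"
        )
    return local_rank
```

In train.py this might be used as `torch.cuda.set_device(validate_local_rank(args.local_rank, torch.cuda.device_count()))`, followed by `torch.distributed.init_process_group(rank=..., world_size=..., **pick_init_args(path))`. With a file:// init method, `rank` and `world_size` have to be passed explicitly. On a single-GPU machine the simpler route is to launch only one process, or to skip distributed initialization entirely.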