RuntimeError: No rendezvous handler for env://

Lee-Siyoung commented 1 year ago

I am only using one GPU.

Windows
NVIDIA GeForce RTX 3070
torch version : 1.7.1+cu110
cuda : 11.2

torch.cuda.is_available() is True

Train command:

python -m torch.distributed.launch --nproc_per_node 1 --master_port 9527 train.py --data data/coco_kpts_samw.yaml --cfg cfg/yolov7_samw.yaml --weights weights/yolov7-w6-person.pt --batch-size 128 --img 960 --kpt-label --sync-bn --device 0 --name yolov7-w6-pose --hyp data/hyp.pose.yaml

Error output:

D:\yolov7-pose>python -m torch.distributed.launch --nproc_per_node 1 --master_port 9527 train.py --data data/coco_kpts_samw.yaml --cfg cfg/yolov7_samw.yaml --weights weights/yolov7-w6-person.pt --batch-size 128 --img 960 --kpt-label --sync-bn --device 0 --name yolov7-w6-pose --hyp data/hyp.pose.yaml
github: skipping check (not a git repository)
YOLOv5 2022-8-12 torch 1.7.1+cu110 CUDA:0 (NVIDIA GeForce RTX 3070, 8191.5MB)

Traceback (most recent call last):
  File "D:\yolov7-pose\train.py", line 546, in <module>
    dist.init_process_group(backend='nccl', init_method='env://')  # distributed backend
  File "C:\Users\tldud\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\distributed\distributed_c10d.py", line 433, in init_process_group
    rendezvous_iterator = rendezvous(
  File "C:\Users\tldud\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\distributed\rendezvous.py", line 82, in rendezvous
    raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for env://
Traceback (most recent call last):
  File "C:\Users\tldud\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\tldud\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\tldud\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\distributed\launch.py", line 260, in <module>
    main()
  File "C:\Users\tldud\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\distributed\launch.py", line 255, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['C:\Users\tldud\AppData\Local\Programs\Python\Python39\python.exe', '-u', 'train.py', '--local_rank=0', '--data', 'data/coco_kpts_samw.yaml', '--cfg', 'cfg/yolov7_samw.yaml', '--weights', 'weights/yolov7-w6-person.pt', '--batch-size', '128', '--img', '960', '--kpt-label', '--sync-bn', '--device', '0', '--name', 'yolov7-w6-pose', '--hyp', 'data/hyp.pose.yaml']' returned non-zero exit status 1.

I don't know how to solve this error. Please help me 😥

rohanpatankar926 commented 1 year ago

Hi @Lee-Siyoung

trainer = pl.Trainer(gpus=-1,
                     accelerator='ddp',
                     check_val_every_n_epoch=10,
                     # precision=16,
                     # auto_scale_batch_size='binsearch',
                     callbacks=[checkpoint_callback],
                     max_epochs=1)

I assume your trainer code looks like the snippet above. After calling trainer.fit(model), you get RuntimeError: No rendezvous handler for env:// because you are on Windows. accelerator='ddp' will not work on Windows; you have to choose 'dp' instead. I think that will work. Try it and let me know. Thank you :)
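For context on the original report, which calls torch.distributed.launch directly rather than going through PyTorch Lightning: as far as I know, PyTorch 1.7 on Windows supports only the Gloo backend with file-based initialization, and NCCL is not available on Windows at all. That would explain why env:// has no rendezvous handler there. The sketch below encodes that rule under those assumptions; choose_dist_settings, parse_major_minor, and the ddp_init file name are hypothetical and not part of YOLOv7 or PyTorch.

```python
import sys
import tempfile
from pathlib import Path


def parse_major_minor(version):
    """Turn a version string such as '1.7.1+cu110' into (1, 7)."""
    core = version.split("+")[0]
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def choose_dist_settings(torch_version, platform=sys.platform):
    """Hypothetical helper: pick arguments for dist.init_process_group.

    Assumptions (check the PyTorch docs for your version):
    - NCCL is not available on Windows, so Gloo is used there.
    - Before PyTorch 1.8, Windows only supports file:// initialization.
    """
    if not platform.startswith("win"):
        return {"backend": "nccl", "init_method": "env://"}
    if parse_major_minor(torch_version) >= (1, 8):
        return {"backend": "gloo", "init_method": "env://"}
    # File used for the rendezvous; the name is arbitrary.
    init_file = Path(tempfile.gettempdir()).resolve() / "ddp_init"
    return {"backend": "gloo", "init_method": init_file.as_uri()}


# The setup from the report: torch 1.7.1+cu110 on Windows.
settings = choose_dist_settings("1.7.1+cu110", platform="win32")
print(settings["backend"], settings["init_method"])
```

With file-based initialization, rank and world_size also have to be passed explicitly to dist.init_process_group, so swapping the settings into train.py is not a one-line change.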

Lee-Siyoung commented 1 year ago

Thank you for your answer. Can you tell me which file that code is in? I looked it up, but it wasn't there...😢

rohanpatankar926 commented 1 year ago

@Lee-Siyoung Can you share the Git link to your project's code so that I can understand the problem better?

Lee-Siyoung commented 1 year ago

@rohanpatankar926 I didn't create a separate Git repository because I only changed the YAML files in the yolov7-pose repository. When I tried running it on Colab, the error above went away. Do you know what to do if I want to use more than 17 keypoints? As far as I know, this repository is hard-coded for 17.

SubhiH commented 1 year ago

@rohanpatankar926 Thanks for your reply! In which file can we change the accelerator? I mean, in the YOLOv7 project, where can I change the accelerator so that training runs successfully on Windows?
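On the single-GPU case from the original report: one option that may avoid the distributed code path altogether is to run train.py directly instead of through torch.distributed.launch. The traceback shows that the failing call is dist.init_process_group in train.py, and the launcher passes --local_rank=0 to the script. The assumption here is that train.py only makes that call when a local rank is set, so verify this in your copy. The sketch below only rewrites the command line; strip_launcher is a hypothetical helper, and treating --sync-bn as a DDP-only flag is also an assumption.

```python
import shlex

# Launcher options that take a value and only matter for distributed runs.
LAUNCHER_OPTIONS_WITH_VALUE = {"--nproc_per_node", "--master_port"}
# Training flags assumed to be meaningful only in DDP mode.
DDP_ONLY_FLAGS = {"--sync-bn"}


def strip_launcher(command):
    """Return the command without the torch.distributed.launch wrapper.

    Only the space-separated option form (--nproc_per_node 1) is handled,
    not the --nproc_per_node=1 form.
    """
    tokens = shlex.split(command)
    kept = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        next_token = tokens[i + 1] if i + 1 < len(tokens) else None
        if token == "-m" and next_token == "torch.distributed.launch":
            i += 2  # drop "-m torch.distributed.launch"
        elif token in LAUNCHER_OPTIONS_WITH_VALUE:
            i += 2  # drop the option and its value
        elif token in DDP_ONLY_FLAGS:
            i += 1  # drop the flag
        else:
            kept.append(token)
            i += 1
    return shlex.join(kept)


launch_command = (
    "python -m torch.distributed.launch --nproc_per_node 1 --master_port 9527 "
    "train.py --batch-size 128 --img 960 --kpt-label --sync-bn --device 0"
)
print(strip_launcher(launch_command))
```

The remaining arguments (--data, --cfg, --weights, and so on) pass through unchanged, so the same helper applies to the full command from the report.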

WongKinYiu / yolov7

RuntimeError: No rendezvous handler for env:// #1345