facebookresearch / moco

PyTorch implementation of MoCo: https://arxiv.org/abs/1911.05722
MIT License

RuntimeError: No rendezvous handler for :// #58

Closed: ming1993li closed this issue 4 years ago

ming1993li commented 4 years ago

How should I set the "dist-url"? Thank you!

ppwwyyxx commented 4 years ago

An example is provided in the readme: https://github.com/facebookresearch/moco

9p15p commented 4 years ago

I encountered the same problem. torch:1.6.1-cu101

/home/ldz/anaconda3/bin/python /home/ldz/temp_project/moco/main_moco.py -a resnet50 --lr 0.03 --batch-size 256 --dist-url 'tcp://10.7.57.163:10001' --multiprocessing-distributed --world-size 1 --rank 0 mini-imagenet
Use GPU: 0 for training
Traceback (most recent call last):
  File "/home/ldz/temp_project/moco/main_moco.py", line 402, in <module>
    main()
  File "/home/ldz/temp_project/moco/main_moco.py", line 130, in main
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
  File "/home/ldz/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/ldz/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/home/ldz/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/ldz/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/ldz/temp_project/moco/main_moco.py", line 156, in main_worker
    world_size=args.world_size, rank=args.rank)
  File "/home/ldz/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 420, in init_process_group
    init_method, rank, world_size, timeout=timeout
  File "/home/ldz/anaconda3/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 79, in rendezvous
    raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for ://

Process finished with exit code 1
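
The empty scheme in the message ("No rendezvous handler for ://") suggests the value that reached init_process_group did not parse as a tcp:// URL. One possible cause, not confirmed anywhere in this thread, is that the quotes around the --dist-url value were passed through literally (for example from an IDE run configuration rather than a shell). The sketch below only illustrates how Python's urllib.parse.urlparse treats such a string, assuming rendezvous.py parses the URL that way; rendezvous_scheme and strip_literal_quotes are hypothetical helper names and are not part of MoCo or PyTorch.

```python
from urllib.parse import urlparse


def rendezvous_scheme(dist_url: str) -> str:
    """Return the scheme that urlparse extracts from a dist-url string."""
    return urlparse(dist_url).scheme


def strip_literal_quotes(dist_url: str) -> str:
    """Hypothetical helper: drop surrounding whitespace and quote characters."""
    return dist_url.strip().strip("'\"")


if __name__ == "__main__":
    plain = "tcp://10.7.57.163:10001"
    # Assumption: the quotes arrived as part of the argument itself.
    quoted = "'tcp://10.7.57.163:10001'"
    for url in (plain, quoted, strip_literal_quotes(quoted)):
        print(repr(url), "->", repr(rendezvous_scheme(url)))
```

If printing args.dist_url inside main_worker shows the value wrapped in quotes, removing the quotes from the run configuration (or stripping them as above) may be enough; if the printed value already looks clean, the cause is something else.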

ZRJMoon commented 3 years ago

I encountered the same problem, too. Have you already solved it?

jiaduob commented 1 year ago

I encountered the same problem, ‘RuntimeError: No rendezvous handler for ://’, too. How did you solve it?