Closed ming1993li closed 4 years ago
An example is provided in the readme: https://github.com/facebookresearch/moco
I encountered the same problem. torch:1.6.1-cu101
/home/ldz/anaconda3/bin/python /home/ldz/temp_project/moco/main_moco.py -a resnet50 --lr 0.03 --batch-size 256 --dist-url 'tcp://10.7.57.163:10001' --multiprocessing-distributed --world-size 1 --rank 0 mini-imagenet
Use GPU: 0 for training
Traceback (most recent call last):
File "/home/ldz/temp_project/moco/main_moco.py", line 402, in <module>
main()
File "/home/ldz/temp_project/moco/main_moco.py", line 130, in main
mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
File "/home/ldz/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/ldz/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/home/ldz/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
raise Exception(msg)
Exception:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/ldz/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
fn(i, *args)
File "/home/ldz/temp_project/moco/main_moco.py", line 156, in main_worker
world_size=args.world_size, rank=args.rank)
File "/home/ldz/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 420, in init_process_group
init_method, rank, world_size, timeout=timeout
File "/home/ldz/anaconda3/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 79, in rendezvous
raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for ://
Process finished with exit code 1
I encountered the same problem, too. Have you solved it yet?
I also hit the same error, 'RuntimeError: No rendezvous handler for ://'. How did you solve it?
How should I set the "dist-url"? Thank you!