RuntimeError: Address already in use

I am trying to run train_vada.py in Colab, but I get the error in the title.

$ python train_vada.py

The full error message is:

No Apex Available. Using PyTorch's native Adam. Install Apex for faster training.
Experiment dir : /tmp/nvae-diff/expr/exp
starting in debug mode
Traceback (most recent call last):
  File "train_vada.py", line 512, in <module>
    utils.init_processes(0, size, main, args)
  File "/content/util/utils.py", line 689, in init_processes
    dist.init_process_group(backend='nccl', init_method='env://', rank=rank, world_size=size)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 500, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/rendezvous.py", line 190, in _env_rendezvous_handler
    store = TCPStore(master_addr, master_port, world_size, start_daemon, timeout)
RuntimeError: Address already in use
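The last frame shows the failure happening when `TCPStore` is created from `master_addr` and `master_port`, which suggests another process is already bound to that port. A minimal stdlib-only check, assuming you know which port the script uses for `MASTER_PORT` (the helper name `port_in_use` is hypothetical, not part of LSGM or PyTorch):

```python
import socket


def port_in_use(port: int, host: str = "") -> bool:
    """Return True if host:port cannot be bound, i.e. another socket already holds it.

    host="" binds on all interfaces, roughly what a server-side store would do.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # EADDRINUSE (or a similar bind error) means the port is taken.
            return True
    return False
```

Calling this with the script's master port in a Colab cell shows whether something, for example a still-running process from an earlier run, is holding it.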

I have checked some common issues and found that this error often comes from the torch.distributed settings. How can I fix it? Thanks!
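Because `init_method='env://'` makes PyTorch read `MASTER_ADDR` and `MASTER_PORT` from the environment, one possible workaround is to point `MASTER_PORT` at a port that is actually free before `dist.init_process_group` runs. This is a sketch under assumptions: the helpers `find_free_port` and `set_master_env` are hypothetical, and if `utils.init_processes` assigns `MASTER_PORT` itself (not verified here), the port would have to be changed at that spot instead, since it would overwrite these values.

```python
import os
import socket
from typing import Optional


def find_free_port() -> int:
    # Binding to port 0 asks the OS for an unused ephemeral port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]


def set_master_env(addr: str = "127.0.0.1", port: Optional[int] = None) -> int:
    # The env:// rendezvous reads these two variables when init_process_group is called.
    if port is None:
        port = find_free_port()
    os.environ["MASTER_ADDR"] = addr
    os.environ["MASTER_PORT"] = str(port)
    return port
```

The port is released again before PyTorch binds it, so another process could in principle grab it in between; on a single Colab VM that is unlikely. If the port is held by a leftover process from a previous run, restarting the Colab runtime also frees it.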

NVlabs / LSGM, issue #2