eyalbetzalel opened this issue 4 years ago.
Perhaps there is a version mismatch between PyTorch, CUDA and NCCL? What versions are you using?
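A minimal sketch for gathering the three versions in question. The helper name collect_versions is hypothetical. It relies only on torch.__version__, torch.version.cuda and torch.cuda.nccl.version(), and it skips the NCCL query when CUDA is unavailable. PyTorch also ships a fuller report via "python -m torch.utils.collect_env".

```python
def collect_versions() -> dict:
    """Gather the PyTorch, CUDA and NCCL versions (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed at all, so there is nothing to report.
        return {"torch": None, "cuda": None, "nccl": None}

    nccl = None
    if torch.cuda.is_available():
        # Only query NCCL on a working CUDA setup; CPU-only builds lack it.
        nccl = torch.cuda.nccl.version()
    return {
        "torch": str(torch.__version__),
        "cuda": torch.version.cuda,  # None on CPU-only builds
        "nccl": nccl,
    }


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```

Pasting this output into the issue makes it easier to tell whether the PyTorch build matches the installed CUDA driver and NCCL.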
Are you running under WSL? WSL does not yet support NCCL (see https://github.com/NVIDIA/nccl/issues/442). If you are on WSL, you can try changing the backend in train.py:280, "dist.init_process_group(backend='nccl', init_method='env://', rank=rank, world_size=size)", from "nccl" to "gloo".
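Here is a minimal sketch of that change. It assumes you would rather select the backend automatically than hard-code "gloo". The helper names running_under_wsl, choose_backend and init_distributed are hypothetical and are not part of NVAE. The WSL check is only a heuristic: it looks for "microsoft" in the kernel release string. Only dist.is_nccl_available() and dist.init_process_group(...) come from PyTorch.

```python
import platform


def running_under_wsl() -> bool:
    """Heuristic WSL check: WSL kernels report 'microsoft' in their release string."""
    return "microsoft" in platform.uname().release.lower()


def choose_backend(nccl_available: bool, under_wsl: bool) -> str:
    """Prefer 'nccl' when it is usable; otherwise fall back to 'gloo'."""
    if nccl_available and not under_wsl:
        return "nccl"
    return "gloo"


def init_distributed(rank: int, size: int) -> None:
    """Hypothetical drop-in for the init_process_group call in train.py."""
    # Imported lazily so the pure helpers above work without PyTorch installed.
    import torch.distributed as dist

    backend = choose_backend(dist.is_nccl_available(), running_under_wsl())
    # init_method='env://' reads MASTER_ADDR and MASTER_PORT from the
    # environment, exactly as in the original call.
    dist.init_process_group(backend=backend, init_method='env://',
                            rank=rank, world_size=size)
```

The sketch keeps "nccl" as the first choice because "gloo" is generally slower for GPU-to-GPU communication. The fallback gives up some speed in exchange for being able to run at all.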
Hi,
I am trying to run NVAE on my machine using your command line for CIFAR10 (changing only the .. from 8 to 4, because I have 4 GPUs):
and I get this error:
Am I doing something wrong?
Thanks, Eyal