yug125lk closed this issue 1 year ago.
Hi,
I am not sure the problem is `GPUS_PER_NODE`, since the version in this repo does not use MPI (see the original repo from OpenAI, which uses `mpi4py`). Rather, check whether `th.cuda.is_available()` returns `True`, since the failing line is in the branch of `setup_dist` that depends on it.
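For context, here is a minimal sketch of the branching involved. It assumes `setup_dist` follows the pattern of OpenAI's original `guided_diffusion/dist_util.py`: the backend is picked from `th.cuda.is_available()`, and only the GPU (`nccl`) branch resolves the machine's hostname, which is the call that fails in the traceback. The function name `pick_backend_and_host` is hypothetical, and the `cuda_available` argument stands in for `th.cuda.is_available()` so the sketch runs without PyTorch.

```python
import socket
from typing import Tuple


def pick_backend_and_host(cuda_available: bool) -> Tuple[str, str]:
    """Hypothetical helper mirroring the assumed setup_dist branching.

    cuda_available stands in for th.cuda.is_available().
    """
    # Assumption: GPU runs use the NCCL backend, CPU runs use gloo.
    backend = "nccl" if cuda_available else "gloo"
    if backend == "gloo":
        # CPU path: no DNS lookup, so it works even if the hostname cannot be resolved.
        return backend, "localhost"
    # GPU path: this is the lookup that raises socket.gaierror in the traceback.
    return backend, socket.gethostbyname(socket.getfqdn())


if __name__ == "__main__":
    print(pick_backend_and_host(cuda_available=False))  # ('gloo', 'localhost')
```

Under this assumption, the same script can run on the CPU but fail on the GPU, because only the GPU branch depends on name resolution. Note also that PyTorch's distributed package does not support the NCCL backend on Windows, so on a single-GPU Windows machine the `gloo` backend with `localhost` is the usual choice.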
It works. Thank you for your reply.
Hi, thank you again for sharing this code. I only have a single GPU, so I changed the number of GPUs per node to 1. It works on the CPU but not on the GPU, where I get the error below. Settings: `GPUS_PER_NODE = 1`, `SETUP_RETRY_COUNT = 3`
Environment: Python 3.8, torch 1.9.0+cu111, torchvision 0.10.0+cu111, Windows
```
  File "scripts/segmentation_train.py", line 89, in <module>
    main()
  File "scripts/segmentation_train.py", line 25, in main
    dist_util.setup_dist()
  File ".\guided_diffusion\dist_util.py", line 34, in setup_dist
    hostname = socket.gethostbyname(socket.getfqdn())
socket.gaierror: [Errno 11001] getaddrinfo failed
```
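To confirm that the failure comes from hostname resolution rather than from `GPUS_PER_NODE`, the failing call from `dist_util.py` line 34 can be reproduced on its own. This is a minimal sketch: `resolve_master_addr` is a hypothetical name, and falling back to `localhost` is an assumption that only makes sense for single-machine runs.

```python
import socket


def resolve_master_addr(fallback: str = "localhost") -> str:
    """Hypothetical helper: try the same lookup as setup_dist, else fall back."""
    fqdn = socket.getfqdn()
    try:
        # Same call as in the traceback: resolve the fully qualified hostname.
        return socket.gethostbyname(fqdn)
    except OSError as err:  # socket.gaierror is a subclass of OSError
        # Assumption: a single-machine run can use the fallback address instead.
        print(f"Could not resolve {fqdn!r}: {err}; using {fallback!r}")
        return fallback


if __name__ == "__main__":
    print("MASTER_ADDR would be:", resolve_master_addr())
```

If the lookup prints the "Could not resolve" message on the affected machine, the problem is the network name configuration rather than the GPU count.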