JuliaWolleb / Diffusion-based-Segmentation

This is the official PyTorch implementation of the paper "Diffusion Models for Implicit Image Segmentation Ensembles".
MIT License
271 stars, 35 forks

Training: socket.gaierror: [Errno 11001] getaddrinfo failed at hostname = socket.gethostbyname(socket.getfqdn()) #55

Closed: yug125lk closed this issue 8 months ago

yug125lk commented 8 months ago

Hi, thank you again for sharing this code. I only have a single GPU, so I set GPUS_PER_NODE to 1. Training runs on the CPU but not on the GPU, where I get the error below. My settings: GPUS_PER_NODE = 1, SETUP_RETRY_COUNT = 3.

Environment: Python 3.8, torch 1.9.0+cu111, torchvision 0.10.0+cu111, Windows

  File "scripts/segmentation_train.py", line 89, in <module>
    main()
  File "scripts/segmentation_train.py", line 25, in main
    dist_util.setup_dist()
  File ".\guided_diffusion\dist_util.py", line 34, in setup_dist
    hostname = socket.gethostbyname(socket.getfqdn())
socket.gaierror: [Errno 11001] getaddrinfo failed
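The failing line asks the operating system to resolve the machine's fully qualified domain name to an IPv4 address; on Windows, error 11001 means that lookup found no such host. For a single-machine run, one possible workaround is to fall back to the loopback address. The sketch below assumes only a single machine is involved; `resolve_master_addr` is a hypothetical helper, not a function in this repo or in guided_diffusion.

```python
import socket


def resolve_master_addr(fallback: str = "127.0.0.1") -> str:
    """Hypothetical helper (not part of this repo): return this machine's IPv4
    address for use as MASTER_ADDR, or `fallback` if name resolution fails."""
    try:
        # Same call as line 34 of dist_util.py in the traceback.
        return socket.gethostbyname(socket.getfqdn())
    except OSError:
        # socket.gaierror (e.g. [Errno 11001] getaddrinfo failed) and
        # socket.herror are both subclasses of OSError.
        return fallback


if __name__ == "__main__":
    print(resolve_master_addr())
```

Falling back to 127.0.0.1 is only safe when every process runs on the same machine; a multi-node setup needs a resolvable address.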

JuliaWolleb commented 8 months ago

Hi, I am not sure the problem is GPUS_PER_NODE: the version of the code in this repo does not use MPI (please check out the original repo from OpenAI, which uses mpi4py). Instead, check what th.cuda.is_available() returns, since the failing hostname lookup depends on it.
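To see why th.cuda.is_available() matters, consider the sketch below. It assumes that setup_dist() in this repo keeps the branch structure of setup_dist() in OpenAI's guided-diffusion, where the backend is "gloo" without CUDA and "nccl" with it, and only the NCCL branch performs the hostname lookup. Under that assumption, training would run on the CPU but fail on the GPU, as reported. The name `choose_backend_and_host` is hypothetical, and the CUDA flag is passed in as a parameter so the sketch runs without PyTorch installed.

```python
import socket
from typing import Tuple


def choose_backend_and_host(cuda_available: bool) -> Tuple[str, str]:
    """Hypothetical sketch of the backend/host selection assumed to be in setup_dist()."""
    # NCCL requires CUDA; gloo also works on CPU-only machines.
    backend = "nccl" if cuda_available else "gloo"
    if backend == "gloo":
        # CPU path: no DNS lookup, so the getaddrinfo error cannot occur here.
        hostname = "localhost"
    else:
        # GPU path: this is the call that raised socket.gaierror in the traceback.
        hostname = socket.gethostbyname(socket.getfqdn())
    return backend, hostname
```

In the real script the flag would come from `th.cuda.is_available()` (with `import torch as th`). Note also that PyTorch does not support the NCCL backend on Windows, so on a Windows machine only the gloo backend can initialize a process group.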

yug125lk commented 8 months ago

It works. Thank you for your reply.