NVIDIA / semantic-segmentation

Nvidia Semantic Segmentation monorepo
BSD 3-Clause "New" or "Revised" License

RuntimeError: Address Already in Use #153

Open gunjitsingh1985 opened 2 years ago

gunjitsingh1985 commented 2 years ago

Hi all, I've been having trouble getting a basic pre-trained model to run inference. I have taken the following steps, but to no avail:

  1. I tried to use the ready-made script 'dump_folder.yml', but ran into a "no such file or directory" error while runx was copying my code. It seems that runx copies the code into a subdirectory "logdir//code", whereas the "train.py" file is expected at "logdir/".
Screen Shot 2021-07-19 at 3 43 12 PM
  2. To work around this, I stopped using runx altogether and ran with the default parameters specified in 'dump_folder.yml' directly from the command line, passing the HPARAMS values in as arguments.

  3. That produced the following "RuntimeError: Address already in use" error, which persisted even when I reduced the "nproc_per_node" parameter from 8 to 1.

Screenshots below:

a)

Screen Shot 2021-07-19 at 3 33 51 PM

b)

Screen Shot 2021-07-19 at 3 33 39 PM

c)

Screen Shot 2021-07-19 at 3 33 24 PM

  4. I wonder if there's an OS incompatibility issue. I have an iMac running macOS and was planning to run this locally, with Python 3 installed. Are any of those known sources of incompatibility? I would rather not go through the hassle of dual booting.

ajtao commented 2 years ago

Hello, first off, if you use runx, please make sure that LOGROOT is defined within your .runx file with an absolute path and that path exists :).
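The LOGROOT requirement above can be checked before launching anything. The following is only a minimal sketch: it assumes the `.runx` file is a YAML-style file with a top-level `LOGROOT: <path>` line, and the helper names `read_logroot` and `check_logroot` are hypothetical, not part of runx.

```python
import os


def read_logroot(runx_path):
    """Return the top-level LOGROOT value from a .runx file, or None if absent.

    Assumption: the file contains a line of the form `LOGROOT: /some/path`.
    This is a simple line scan, not a full YAML parser.
    """
    with open(runx_path) as f:
        for line in f:
            key, sep, value = line.partition(":")
            if sep and key == "LOGROOT":
                return value.strip().strip("'\"")
    return None


def check_logroot(logroot):
    """Return a list of problems with the LOGROOT value (empty if it looks fine)."""
    if not logroot:
        return ["LOGROOT is not defined"]
    problems = []
    if not os.path.isabs(logroot):
        problems.append("LOGROOT is not an absolute path: " + logroot)
    if not os.path.isdir(logroot):
        problems.append("LOGROOT directory does not exist: " + logroot)
    return problems
```

For example, `check_logroot(read_logroot(".runx"))` returns an empty list when the path is absolute and the directory exists, and a list of human-readable problems otherwise.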

I'm not aware of anyone running this code on a Mac. The TCPStore "address already in use" error is something you tend to get if you run once, press Ctrl-C, and then run again without cleaning up the old run.
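One way to confirm that a leftover run is still holding the rendezvous port is to try binding to it. This is a rough sketch using only the standard library; it assumes the job was started with `torch.distributed.launch`, whose default master port is 29500, and the helper names `port_in_use` and `find_free_port` are made up for illustration.

```python
import socket

# Default rendezvous port used by torch.distributed.launch when
# --master_port is not given.
DEFAULT_MASTER_PORT = 29500


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding to (host, port) fails, i.e. the port is taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return True
    return False


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]
```

If `port_in_use(DEFAULT_MASTER_PORT)` returns True, either stop the stale Python process from the earlier run, or pass a different port to the launcher, for example via the `--master_port` option of `torch.distributed.launch` with a value obtained from `find_free_port()`.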

I also have no idea whether this code can run on a Mac, so you're on your own there.