facebookresearch / sound-spaces

A first-of-its-kind acoustic simulation platform for audio-visual embodied AI research. It supports training and evaluating multiple tasks and applications.
https://soundspaces.org
Creative Commons Attribution 4.0 International

ConnectionResetError: [Errno 104] Connection reset by peer #45

Open gtatiya opened 3 years ago

gtatiya commented 3 years ago

Hi @ChanganVR,

I am using habitat v0.1.7 and when I run python ss_baselines/av_nav/run.py --exp-config ss_baselines/av_nav/config/audionav/replica/train_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth, I get this error:

Traceback (most recent call last):
  File "ss_baselines/av_nav/run.py", line 101, in <module>
    main()
  File "ss_baselines/av_nav/run.py", line 95, in main
    trainer.train()
  File "/home/i21_gtatiya/projects/sound-spaces/ss_baselines/av_nav/ppo/ppo_trainer.py", line 316, in train
    episode_steps
  File "/home/i21_gtatiya/projects/sound-spaces/ss_baselines/av_nav/ppo/ppo_trainer.py", line 149, in _collect_rollout_step
    outputs = self.envs.step([a[0].item() for a in actions])
  File "/home/i21_gtatiya/projects/habitat-lab/habitat/core/vector_env.py", line 448, in step
    return self.wait_step()
  File "/home/i21_gtatiya/miniconda3/envs/avn/lib/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/home/i21_gtatiya/projects/habitat-lab/habitat/core/vector_env.py", line 436, in wait_step
    self.wait_step_at(index_env) for index_env in range(self.num_envs)
  File "/home/i21_gtatiya/projects/habitat-lab/habitat/core/vector_env.py", line 436, in <listcomp>
    self.wait_step_at(index_env) for index_env in range(self.num_envs)
  File "/home/i21_gtatiya/miniconda3/envs/avn/lib/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/home/i21_gtatiya/projects/habitat-lab/habitat/core/vector_env.py", line 409, in wait_step_at
    return self._connection_read_fns[index_env]()
  File "/home/i21_gtatiya/projects/habitat-lab/habitat/core/vector_env.py", line 97, in __call__
    res = self.read_fn()
  File "/home/i21_gtatiya/projects/habitat-lab/habitat/utils/pickle5_multiprocessing.py", line 68, in recv
    buf = self.recv_bytes()
  File "/home/i21_gtatiya/miniconda3/envs/avn/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/i21_gtatiya/miniconda3/envs/avn/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/i21_gtatiya/miniconda3/envs/avn/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

I have attached the complete log here: train_telephone_audiogoal_depth_log.txt

I was not getting this error when I was using habitat v0.1.6. Could you please help fix this error?

ChanganVR commented 3 years ago

Hi @gtatiya, I do not get this error when running the same command (training the agent for 2M frames). Did you update both habitat-sim and habitat-lab to the v0.1.7 version?

To help with debugging, try adding 'USE_SYNC_VECENV True' to the end of the training command. Let me know if this helps.

whcpumpkin commented 2 years ago

I also get this error. Are there too many processes? The error occurs after the program has been running for about one hour.

gyx-gloria commented 2 years ago

I also get this error. How can I fix it?

ChanganVR commented 1 year ago

@doudoudou1999 this error most likely occurs because an error is thrown in an environment process, which results in a broken connection. To catch that bug, add 'USE_SYNC_VECENV True' to the end of the training command.
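
To illustrate the point about the environment process, here is a minimal sketch using only the Python standard library, not habitat's actual vector environment code. `crashing_worker`, `step_in_subprocess`, and `step_in_process` are hypothetical names, and the sketch assumes a Unix-like OS where the "fork" start method is available. When the worker dies, the parent only sees a connection-level error on the pipe; the same failure run in the main process surfaces the original exception, which is presumably why running the environments synchronously is useful for debugging.

```python
import multiprocessing as mp


def crashing_worker(conn):
    # Hypothetical stand-in for an environment process: it fails before
    # reading the pending command or sending any reply back.
    raise RuntimeError("simulated failure inside the environment")


def step_in_subprocess():
    """Return the exception the parent sees when the worker process dies."""
    ctx = mp.get_context("fork")  # assumption: Unix-like OS
    parent_conn, child_conn = ctx.Pipe()
    parent_conn.send("step")  # command that the worker will never read
    proc = ctx.Process(target=crashing_worker, args=(child_conn,))
    proc.start()
    child_conn.close()  # the parent drops its copy of the worker's end
    proc.join()
    try:
        parent_conn.recv()
    except (EOFError, OSError) as exc:
        # Only the broken connection is visible here, not the RuntimeError.
        return exc
    finally:
        parent_conn.close()
    return None


def step_in_process():
    """Run the same failing step in the main process."""
    try:
        crashing_worker(None)
    except RuntimeError as exc:
        # The real cause is directly available to the caller.
        return exc
    return None


async_error = step_in_subprocess()
sync_error = step_in_process()
print("subprocess run raised:", type(async_error).__name__)
print("in-process run raised:", type(sync_error).__name__, "-", sync_error)
```

In the subprocess case the worker's own traceback is printed to stderr by the child, so it can also be worth searching the full training log for an earlier traceback above the ConnectionResetError.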

WHUfreeway commented 1 year ago

I also ran into this problem recently, using habitat-sim 0.2.2, habitat-lab 0.2.1, and SoundSpaces 0.1.1. How can I fix it?

ChanganVR commented 1 year ago

@WHUfreeway did you try setting 'USE_SYNC_VECENV True' on the command line? It will output more useful information for debugging.