facebookresearch / habitat-lab

A modular high-level library to train embodied AI agents across a variety of tasks and environments.
https://aihabitat.org/
MIT License
1.83k stars, 466 forks

EOF Error after force stop training #1005

Open HuskyKingdom opened 1 year ago

HuskyKingdom commented 1 year ago

I was running the provided baseline code with the following command: python3 -u habitat_baselines/run.py --exp-config habitat_baselines/config/pointnav/ppo_pointnav_example.yaml --run-type train. Training ran fine for a few hours until I stopped it with Ctrl+C in the terminal. When I tried to rerun it, I got an EOFError while the environments were being constructed, as shown below:


Process ForkServerProcess-1:
Traceback (most recent call last):
  File "/Users/topsofter/opt/miniconda3/envs/habitat/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/Users/topsofter/opt/miniconda3/envs/habitat/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/topsofter/opt/miniconda3/envs/habitat/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/Users/topsofter/Desktop/PhD/Embodied_AI/habitat-lab/habitat/core/vector_env.py", line 233, in _worker_env
    env = env_fn(*env_fn_args)
  File "/Users/topsofter/Desktop/PhD/Embodied_AI/habitat-lab/habitat/utils/env_utils.py", line 32, in make_env_fn
    env = env_class(config=config, dataset=dataset)
TypeError: 'NoneType' object is not callable
Traceback (most recent call last):
  File "habitat_baselines/run.py", line 81, in <module>
    main()
  File "habitat_baselines/run.py", line 40, in main
    run_exp(**vars(args))
  File "habitat_baselines/run.py", line 77, in run_exp
    execute_exp(config, run_type)
  File "habitat_baselines/run.py", line 60, in execute_exp
    trainer.train()
  File "/Users/topsofter/opt/miniconda3/envs/habitat/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/Users/topsofter/Desktop/PhD/Embodied_AI/habitat-lab/habitat_baselines/rl/ppo/ppo_trainer.py", line 729, in train
    self._init_train()
  File "/Users/topsofter/Desktop/PhD/Embodied_AI/habitat-lab/habitat_baselines/rl/ppo/ppo_trainer.py", line 259, in _init_train
    self._init_envs()
  File "/Users/topsofter/Desktop/PhD/Embodied_AI/habitat-lab/habitat_baselines/rl/ppo/ppo_trainer.py", line 206, in _init_envs
    workers_ignore_signals=is_slurm_batch_job(),
  File "/Users/topsofter/Desktop/PhD/Embodied_AI/habitat-lab/habitat/utils/env_utils.py", line 116, in construct_envs
    workers_ignore_signals=workers_ignore_signals,
  File "/Users/topsofter/Desktop/PhD/Embodied_AI/habitat-lab/habitat/core/vector_env.py", line 194, in __init__
    read_fn() for read_fn in self._connection_read_fns
  File "/Users/topsofter/Desktop/PhD/Embodied_AI/habitat-lab/habitat/core/vector_env.py", line 194, in <listcomp>
    read_fn() for read_fn in self._connection_read_fns
  File "/Users/topsofter/Desktop/PhD/Embodied_AI/habitat-lab/habitat/core/vector_env.py", line 97, in __call__
    res = self.read_fn()
  File "/Users/topsofter/Desktop/PhD/Embodied_AI/habitat-lab/habitat/utils/pickle5_multiprocessing.py", line 67, in recv
    buf = self.recv_bytes()
  File "/Users/topsofter/opt/miniconda3/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/Users/topsofter/opt/miniconda3/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/Users/topsofter/opt/miniconda3/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
Exception ignored in: <function VectorEnv.__del__ at 0x7fe898c9f320>
Traceback (most recent call last):
  File "/Users/topsofter/Desktop/PhD/Embodied_AI/habitat-lab/habitat/core/vector_env.py", line 592, in __del__
    self.close()
  File "/Users/topsofter/Desktop/PhD/Embodied_AI/habitat-lab/habitat/core/vector_env.py", line 460, in close
    read_fn()
  File "/Users/topsofter/Desktop/PhD/Embodied_AI/habitat-lab/habitat/core/vector_env.py", line 97, in __call__
    res = self.read_fn()
  File "/Users/topsofter/Desktop/PhD/Embodied_AI/habitat-lab/habitat/utils/pickle5_multiprocessing.py", line 67, in recv
    buf = self.recv_bytes()
  File "/Users/topsofter/opt/miniconda3/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/Users/topsofter/opt/miniconda3/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/Users/topsofter/opt/miniconda3/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError:


Could I please get some help on how to fix this? Thanks!
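For anyone reading the traceback above: the first block shows the worker process (ForkServerProcess-1) failing with TypeError because env_class was None, and the EOFError in the main process looks like a consequence of that worker going away before it sent anything back. A minimal sketch of that pattern follows. It is a single-process simulation using only the standard library, not habitat-lab code; worker_startup and wait_for_worker are hypothetical names made up for illustration, and the worker is run inline rather than in a subprocess to keep the example small.

```python
from multiprocessing import Pipe


def worker_startup(conn, env_class):
    """Hypothetical stand-in for a worker: build the env, report, then close."""
    try:
        env_class()  # raises TypeError when env_class is None
        conn.send("ready")
    except TypeError as exc:
        # A real worker process would print its own traceback here.
        print(f"worker failed: {exc}")
    finally:
        # Closing without having sent anything is what the reader sees as EOF.
        conn.close()


def wait_for_worker(env_class):
    """Return the worker's message, or 'EOFError' if it closed without sending."""
    parent_conn, worker_conn = Pipe()
    worker_startup(worker_conn, env_class)  # run inline, not in a subprocess
    try:
        return parent_conn.recv()
    except EOFError:
        return "EOFError"
    finally:
        parent_conn.close()


print(wait_for_worker(dict))  # a callable stands in for a valid env class
print(wait_for_worker(None))  # mirrors env_class being None in the traceback
```

If this reading is right, the EOFError is only the symptom, and the thing to track down is why env_class ends up as None in the worker.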

SepMJ commented 1 year ago

I have encountered the same error. Have you resolved it? Thank you.

YinpeiDai commented 1 year ago

I also encountered this when using multi-node-slurm.sh for DDPPO. How can this be solved?

jinxin-zhu commented 10 months ago

I have encountered the same error. How can this be solved? Thank you.

Moon-heart commented 7 months ago

I have encountered the same error. How can this be solved?

wu-jintao commented 4 months ago

I have encountered the same error. How can this be solved?

aclegg3 commented 4 months ago

Hey all, it would be helpful to know a bit more. Can you try re-running with the debug environment flag set (export HABITAT_ENV_DEBUG=1)? This should make errors that happen in the multiprocessing environment explicit.
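As a rough sketch of what that re-run could look like from Python, the snippet below sets HABITAT_ENV_DEBUG=1 for a single invocation of the training command from the original post. The helper names debug_invocation and run_with_debug are hypothetical (they are not part of habitat-lab), and the sketch assumes it is run from the root of a habitat-lab checkout where habitat_baselines/run.py exists.

```python
import os
import shlex
import subprocess

# Training command quoted in the original post.
TRAIN_COMMAND = (
    "python3 -u habitat_baselines/run.py "
    "--exp-config habitat_baselines/config/pointnav/ppo_pointnav_example.yaml "
    "--run-type train"
)


def debug_invocation(command=TRAIN_COMMAND):
    """Return (argv, env) for running `command` with HABITAT_ENV_DEBUG=1."""
    env = dict(os.environ)  # copy, so the current process is left unchanged
    env["HABITAT_ENV_DEBUG"] = "1"
    return shlex.split(command), env


def run_with_debug(command=TRAIN_COMMAND):
    """Launch the command with the debug flag set and return its exit code."""
    argv, env = debug_invocation(command)
    return subprocess.run(argv, env=env).returncode
```

Calling run_with_debug() is meant to be equivalent to running export HABITAT_ENV_DEBUG=1 followed by the training command in a shell. The output of that run is what would be useful to post here.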