facebookresearch / habitat-lab

A modular high-level library to train embodied AI agents across a variety of tasks and environments.
https://aihabitat.org/
MIT License
1.84k stars 467 forks

BrokenPipeError: [Errno 32] Broken pipe while evaluating the trained Social Nav agent. #1827

Open vaibhavoutat opened 4 months ago

vaibhavoutat commented 4 months ago

Habitat-Lab and Habitat-Sim versions

Habitat-Lab: v0.3.0
Habitat-Sim: v0.3.0
(These are the versions that pip list reports in my habitat environment.)

Habitat is under active development, and we advise users to restrict themselves to stable releases of Habitat-Lab and Habitat-Sim. The bug you are about to report may already be fixed in the latest version.

Master branch contains 'bleeding edge' code, but we do appreciate bug reports for it!

šŸ› Bug

When I run the evaluation script for the Social Nav task in Habitat 3.0, I get a BrokenPipeError.

Steps to Reproduce

Steps to reproduce the behavior:

1. Trained the agent using the train script:
python habitat_baselines/habitat_baselines/run.py --config-name=social_nav/social_nav.yaml
I stopped the training partway through, so I have ckpt.45.pth, latest.pth, and a .habitat-resume-state.pth.

2. After that, I ran the eval script:
python habitat_baselines/habitat_baselines/run.py --config-name=social_nav/social_nav.yaml habitat_baselines.evaluate=True habitat_baselines.eval_ckpt_path_dir=/checkpoints/latest.pth habitat_baselines.eval.should_load_ckpt=True

I get the following error: log.txt

Expected behavior

I expected the agent to evaluate on the val scenes and give me a video of the same.

Additional context

Kindly let me know how to proceed. Also, if in the future I want to train on only a few episodes in each scene (so that I can check that this does not happen again), which parameter should I tune in the config file?

vaibhavoutat commented 4 months ago

As there was an error in the eval video part of the code, I changed the config in the social_nav.yaml file, setting habitat_baselines.eval.video_option from ['disk'] to [] (an empty list). After that I get the following error:

0%| | 0/14400 [00:00<?, ?it/s]
Error executing job with overrides: ['habitat_baselines.evaluate=True', 'habitat_baselines.eval_ckpt_path_dir=/home/deepbot/habitat/habitatweek/habitat-lab/data/checkpoints/ckpt.50.pth', 'habitat_baselines.eval.should_load_ckpt=True']
Traceback (most recent call last):
  File "/home/deepbot/habitat/habitatweek/habitat-lab/habitat-baselines/habitat_baselines/run.py", line 31, in main
    execute_exp(cfg, "eval" if cfg.habitat_baselines.evaluate else "train")
  File "/home/deepbot/habitat/habitatweek/habitat-lab/habitat-baselines/habitat_baselines/run.py", line 62, in execute_exp
    trainer.eval()
  File "/home/deepbot/habitat/habitatweek/habitat-lab/habitat-baselines/habitat_baselines/common/base_trainer.py", line 129, in eval
    self._eval_checkpoint(
  File "/home/deepbot/habitat/habitatweek/habitat-lab/habitat-baselines/habitat_baselines/rl/ppo/ppo_trainer.py", line 889, in _eval_checkpoint
    evaluator.evaluate_agent(
  File "/home/deepbot/habitat/habitatweek/habitat-lab/habitat-baselines/habitat_baselines/rl/ppo/habitat_evaluator.py", line 139, in evaluate_agent
    action_data = agent.actor_critic.act(
  File "/home/deepbot/habitat/habitatweek/habitat-lab/habitat-baselines/habitat_baselines/rl/multi_agent/pop_play_wrappers.py", line 192, in act
    policy.act(
  File "/home/deepbot/habitat/habitatweek/habitat-lab/habitat-baselines/habitat_baselines/rl/ppo/policy.py", line 332, in act
    features, rnn_hidden_states, _ = self.net(
  File "/home/deepbot/miniforge3/envs/habitatweek/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/deepbot/miniforge3/envs/habitatweek/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/deepbot/habitat/habitatweek/habitat-lab/habitat-baselines/habitat_baselines/rl/ddppo/policy/resnet_policy.py", line 761, in forward
    out = torch.cat(x, dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 12 but got size 18 for tensor number 2 in the list.
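
For readers unfamiliar with this RuntimeError: concatenating along dimension 1 requires every tensor in the list to agree on all other dimensions. The sketch below is a plain-Python stand-in for that rule, not PyTorch's implementation. The helper name first_cat_mismatch and the example shapes are hypothetical; only the sizes 12 and 18 come from the message above, and reading the mismatched dimension as dimension 0 (often the batch or number-of-environments axis) is an assumption.

```python
from typing import Optional, Sequence, Tuple


def first_cat_mismatch(
    shapes: Sequence[Tuple[int, ...]], dim: int
) -> Optional[Tuple[int, int, int, int]]:
    """Return (index, axis, expected, got) for the first shape that cannot be
    concatenated with shapes[0] along `dim`, or None if all are compatible."""
    reference = shapes[0]
    for index, shape in enumerate(shapes[1:], start=1):
        if len(shape) != len(reference):
            raise ValueError("all shapes must have the same number of dimensions")
        for axis, (expected, got) in enumerate(zip(reference, shape)):
            # Only the concatenation axis itself is allowed to differ.
            if axis != dim and expected != got:
                return index, axis, expected, got
    return None


# Hypothetical feature shapes: two inputs with 12 rows and one with 18.
shapes = [(12, 512), (12, 32), (18, 32)]
print(first_cat_mismatch(shapes, dim=1))  # (2, 0, 12, 18)
```

Printing the shape of each entry of x just before the torch.cat call in resnet_policy.py would show which input actually carries the unexpected size.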

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Exception ignored in: <function VectorEnv.__del__ at 0x7f28e990ba60>
Traceback (most recent call last):
  File "/home/deepbot/habitat/habitatweek/habitat-lab/habitat-lab/habitat/core/vector_env.py", line 615, in __del__
    self.close()
  File "/home/deepbot/habitat/habitatweek/habitat-lab/habitat-lab/habitat/core/vector_env.py", line 470, in close
    write_fn((CLOSE_COMMAND, None))
  File "/home/deepbot/habitat/habitatweek/habitat-lab/habitat-lab/habitat/core/vector_env.py", line 131, in __call__
    self.write_fn(data)
  File "/home/deepbot/habitat/habitatweek/habitat-lab/habitat-lab/habitat/utils/pickle5_multiprocessing.py", line 63, in send
    self.send_bytes(buf.getvalue())
  File "/home/deepbot/miniforge3/envs/habitatweek/lib/python3.9/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/deepbot/miniforge3/envs/habitatweek/lib/python3.9/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/home/deepbot/miniforge3/envs/habitatweek/lib/python3.9/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
0%|
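
The BrokenPipeError is reported as "Exception ignored in" VectorEnv.__del__, after the RuntimeError. One plausible reading, which is an assumption and not confirmed in this thread, is that it is a secondary symptom: the other end of the worker pipe was already gone when close() tried to send the close command. The sketch below reproduces that kind of failure with the standard library only, on a POSIX system. It does not use Habitat's VectorEnv, and the function name is hypothetical.

```python
import errno
from multiprocessing import Pipe
from typing import Optional


def send_after_reader_closed() -> Optional[int]:
    """Write to a pipe whose read end is already closed and return the errno
    of the resulting BrokenPipeError (None if the send did not fail)."""
    reader, writer = Pipe(duplex=False)  # one-way pipe: reader receives, writer sends
    reader.close()  # stands in for a worker that has already gone away
    try:
        writer.send_bytes(b"close")  # same send path as in the traceback above
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        writer.close()
    return None


if __name__ == "__main__":
    # On POSIX systems this is expected to report EPIPE, i.e. "Broken pipe".
    print(send_after_reader_closed() == errno.EPIPE)
```

If that reading holds, the error to fix first is the RuntimeError from torch.cat, and the broken pipe should go away once evaluation no longer crashes.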