facebookresearch / habitat-lab

A modular high-level library to train embodied AI agents across a variety of tasks and environments.
https://aihabitat.org/
MIT License

Objectnav baseline error #923

Closed: pioneer-innovation closed this issue 2 years ago

pioneer-innovation commented 2 years ago

Hi, I used the command below to train the ObjectNav baseline on MP3D.

python -u habitat_baselines/run.py --exp-config habitat_baselines/config/objectnav/ddppo_objectnav.yaml --run-type train

It returns the following error:

Traceback (most recent call last):
  File "habitat_baselines/run.py", line 81, in <module>
    main()
  File "habitat_baselines/run.py", line 40, in main
    run_exp(**vars(args))
  File "habitat_baselines/run.py", line 77, in run_exp
    execute_exp(config, run_type)
  File "habitat_baselines/run.py", line 60, in execute_exp
    trainer.train()
  File "/data/zqf/anaconda3/envs/habitat/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/data/zqf/experiment/habitat-lab/habitat_baselines/rl/ppo/ppo_trainer.py", line 817, in train
    self._compute_actions_and_step_envs(buffer_index)
  File "/data/zqf/experiment/habitat-lab/habitat_baselines/rl/ppo/ppo_trainer.py", line 458, in _compute_actions_and_step_envs
    step_batch["masks"],
  File "/data/zqf/experiment/habitat-lab/habitat_baselines/rl/ppo/policy.py", line 114, in act
    observations, rnn_hidden_states, prev_actions, masks
  File "/data/zqf/anaconda3/envs/habitat/lib/python3.7/site-packages/torch-1.12.0-py3.7-linux-x86_64.egg/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/zqf/experiment/habitat-lab/habitat_baselines/rl/ddppo/policy/resnet_policy.py", line 528, in forward
    out, rnn_hidden_states, masks
  File "/data/zqf/anaconda3/envs/habitat/lib/python3.7/site-packages/torch-1.12.0-py3.7-linux-x86_64.egg/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/zqf/experiment/habitat-lab/habitat_baselines/rl/models/rnn_state_encoder.py", line 342, in forward
    x, hidden_states = self.single_forward(x, hidden_states, masks)
  File "/data/zqf/experiment/habitat-lab/habitat_baselines/rl/models/rnn_state_encoder.py", line 293, in single_forward
    x.unsqueeze(0), self.unpack_hidden(hidden_states)
  File "/data/zqf/anaconda3/envs/habitat/lib/python3.7/site-packages/torch-1.12.0-py3.7-linux-x86_64.egg/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/zqf/anaconda3/envs/habitat/lib/python3.7/site-packages/torch-1.12.0-py3.7-linux-x86_64.egg/torch/nn/modules/rnn.py", line 770, in forward
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: rnn: hx is not contiguous
Exception ignored in: <function VectorEnv.__del__ at 0x7fee1d89bb00>
Traceback (most recent call last):
  File "/data/zqf/experiment/habitat-lab/habitat/core/vector_env.py", line 592, in __del__
    self.close()
  File "/data/zqf/experiment/habitat-lab/habitat/core/vector_env.py", line 463, in close
    write_fn((CLOSE_COMMAND, None))
  File "/data/zqf/experiment/habitat-lab/habitat/core/vector_env.py", line 118, in __call__
    self.write_fn(data)
  File "/data/zqf/experiment/habitat-lab/habitat/utils/pickle5_multiprocessing.py", line 62, in send
    self.send_bytes(buf.getvalue())
  File "/data/zqf/anaconda3/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/data/zqf/anaconda3/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/data/zqf/anaconda3/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

I have changed the distributed backend from NCCL to GLOO in the config file, but it still returns the same error message.

dhruvbatra commented 2 years ago

Have you tried following the instructions here: https://github.com/facebookresearch/habitat-challenge

srama2512 commented 2 years ago

@pioneer-innovation I was able to fix a similar error by including PR https://github.com/facebookresearch/habitat-lab/pull/901 in my code. I was using habitat-lab v0.2.2 when I encountered this error.
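For context, the `RuntimeError: rnn: hx is not contiguous` at the bottom of the traceback means the hidden state handed to the PyTorch RNN is not laid out contiguously in memory. The traceback shows `single_forward` passing `self.unpack_hidden(hidden_states)` directly into the RNN. The sketch below is a minimal illustration of a common remedy, calling `.contiguous()` on the hidden state before the RNN call. It is not necessarily what PR #901 changes. The batch-first storage layout, the body of the `unpack_hidden` helper, and all sizes are assumptions for illustration only. The contiguity check appears to sit in PyTorch's cuDNN code path, so the error may only surface when running on a GPU.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
# num_layers must be > 1 for the permuted tensor to be non-contiguous.
num_envs, num_layers, hidden_size, input_size = 4, 2, 8, 16

gru = nn.GRU(input_size, hidden_size, num_layers=num_layers)

# Assumption: hidden states are stored batch-first as
# (num_envs, num_layers, hidden_size), so they must be permuted into
# PyTorch's expected (num_layers, batch, hidden_size) layout.
stored_hidden = torch.zeros(num_envs, num_layers, hidden_size)


def unpack_hidden(hidden_states: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: permute to (num_layers, batch, hidden) and
    return a contiguous copy."""
    # permute() only rewrites strides, so its result is non-contiguous;
    # .contiguous() copies the data into a dense, row-major layout.
    return hidden_states.permute(1, 0, 2).contiguous()


# One time step for every environment: input shape (seq_len=1, batch, input).
x = torch.randn(num_envs, input_size)
hx = unpack_hidden(stored_hidden)
out, new_hidden = gru(x.unsqueeze(0), hx)
```

Dropping `.contiguous()` from the helper leaves `hx` with permuted strides; on a CUDA device with cuDNN, that is the kind of tensor that can trigger the error reported above.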

pioneer-innovation commented 2 years ago

@srama2512 Thank you very much! It works now! :)