facebookresearch / OccupancyAnticipation

This repository contains code for our publication "Occupancy Anticipation for Efficient Exploration and Navigation" in ECCV 2020.

RuntimeError when running run.py for training #34

Closed. AgentEXPL closed this issue 3 years ago.

AgentEXPL commented 3 years ago

When I tried to train a model for ans_depth via the command "python -u run.py --exp-config configs/model_configs/ans_depth/ppo_exploration.yaml --run-type train", the following error occurred: "RuntimeError: inverse_cuda: For batch 0: U(360710144,360710144) is zero, singular U." I have no idea how this error is generated. Is it caused by an incorrectly installed environment, or by something else? I followed the installation guide and checked that habitat_sim and habitat work correctly. I hope someone can help me figure out the cause or suggest a solution.

2021-08-16 11:24:35,760 local_agent number of parameters: 489697
2021-08-16 11:24:35,761 global_agent number of parameters: 60739
=================== Mapper rollouts ======================
key: action_at_t_1 , memory: 0.0000 GB
key: depth_at_t , memory: 0.0655 GB
key: depth_at_t_1 , memory: 0.0655 GB
key: ego_map_gt_anticipated_at_t , memory: 0.0338 GB
key: ego_map_gt_at_t , memory: 0.0338 GB
key: ego_map_gt_at_t_1 , memory: 0.0338 GB
key: pose_at_t , memory: 0.0000 GB
key: pose_at_t_1 , memory: 0.0000 GB
key: pose_gt_at_t , memory: 0.0000 GB
key: pose_gt_at_t_1 , memory: 0.0000 GB
key: pose_hat_at_t_1 , memory: 0.0000 GB
key: rgb_at_t , memory: 0.1966 GB
key: rgb_at_t_1 , memory: 0.1966 GB
Total memory: 0.6258 GB
================== Local policy rollouts =====================
key: goal_at_t , memory: 0.0000 GB
key: rgb_at_t , memory: 0.0613 GB
key: t , memory: 0.0000 GB
Total memory: 0.0613 GB
================== Global policy rollouts ====================
key: map_at_t , memory: 3.7236 GB
key: pose_in_map_at_t , memory: 0.0000 GB
Total memory: 3.7236 GB
Traceback (most recent call last):
  File "run.py", line 70, in <module>
    main()
  File "run.py", line 39, in main
    run_exp(**vars(args))
  File "run.py", line 64, in run_exp
    trainer.train()
  File "OccupancyAnticipation/occant_baselines/rl/occant_exp_trainer.py", line 1079, in train
    batch["pose_gt"],
  File "OccupancyAnticipation/occant_baselines/rl/policy.py", line 481, in ext_register_map
    return self._register_map(m, p, x)
  File "OccupancyAnticipation/occant_baselines/rl/policy.py", line 423, in _register_map
    p_reg = self._spatial_transform(p_pad, x)
  File "OccupancyAnticipation/occant_baselines/rl/policy.py", line 394, in _spatial_transform
    p_trans = spatial_transform_map(p, dx_map, invert=invert)
  File "OccupancyAnticipation/occant_utils/common.py", line 60, in spatial_transform_map
    Ainv = torch.inverse(A)
RuntimeError: inverse_cuda: For batch 0: U(360710144,360710144) is zero, singular U.
Exception ignored in: <function VectorEnv.__del__ at 0x7f0836dde680>
Traceback (most recent call last):
  File "OccupancyAnticipation/environments/habitat/habitat-api/habitat/core/vector_env.py", line 534, in __del__
    self.close()
  File "OccupancyAnticipation/environments/habitat/habitat-api/habitat/core/vector_env.py", line 416, in close
    write_fn((CLOSE_COMMAND, None))
  File "/home/agent/anaconda3/envs/ocan/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/agent/anaconda3/envs/ocan/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/agent/anaconda3/envs/ocan/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
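The traceback bottoms out in spatial_transform_map in occant_utils/common.py, where torch.inverse is applied to a batch of affine transform matrices on the GPU. A standalone check like the sketch below might help narrow this down (the shapes are illustrative and not taken from the repository): if even well-conditioned matrices trigger the same "singular U" error, the PyTorch/CUDA installation is suspect rather than the map tensors produced by the code.

```python
import torch

# Sanity check (not from the repository): invert a batch of well-conditioned
# 3x3 affine matrices on the GPU, similar in spirit to spatial_transform_map.
A = torch.eye(3).repeat(8, 1, 1).cuda()           # batch of 8 identity matrices
A[:, 0, 2] = torch.linspace(-1.0, 1.0, 8).cuda()  # add small translations; still invertible

Ainv = torch.inverse(A)
ok = torch.allclose(torch.bmm(A, Ainv),
                    torch.eye(3, device="cuda").expand_as(A), atol=1e-5)
print("batched GPU inverse ok:", ok)
```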

AgentEXPL commented 3 years ago

Sorry for asking; I've found the reason myself. It was caused by mismatched CUDA and PyTorch versions.
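For anyone who hits the same error: a quick way to spot this kind of mismatch is to compare the CUDA version that PyTorch was built against with the driver reported by nvidia-smi, and to run a trivial batched inverse on the GPU. The following is only a rough sanity-check sketch, not part of the repository:

```python
import torch

# Environment report: compare the CUDA version PyTorch was compiled against
# with the driver/toolkit shown by `nvidia-smi` on the machine.
print("PyTorch version :", torch.__version__)
print("Built with CUDA :", torch.version.cuda)
print("cuDNN version   :", torch.backends.cudnn.version())
print("CUDA available  :", torch.cuda.is_available())

if torch.cuda.is_available():
    print("GPU             :", torch.cuda.get_device_name(0))
    # A trivial batched inverse; with a mismatched build this is the kind of op
    # that can fail with spurious "singular U" errors.
    x = torch.randn(4, 3, 3, device="cuda")
    print("inverse shape   :", tuple(torch.inverse(x).shape))
```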