facebookresearch / habitat-lab

A modular high-level library to train embodied AI agents across a variety of tasks and environments.
https://aihabitat.org/
MIT License

Math::Quaternion::fromMatrix(): the matrix is not orthogonal #697

Closed sparisi closed 2 years ago

sparisi commented 3 years ago

Habitat-Lab and Habitat-Sim versions

Habitat-Lab: master Habitat-Sim: master

❓ Questions and Help

I am running the rearrange task with a custom interface in order to use SAC from another repo. I am using the same yaml configuration, just with smaller RGB inputs (64x64). Everything works fine for a while, and then I get the following error:

W0729 05:21:54.534488 3717050 PhysicsManager.cpp:251] ::addObject : newObjectHandle : 003_cracker_box_:0003
Math::Quaternion::fromMatrix(): the matrix is not orthogonal:
Matrix(nan, nan, nan,
       nan, nan, nan,
       nan, nan, nan)

This always happens after 3188 steps, and always with newObjectHandle : 003_cracker_box_:0003. I am not setting any seed in my script (the policy is randomly initialized, and the actions differ across runs), but I am not sure how Habitat seeding is set. Also, the robot was able to pick 003_cracker_box_:0003 many times in earlier iterations, and I am sure I am passing correct actions (7-dim actions in [0,1] and last action in [-1,1]).

I am not sure if there is some code error, or if the robot is trying to do something impossible and thus it crashes. If the latter is the case, is there a way to prevent the whole program from crashing? For example, by simply ending the episode, resetting the robot, and giving a negative reward to the agent.
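
One way to approach the last question, offered as a rough sketch rather than anything Habitat provides, is to guard the environment on the Python side: reject non-finite actions before they reach the simulator and end the episode with a penalty instead. The names `GuardedEnv`, `all_finite`, and `nan_penalty`, as well as the gym-style `(obs, reward, done, info)` step signature, are assumptions made for illustration. Note that if the process aborts inside native code, a Python `try/except` cannot intercept it, so a guard like this only helps when the bad values are visible before the simulator is stepped.

```python
import math
from typing import Any, Iterable, Tuple


def all_finite(values: Iterable[float]) -> bool:
    """Return True if every value is a finite number (no NaN or inf)."""
    return all(math.isfinite(float(v)) for v in values)


class GuardedEnv:
    """Hypothetical wrapper around any env with gym-style reset()/step().

    Non-finite actions are never forwarded; the episode is ended with a
    penalty instead, so bad values do not reach the simulator.
    """

    def __init__(self, env: Any, nan_penalty: float = -1.0):
        self.env = env
        self.nan_penalty = nan_penalty
        self._last_obs = None

    def reset(self):
        self._last_obs = self.env.reset()
        return self._last_obs

    def step(self, action: Iterable[float]) -> Tuple[Any, float, bool, dict]:
        action = list(action)
        if not all_finite(action):
            # Do not step the simulator; terminate the episode instead.
            return self._last_obs, self.nan_penalty, True, {"invalid_action": True}
        obs, reward, done, info = self.env.step(action)
        self._last_obs = obs
        return obs, reward, done, info
```

The training loop would then call `reset()` as usual once `done` is returned, which gives the "end the episode and penalize" behavior without touching the simulator.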

dhruvbatra commented 3 years ago

CC: @mathfac

ASzot commented 3 years ago

Hello, I am also experiencing the same issue. Below I included a more detailed trace of the error, obtained by running with PYTHONFAULTHANDLER=1. I think this is actually a bug in Habitat Sim, not Habitat Lab. This error happens randomly for me: it happens on some machines but not others, and with some methods but not others. For example, I am able to train a SAC policy for 1M steps on the Reacher environment (which will be released in this PR: https://github.com/facebookresearch/habitat-lab/pull/685), but when I try custom imitation learning algorithms, it fails.

I actually also occasionally encountered this error in the working / non-release version with the motion planning approaches specifically. I had not seen this issue in a while so I assumed it was gone, but now I am seeing it often.

Math::Matrix4::rotation(): the normalized rotation part is not orthogonal:
Matrix(nan, -nan, -nan,
       -nan, nan, nan,
       -nan, nan, -nan)
Fatal Python error: Aborted

Thread 0x00007f25a2e1f700 (most recent call first):
  File "/private/home/andrewszot/miniconda3/envs/l2l/lib/python3.7/threading.py", line 300 in wait
  File "/private/home/andrewszot/miniconda3/envs/l2l/lib/python3.7/threading.py", line 552 in wait
  File "/private/home/andrewszot/miniconda3/envs/l2l/lib/python3.7/site-packages/tqdm/_monitor.py", line 60 in run
  File "/private/home/andrewszot/miniconda3/envs/l2l/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/private/home/andrewszot/miniconda3/envs/l2l/lib/python3.7/threading.py", line 890 in _bootstrap

Thread 0x00007f25a261e700 (most recent call first):

Thread 0x00007f25a1e1d700 (most recent call first):

Current thread 0x00007f270cdfa740 (most recent call first):
  File "/private/home/andrewszot/hablab_fixes/habitat/tasks/rearrange/rearrange_sim.py", line 442 in internal_step
  File "/private/home/andrewszot/hablab_fixes/habitat/tasks/rearrange/rearrange_sim.py", line 373 in step
  File "/private/home/andrewszot/hablab_fixes/habitat/tasks/rearrange/actions.py", line 82 in step
  File "/private/home/andrewszot/hablab_fixes/habitat/core/embodied_task.py", line 303 in step
  File "/private/home/andrewszot/hablab_fixes/habitat/tasks/rearrange/rearrange_task.py", line 65 in step
  File "/private/home/andrewszot/hablab_fixes/habitat/tasks/rearrange/rearrange_reach_task.py", line 16 in step
  File "/private/home/andrewszot/hablab_fixes/habitat/core/env.py", line 267 in step
  File "/private/home/andrewszot/hablab_fixes/habitat/core/env.py", line 410 in step
  File "/private/home/andrewszot/miniconda3/envs/l2l/lib/python3.7/contextlib.py", line 74 in inner
  File "/private/home/andrewszot/hablab_fixes/habitat_baselines/common/environments.py", line 52 in step
  File "/private/home/andrewszot/hablab_fixes/habitat_baselines/utils/gym_adapter.py", line 125 in direct_hab_step
  File "/private/home/andrewszot/hablab_fixes/habitat_baselines/utils/gym_adapter.py", line 122 in step
  File "/private/home/andrewszot/miniconda3/envs/l2l/lib/python3.7/site-packages/gym/core.py", line 234 in step
  File "/private/home/andrewszot/hablab_fixes/habitat_baselines/utils/render_wrapper.py", line 63 in step
  File "./envs.py", line 97 in step
  File "/private/home/andrewszot/p-mbirlo/rl-toolkit/rlf/baselines/monitor.py", line 56 in step
  File "/private/home/andrewszot/p-mbirlo/rl-toolkit/rlf/baselines/vec_env/dummy_vec_env.py", line 52 in step_wait
  File "/private/home/andrewszot/p-mbirlo/rl-toolkit/rlf/rl/envs.py", line 328 in step_wait
  File "/private/home/andrewszot/p-mbirlo/rl-toolkit/rlf/baselines/vec_env/vec_env.py", line 108 in step
  File "/private/home/andrewszot/p-mbirlo/rl-toolkit/rlf/rl/runner.py", line 59 in training_iter
  File "/private/home/andrewszot/p-mbirlo/rl-toolkit/rlf/main.py", line 70 in run_policy
  File "main.py", line 130 in <module>
[1]    620528 abort (core dumped)
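
For reference, the per-thread trace above is produced by Python's standard `faulthandler` module, which `PYTHONFAULTHANDLER=1` enables at interpreter startup. The same can be done from inside a script. A minimal sketch follows; the helper name `enable_native_tracebacks` is made up for this example.

```python
import faulthandler


def enable_native_tracebacks() -> bool:
    """Dump Python tracebacks of all threads on fatal signals such as SIGABRT."""
    if not faulthandler.is_enabled():
        # all_threads=True matches the per-thread output shown in the trace.
        faulthandler.enable(all_threads=True)
    return faulthandler.is_enabled()


# Call once, as early as possible in the training script.
enable_native_tracebacks()
```

This does not prevent the abort; it only shows which Python call was active when the native code aborted.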

mathfac commented 3 years ago

@sparisi can you share the setup to reproduce the issue?

sparisi commented 3 years ago

@mathfac I updated both sim (with conda install) and lab (just git pull) and I now don't see the error anymore (at least not after ~500k steps). For sim I install with conda install habitat-sim withbullet headless -c conda-forge -c aihabitat (I didn't mention this earlier).

I had to change my wrapper to map actions, because now they have different keys ('arm_ac' is now 'arm_action', and the gripper space action is now a Box(-1,1) rather than Discrete(1)).
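
A wrapper-side fix for this kind of key rename can be sketched as below. Only the `'arm_ac'` to `'arm_action'` rename is taken from the comment above; the dictionary layout, the `KEY_RENAMES` table, and the `remap_action_args` helper are hypothetical, and the clipping to [-1, 1] is an optional safety step rather than something the thread prescribes.

```python
from typing import Dict, List

# Only the 'arm_ac' -> 'arm_action' rename comes from the thread; extend the
# table with whatever other keys changed in your setup.
KEY_RENAMES: Dict[str, str] = {"arm_ac": "arm_action"}


def clip(value: float, low: float = -1.0, high: float = 1.0) -> float:
    """Clamp a single value into [low, high]."""
    return max(low, min(high, value))


def remap_action_args(old_args: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Rename old action keys and clip every component to [-1, 1]."""
    new_args = {}
    for key, values in old_args.items():
        new_key = KEY_RENAMES.get(key, key)  # unknown keys pass through as-is
        new_args[new_key] = [clip(float(v)) for v in values]
    return new_args
```

Keeping the renames in one table means the agent code can stay unchanged when the environment's action keys change again.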

I also noticed that the code is faster. Before, my code was doing ~10 steps per second (including a SAC gradient update), and by the end of the episode it was down to ~3. Now it stays at ~12 all the time.

However, after ~500k steps I now get:

0731 06:26:54.212574 1209586 ManagedContainerBase.h:203] ::getObjectHandleByID : Unknown RigidObject managed object ID:29. Aborting      | 213/1000 [00:18<00:59, 13.29it/s]
E0731 06:26:54.212597 1209586 ManagedContainerBase.h:331] <RigidObject>::getObjectCopyByHandle : Unknown RigidObject managed object handle :. Aborting
Traceback (most recent call last):
  File "habitat_rearrange_sac.py", line 188, in <module>
    experiment(alg=alg, n_epochs=50, n_steps=1000, n_episodes_test=5)
  File "habitat_rearrange_sac.py", line 167, in experiment
    core.learn(n_steps=n_steps, n_steps_per_fit=1)
  File "/private/home/sparisi/mushroom-rl/mushroom_rl/core/core.py", line 75, in learn
    self._run(n_steps, n_episodes, fit_condition, render, quiet)
  File "/private/home/sparisi/mushroom-rl/mushroom_rl/core/core.py", line 126, in _run
    episodes_progress_bar, render, initial_states)
  File "/private/home/sparisi/mushroom-rl/mushroom_rl/core/core.py", line 139, in _run_impl
    self.reset(initial_states)
  File "/private/home/sparisi/mushroom-rl/mushroom_rl/core/core.py", line 216, in reset
    self._state = self._preprocess(self.mdp.reset(initial_state).copy())
  File "/private/home/sparisi/mushroom-rl/mushroom_rl/environments/habitat_env.py", line 183, in reset
    obs = self._convert_observation(np.atleast_1d(self.env.reset()))
  File "/private/home/sparisi/mushroom-rl/mushroom_rl/environments/habitat_env.py", line 85, in reset
    return np.asarray(self.env.reset()['robot_head_rgb'])
  File "/private/home/sparisi/habitat-baselines/habitat-lab/habitat_baselines/common/environments.py", line 47, in reset
    observations = super().reset()
  File "/private/home/sparisi/.conda/envs/dm/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/private/home/sparisi/habitat-baselines/habitat-lab/habitat/core/env.py", line 365, in reset
    return self._env.reset()
  File "/private/home/sparisi/habitat-baselines/habitat-lab/habitat/core/env.py", line 220, in reset
    self.reconfigure(self._config)
  File "/private/home/sparisi/habitat-baselines/habitat-lab/habitat/core/env.py", line 303, in reconfigure
    self._sim.reconfigure(self._config.SIMULATOR)
  File "/private/home/sparisi/habitat-baselines/habitat-lab/habitat/tasks/rearrange/rearrange_sim.py", line 158, in reconfigure
    self.grasp_mgr.desnap(force=True)
  File "/private/home/sparisi/habitat-baselines/habitat-lab/habitat/tasks/rearrange/rearrange_grasp_manager.py", line 71, in desnap
    obj_bb = get_aabb(self.snap_idx, self._sim)
  File "/private/home/sparisi/habitat-baselines/habitat-lab/habitat/tasks/rearrange/utils.py", line 210, in get_aabb
    obj_node = obj.root_scene_node
AttributeError: 'NoneType' object has no attribute 'root_scene_node'
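
The traceback suggests that the grasp manager still holds the ID of an object that no longer exists when the scene is reconfigured, so the lookup returns `None`. A defensive pattern for this situation is to treat a missing object as already released. The sketch below uses stand-in classes: `FakeSim`, `GraspState`, and `safe_desnap` are hypothetical and are not Habitat APIs.

```python
from typing import Dict, Optional


class FakeSim:
    """Stand-in for a simulator that maps object IDs to live objects."""

    def __init__(self, objects: Dict[int, object]):
        self._objects = objects

    def get_object(self, obj_id: int) -> Optional[object]:
        # Returns None for unknown IDs, like the lookup in the traceback.
        return self._objects.get(obj_id)


class GraspState:
    """Tracks which object ID, if any, is currently held."""

    def __init__(self, sim: FakeSim):
        self.sim = sim
        self.snap_idx: Optional[int] = None

    def safe_desnap(self) -> bool:
        """Release the held object; return True if a live object was released."""
        if self.snap_idx is None:
            return False
        obj = self.sim.get_object(self.snap_idx)
        self.snap_idx = None  # always clear the possibly stale ID
        # A None result means the object was already removed (for example by
        # a scene reset), so there is nothing left to release.
        return obj is not None
```

The point of the pattern is that a stale ID is cleared rather than dereferenced, so a reset cannot fail on an object that is already gone.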

Not sure if this is relevant, but sometimes I see the following messages:

BulletPhysicsManager::addArticulatedObjectFromURDF: simpleObjectHandle :  kitchen_counter
BulletPhysicsManager::addArticulatedObjectFromURDF: newArtObjectHandle :  kitchen_counter_:0000
BulletPhysicsManager::addArticulatedObjectFromURDF: simpleObjectHandle :  fridge
BulletPhysicsManager::addArticulatedObjectFromURDF: newArtObjectHandle :  fridge_:0000
BulletPhysicsManager::addArticulatedObjectFromURDF: simpleObjectHandle :  hab_fetch
BulletPhysicsManager::addArticulatedObjectFromURDF: newArtObjectHandle :  hab_fetch_:0000

I don't know what triggers it. It doesn't happen when env.reset() is called.

I will now try to make a MWE that does not require my SAC code, to replicate the error.

sparisi commented 3 years ago

@mathfac

I have attached a script to reproduce the last error (maybe this should go to a separate issue): rearrange_test.zip. Launch it from the habitat-lab root. Even if I set the action space seed, numpy seed, and env seed, I cannot get it to crash at the same episode every time. I am not sure if I am forgetting to set some seed. Nonetheless, it crashed within 500 episodes in 10 out of 15 runs.
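
For readers trying to reproduce this, the seeding described above can be collected in one helper. This is a sketch under assumptions: `seed_everything` is a made-up name, and the `env.seed(...)` and `env.action_space.seed(...)` calls are guarded because not every environment exposes them. It only covers Python-side random number generators; whether the simulator itself behaves deterministically is a separate question that this sketch does not address.

```python
import random
from typing import Any, Optional

try:
    import numpy as np
except ImportError:  # numpy is optional for this sketch
    np = None


def seed_everything(seed: int, env: Optional[Any] = None) -> None:
    """Seed Python, NumPy (if installed) and, where available, the env."""
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)
    if env is not None:
        # Both calls are guarded because not every env exposes them.
        if hasattr(env, "seed"):
            env.seed(seed)
        space = getattr(env, "action_space", None)
        if space is not None and hasattr(space, "seed"):
            space.seed(seed)
```

Calling the helper once before the first `env.reset()` keeps all seeds in one place, which makes it easier to spot a generator that was left unseeded.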

habitat 0.2.1 (both lab and sim)
sim installed withbullet headless
conda 4.7.10
python 3.7.9
numpy 1.19.4
Ubuntu 20.04.1 LTS
CUDA 11.0

Let me know if you need other details.

W0731 18:13:53.924691 2281048 PhysicsManager.cpp:251] ::addObject : newObjectHandle : 004_sugar_box_:0000
W0731 18:13:53.924845 2281048 PhysicsManager.cpp:248] ::addObject : simpleObjectHandle : 002_master_chef_can
I0731 17:56:07.442541 2292786 ResourceManager.cpp:699] ::loadStageInternal : Attempting to load stage data/replica_cad/configs/stages/../../stages/Stage_v3_sc0_staging.glb
I0731 18:04:24.270742 2293281 ResourceManager.cpp:699] ::loadStageInternal : Attempting to load stage data/replica_cad/configs/stages/../../stages/Stage_v3_sc4_staging.glb
I0731 18:04:24.272399 2293281 ResourceManager.cpp:1280] Importing Basis files as BC7 for Stage_v3_sc4_staging.glb
I0731 18:04:24.448921 2293281 Simulator.cpp:400] ::createSceneInstance : Successfully loaded stage named : data/replica_cad/configs/stages/Stage_v3_sc4_staging.stage_config.json
W0731 18:04:24.448948 2293281 Simulator.cpp:435]
---
Simulator::createSceneInstance : The active scene does not contain semantic annotations.
---
I0731 18:04:24.448958 2293281 MetadataMediator.cpp:262] ::getSceneAttributesByName : Query dataset : default for SceneAttributes named : data/replica_cad/configs/stages/Stage_v3_sc4_staging yields 1 candidates.  Using data/replica_cad/configs/stages/Stage_v3_sc4_staging.
I0731 18:04:24.448971 2293281 SceneDatasetAttributes.cpp:45] ::addNewSceneInstanceToDataset : Dataset : 'default' : Stage Attributes 'data/replica_cad/configs/stages/Stage_v3_sc4_staging.stage_config.json' specified in Scene Attributes exists in dataset library.
I0731 18:04:24.448976 2293281 SceneDatasetAttributes.cpp:85] ::addNewSceneInstanceToDataset : Dataset : 'default' : Lighting Layout Attributes no_lights specified in Scene Attributes exists in dataset library.
I0731 18:04:24.448984 2293281 MetadataMediator.cpp:262] ::getSceneAttributesByName : Query dataset : default for SceneAttributes named : data/replica_cad/configs/stages/Stage_v3_sc4_staging yields 1 candidates.  Using data/replica_cad/configs/stages/Stage_v3_sc4_staging.
I0731 18:04:24.448990 2293281 SceneDatasetAttributes.cpp:45] ::addNewSceneInstanceToDataset : Dataset : 'default' : Stage Attributes 'data/replica_cad/configs/stages/Stage_v3_sc4_staging.stage_config.json' specified in Scene Attributes exists in dataset library.
I0731 18:04:24.448997 2293281 SceneDatasetAttributes.cpp:85] ::addNewSceneInstanceToDataset : Dataset : 'default' : Lighting Layout Attributes no_lights specified in Scene Attributes exists in dataset library.
I0731 18:04:24.449020 2293281 Simulator.cpp:182] Simulator::reconfigure() : createSceneInstance success == true for active scene name : data/replica_cad/configs/stages/Stage_v3_sc4_staging with renderer.
W0731 18:04:24.449369 2293281 simulator.py:224] Could not find navmesh data/replica_cad/configs/stages/Stage_v3_sc4_staging.navmesh, no collision checking will be done
E0731 18:04:24.453068 2293281 ManagedContainerBase.h:203] ::getObjectHandleByID : Unknown RigidObject managed object ID:18. Aborting
E0731 18:04:24.453083 2293281 ManagedContainerBase.h:331] <RigidObject>::getObjectCopyByHandle : Unknown RigidObject managed object handle :. Aborting
Traceback (most recent call last):
  File "test.py", line 75, in <module>
    obs = env.reset()
  File "test.py", line 31, in reset
    return np.asarray(self.env.reset()['robot_head_rgb'])
  File "/private/home/sparisi/habitat-baselines/habitat-lab/habitat_baselines/common/environments.py", line 47, in reset
    observations = super().reset()
  File "/private/home/sparisi/.conda/envs/dm/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/private/home/sparisi/habitat-baselines/habitat-lab/habitat/core/env.py", line 365, in reset
    return self._env.reset()
  File "/private/home/sparisi/habitat-baselines/habitat-lab/habitat/core/env.py", line 220, in reset
    self.reconfigure(self._config)
  File "/private/home/sparisi/habitat-baselines/habitat-lab/habitat/core/env.py", line 303, in reconfigure
    self._sim.reconfigure(self._config.SIMULATOR)
  File "/private/home/sparisi/habitat-baselines/habitat-lab/habitat/tasks/rearrange/rearrange_sim.py", line 158, in reconfigure
    self.grasp_mgr.desnap(force=True)
  File "/private/home/sparisi/habitat-baselines/habitat-lab/habitat/tasks/rearrange/rearrange_grasp_manager.py", line 71, in desnap
    obj_bb = get_aabb(self.snap_idx, self._sim)
  File "/private/home/sparisi/habitat-baselines/habitat-lab/habitat/tasks/rearrange/utils.py", line 210, in get_aabb
    obj_node = obj.root_scene_node
AttributeError: 'NoneType' object has no attribute 'root_scene_node'

ASzot commented 3 years ago

I still get the not orthogonal error, even on the most recent commit of Habitat Sim and Habitat Lab.

jadkins99 commented 3 years ago

I also am getting the not orthogonal error. Any updates on this issue?

lukasmajer commented 2 years ago

I am still encountering this problem.

ASzot commented 2 years ago

This issue should now be resolved in main.

Also, the hab_suite branch now contains most of the functionality from the paper and we are slowly merging this into main. We made this tutorial to help get started with Habitat 2.0.

Please let me know if you have any more questions!

ethanabrooks commented 2 years ago

@ASzot I am able to reproduce this on v0.2.1 in Docker.

You can see all the code at https://github.com/ethanabrooks/habitat-sim-issue. I've tried to make things as minimal as possible but unfortunately there are limits.

This assumes that you have downloaded data to ~/.cache/data. These are the download instructions:

wget -P ~/.cache/data https://dl.fbaipublicfiles.com/habitat/data/datasets/objectnav/m3d/v1/objectnav_mp3d_v1.zip
unzip ~/.cache/data/objectnav_mp3d_v1.zip -d ~/.cache/data/
python2 download_mp.py --id HxpKQynjfin --task habitat -o ~/.cache/data/
unzip ~/.cache/data/v1/tasks/mp3d_habitat.zip -d ~/.cache/data/tasks/

This of course requires download_mp.py from our friends at Matterport. Once this is done, run

git clone git@github.com:ethanabrooks/habitat-sim-issue.git
cd habitat-sim-issue
docker build -t issue .
docker run --rm -it \
  --gpus all \
  -v "$HOME/.cache/data/:/root/.cache/data" \
  issue main.py

The error typically arises for me within the first 100 episodes (a second or two).

I also tried this with the latest habitat-lab and habitat-sim commits, but import habitat threw an error.

ethanabrooks commented 2 years ago

I am able to eliminate this error by eliminating the "LOOK_DOWN" and "LOOK_UP" actions.
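
A sketch of that workaround in plain Python is shown below. Where the action list lives in the configuration is an assumption: in habitat-lab 0.2.1 it is believed to be the `TASK.POSSIBLE_ACTIONS` entry, so check your own YAML before relying on that key. The helper `drop_look_actions` and the example list are illustrative only.

```python
from typing import Iterable, List

EXCLUDED_ACTIONS = {"LOOK_UP", "LOOK_DOWN"}


def drop_look_actions(possible_actions: Iterable[str]) -> List[str]:
    """Return the action names with LOOK_UP / LOOK_DOWN removed, order kept."""
    return [name for name in possible_actions if name not in EXCLUDED_ACTIONS]


# Example list; the real one comes from your task configuration.
actions = ["STOP", "MOVE_FORWARD", "TURN_LEFT", "TURN_RIGHT", "LOOK_UP", "LOOK_DOWN"]
print(drop_look_actions(actions))
```

The filtered list would then be written back to the task configuration before the environment is created.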

aclegg3 commented 2 years ago

This is no longer an issue on main branch or hab_suite, so closing this issue thread. Thanks @ethanabrooks for a possible work-around on 0.2.1.

ethanabrooks commented 2 years ago

I had some issues with the main branch, but I found that these two commits work:
Habitat-lab: fbdd9fd42e7716c04bbde1da0d867e6d79dd3490
Habitat-sim: 066e4343c27a03ccd969ac6a83cbd262d5c7f2f9

ethanabrooks commented 2 years ago

@aclegg3 I am actually able to now reproduce this issue on the main branch when I use the conda version of habitat-sim.

Here is a minimal repository that reproduces the issue: https://github.com/ethanabrooks/habitat-sim-issue/tree/conda

I am using habitat-sim==0.2.1 from conda. For habitat-lab I had to fork the repository and comment out three lines of code that were causing errors (this was the ModuleNotFoundError: No module named 'habitat.tasks.rearrange.sub_tasks' error that we discussed on Slack), but I don't think that's the reason that this orthogonal matrix error has resurfaced.

erikwijmans commented 2 years ago

Thank you for being persistent on this. Looks like this is down to a change in how we are handling semantic sensors that is only exercised in certain cases. I made a PR on habitat sim to fix this: https://github.com/facebookresearch/habitat-sim/pull/1720

erikwijmans commented 2 years ago

@ethanabrooks Just to clarify, the repro you gave doesn't encounter the exact issue reported here, right? The issue here has NaNs in the matrix, while the repro you gave has non-NaN values in a matrix that isn't orthogonal?

ethanabrooks commented 2 years ago

That's correct. Here's what I get:

Math::Quaternion::fromMatrix(): the matrix is not orthogonal:
Matrix(-0.552719, 0.416679, -0.721715,
       0, 0.866029, 0.500006,
       0.833368, 0.276356, -0.478667)

erikwijmans commented 2 years ago

Sounds good. The PR above should fix it then. It just won't fix nans (that's caused by a different thing than floating point drift) so I wanted to double check.
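
The difference between the two failure modes can be checked numerically. The sketch below measures how far a 3x3 matrix is from orthogonal by taking the largest entry of M·Mᵀ − I, and shows that Gram-Schmidt re-orthonormalization removes small drift but cannot repair NaNs. The helper names `orthogonality_error` and `reorthonormalize` are made up for illustration and are not part of Habitat or Magnum; the tolerance the library actually applies is not stated in this thread.

```python
import math
from typing import List

Matrix = List[List[float]]


def orthogonality_error(m: Matrix) -> float:
    """Largest absolute entry of (M * M^T - I); inf if M contains NaN/inf."""
    if any(not math.isfinite(x) for row in m for x in row):
        return math.inf
    worst = 0.0
    for i in range(3):
        for j in range(3):
            dot = sum(m[i][k] * m[j][k] for k in range(3))
            target = 1.0 if i == j else 0.0
            worst = max(worst, abs(dot - target))
    return worst


def reorthonormalize(m: Matrix) -> Matrix:
    """Gram-Schmidt on the rows; removes small drift, cannot repair NaNs."""
    rows: Matrix = []
    for row in m:
        v = list(row)
        for basis in rows:
            proj = sum(a * b for a, b in zip(v, basis))
            v = [a - proj * b for a, b in zip(v, basis)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


# Matrix reported above by ethanabrooks.
drifted = [
    [-0.552719, 0.416679, -0.721715],
    [0.0, 0.866029, 0.500006],
    [0.833368, 0.276356, -0.478667],
]
print(orthogonality_error(drifted))                    # small but nonzero
print(orthogonality_error(reorthonormalize(drifted)))  # close to zero
```

An all-NaN matrix, as in the original report, yields an infinite error here and stays broken after re-orthonormalization, which is consistent with the two cases having different causes.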

ethanabrooks commented 2 years ago

Thanks for your help with this! I will let you know if the issue is fixed on my end once the PR goes through.

ethanabrooks commented 2 years ago

This PR fixes the issue in my repository (https://github.com/ethanabrooks/habitat-sim-issue).