StanfordVL / iGibson

A Simulation Environment to train Robots in Large Realistic Interactive Scenes
http://svl.stanford.edu/igibson
MIT License

Impossible to replay BEHAVIOR VR demo deterministically #215

Open Roadsong opened 2 years ago

Roadsong commented 2 years ago

I tried two ways of replaying the demo (as of June 14).

Method 1

BEHAVIOR: master branch. iGibson: master branch. BDDL: master branch. Dataset: version 2.0.6.

Command: python -m igibson.examples.learning.demo_replaying_examples

Results: I can see the demo, but the motion of the robot differs from the ground truth. I believe this is similar to (or even worse than) the issue in https://github.com/StanfordVL/iGibson/issues/161#issuecomment-1022341060. I know this may be because I am not running the demo on a Windows machine, but the replay also prints the messages below and I don't know what is going on. I cannot find the vr-demo-collection branch anywhere. Dataset version 2.0.6 is also suspect: I don't know which version is the correct one for reproducing the demo.

********************************************************************************
WARNING:igibson.render.mesh_renderer.mesh_renderer_settings:WARN: Darwin does not support optimized renderer, automatically disabling
Warning, difference in git commits for repo: iGibson. This may impact deterministic replay
Logged git info:

{   'branch_name': 'vr-demo-collection',
    'code_diff': 'diff --git a/igibson/objects/multi_object_wrappers.py '
                 'b/igibson/objects/multi_object_wrappers.py\n'
                 'index 4827dad5..a3aa6610 100644\n'
                 '--- a/igibson/objects/multi_object_wrappers.py\n'
                 '+++ b/igibson/objects/multi_object_wrappers.py\n'
                 '@@ -129,7 +129,7 @@ class ObjectGrouper(BaseObject):\n'
                 ' \n'
                 '         # These attributes are used during object import '
                 'and should return\n'
                 '         # the concatenation results of all objects in '
                 'self.objects\n'
                 '-        if item in ["visual_mesh_to_material", '
                 '"link_name_to_vm", "body_ids", "is_fixed"]:\n'
                 '+        if item in ["visual_mesh_to_material", '
                 '"link_name_to_vm", "body_ids", "is_fixed", '
                 '"renderer_instances"]:\n'
                 '             return '
                 'list(itertools.chain.from_iterable(attrs))\n'
                 ' \n'
                 "         # Otherwise, check that it's the same for everyone "
                 'and then just return the value.\n'
                 '@@ -188,7 +188,6 @@ class ObjectGrouper(BaseObject):\n'
                 '             if issubclass(state_type, '
                 'AbsoluteObjectState):\n'
                 '                 '
                 'state_instance.load(dump[get_state_name(state_type)])\n'
                 ' \n'
                 '-\n'
                 ' class ObjectMultiplexer(BaseObject):\n'
                 '     """A multi-object wrapper that acts as a proxy for the '
                 'selected one between the set of objects it contains."""\n'
                 ' \n'
                 'diff --git '
                 'a/igibson/render/mesh_renderer/mesh_renderer_vr.py '
                 'b/igibson/render/mesh_renderer/mesh_renderer_vr.py\n'
                 'index a768ad31..f766f0ac 100644\n'
                 '--- a/igibson/render/mesh_renderer/mesh_renderer_vr.py\n'
                 '+++ b/igibson/render/mesh_renderer/mesh_renderer_vr.py\n'
                 '@@ -138,7 +138,7 @@ class VrSettings(object):\n'
                 '         self.use_tracked_body = '
                 'shared_settings["use_tracked_body"]\n'
                 '         self.torso_tracker_serial = '
                 'shared_settings["torso_tracker_serial"]\n'
                 '         # Both body-related values need to be set in order '
                 'to use the torso-tracked body\n'
                 '-        self.using_tracked_body = self.use_tracked_body and '
                 'self.torso_tracker_serial\n'
                 '+        self.using_tracked_body = self.use_tracked_body and '
                 'bool(self.torso_tracker_serial)\n'
                 '         if self.torso_tracker_serial == "":\n'
                 '             self.torso_tracker_serial = None\n'
                 ' \n'
                 'diff --git a/igibson/robots/behavior_robot.py '
                 'b/igibson/robots/behavior_robot.py\n'
                 'index bc984df9..be99fa9f 100644\n'
                 '--- a/igibson/robots/behavior_robot.py\n'
                 '+++ b/igibson/robots/behavior_robot.py\n'
                 '@@ -195,6 +195,8 @@ class BehaviorRobot(ManipulationRobot, '
                 'LocomotionRobot, ActiveCameraRobot):\n'
                 ' \n'
                 '         # TODO: Remove hacky fix - constructor/config '
                 'should contain this data.\n'
                 '         if self.simulator.mode == SimulatorMode.VR:\n'
                 '+            print("robot:", self.use_tracked_body)\n'
                 '+            print("sim:", '
                 'self.simulator.vr_settings.using_tracked_body)\n'
                 '             assert (\n'
                 '                 self.use_tracked_body == '
                 'self.simulator.vr_settings.using_tracked_body\n'
                 '             ), "Robot and VR config do not match in terms '
                 'of whether to use tracked body. Please update either '
                 'config."\n'
                 'diff --git a/igibson/vr_config.yaml '
                 'b/igibson/vr_config.yaml\n'
                 'index b6051117..13b7007b 100644\n'
                 '--- a/igibson/vr_config.yaml\n'
                 '+++ b/igibson/vr_config.yaml\n'
                 '@@ -36,7 +36,7 @@ shared_settings:\n'
                 '   # Serial number of VR torso tracker - this can be found '
                 'by connecting/pairing the tracker,\n'
                 '   # then going into Steam VR settings -> controllers -> '
                 'manage vive trackers\n'
                 '   # Note: replace this with your own tracker serial number '
                 'or leave blank to not use one\n'
                 '-  torso_tracker_serial: "LHR-DF82C682"\n'
                 '+  torso_tracker_serial: "LHR-BDE12AB6"\n'
                 ' # Settings that are specific to different VR devices (eg. '
                 'eye tracking, button mapping)\n'
                 ' device_settings:\n'
                 '   HTC_VIVE_PRO_EYE:',
    'code_diff_staged': '',
    'commit_hash': 'bc2520de66025c486cff11e30d881b0f29cd1384'}
Current git info:

{   'branch_name': 'master',
    'code_diff': 'diff --git a/igibson/tasks/behavior_task.py '
                 'b/igibson/tasks/behavior_task.py\n'
                 'index 3c1b9868..d38b763f 100644\n'
                 '--- a/igibson/tasks/behavior_task.py\n'
                 '+++ b/igibson/tasks/behavior_task.py\n'
                 '@@ -127,6 +127,12 @@ class BehaviorTask(BaseTask):\n'
                 '             self.conds, self.backend, self.object_scope, '
                 'self.goal_conditions\n'
                 '         )\n'
                 ' \n'
                 "+        # print('[DEBUG]][self.obj_scope]', "
                 'self.object_scope)\n'
                 '+        # print(self.initial_conditions)\n'
                 '+        # print(self.goal_conditions)\n'
                 "+        # print('[DEBUG][self.ground_goal_state_options]', "
                 'self.ground_goal_state_options)\n'
                 '+        # exit()\n'
                 '+\n'
                 '         # Demo attributes\n'
                 '         self.instruction_order = '
                 'np.arange(len(self.conds.parsed_goal_conditions))\n'
                 '         np.random.shuffle(self.instruction_order)',
    'code_diff_staged': '',
    'commit_hash': '58ac14cf62949008b6851a5a95602cd5084edffd'}
Creating environment and resetting it
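The warning above boils down to a comparison between the git metadata stored with the demo and the metadata of the current checkout. A minimal sketch of that comparison follows, assuming both records are plain dictionaries shaped like the ones printed in the log; `git_info_differences` is a hypothetical helper written for illustration, not an iGibson function.

```python
def git_info_differences(logged, current, keys=("branch_name", "commit_hash")):
    """Return {key: (logged_value, current_value)} for every key that differs."""
    return {
        key: (logged.get(key), current.get(key))
        for key in keys
        if logged.get(key) != current.get(key)
    }


# Values copied from the warning above; the code_diff fields are left out for brevity.
logged = {
    "branch_name": "vr-demo-collection",
    "commit_hash": "bc2520de66025c486cff11e30d881b0f29cd1384",
}
current = {
    "branch_name": "master",
    "commit_hash": "58ac14cf62949008b6851a5a95602cd5084edffd",
}

differences = git_info_differences(logged, current)
for key, (was, now) in differences.items():
    print(f"{key}: logged={was} current={now}")
```

Here both the branch and the commit differ, which is consistent with the "This may impact deterministic replay" warning.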

Method 2

BEHAVIOR: master branch. iGibson: behavior-replay branch. BDDL: master branch / behavior-refactored-verified-problems. Dataset: version 2.0.6.

Command: python -m igibson.examples.behavior.behavior_demo_replay

Results: ModuleNotFoundError: No module named 'bddl.activity_base' (or No module named 'igibson.task', depending on the BDDL branch).

Summary

In short, I could not replay the demo at all. Could someone write detailed instructions for this? If you want the benchmark to get attention, people should at least be able to reproduce the basics easily. Many of the instructions are out of date and inconsistent with each other. Can anyone tell me which setup is the correct one for reproducing the VR demo? The guide at https://github.com/StanfordVL/behavior/blob/main/docs/vr_demos.md does not work.

cgokmen commented 2 years ago

Can you try switching all three repos (BEHAVIOR, iGibson, BDDL) to the behavior-replay branch and try again?

cgokmen commented 2 years ago

My bad - you don't need the BEHAVIOR repo for this. Just the iGibson repo (behavior-replay branch) and the BDDL repo (also behavior-replay). The replay script will be in examples/behavior/behavior_demo_replay I think.

cgokmen commented 2 years ago

You also need the behavior-replay copy of ig_dataset (https://storage.googleapis.com/gibson_scenes/ig_dataset_replay.zip) and of the robot assets (https://github.com/StanfordVL/ig_assets/tree/behavior-replay).

So essentially you need all four components to be behavior-replay copies: iGibson, BDDL, ig_dataset, and the robot assets (ig_assets).
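Before launching the replay, it may help to confirm that each git checkout is actually on the expected branch. The sketch below is one way to do that, assuming the repos were cloned into directories named `iGibson`, `bddl`, and `ig_assets` (hypothetical paths; adjust as needed). It relies only on the standard `git rev-parse --abbrev-ref HEAD` command. The ig_dataset copy is a zip download rather than a git checkout, so it is not covered here.

```python
import subprocess

# Hypothetical checkout locations mapped to the branch each should be on.
EXPECTED_BRANCHES = {
    "iGibson": "behavior-replay",
    "bddl": "behavior-replay",
    "ig_assets": "behavior-replay",
}


def current_branch(repo_path):
    """Ask git for the branch currently checked out in repo_path."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def wrong_branches(expected, get_branch=current_branch):
    """Return {repo: actual_branch} for every repo not on its expected branch."""
    actual = {repo: get_branch(repo) for repo in expected}
    return {repo: branch for repo, branch in actual.items() if branch != expected[repo]}
```

Calling `wrong_branches(EXPECTED_BRANCHES)` from the directory that contains the checkouts returns an empty dictionary when every repo is on the expected branch. The `get_branch` parameter exists only so the check can be exercised without real repositories.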

Roadsong commented 2 years ago

Hi @cgokmen, thanks for your quick reply. I followed your advice and installed the suggested versions and branches.

Exp 1

BEHAVIOR: not used. iGibson: behavior-replay branch. BDDL: behavior_refactor_verified_problems branch. Dataset version: v2.0.1, downloaded. Assets: downloaded.

Installed in a conda environment created with: conda create -n replay python=3.7

iGibson installed with: pip install -e ./iGibson

BDDL installed with: python setup.py install (behavior_refactor_verified_problems branch)

Command: python -m igibson.examples.behavior.behavior_demo_replay Results:

Traceback (most recent call last):
  File "/Users/xxx/opt/anaconda3/envs/replay/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/xxx/opt/anaconda3/envs/replay/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/xxx/Desktop/research/replay/iGibson/igibson/examples/behavior/behavior_demo_replay.py", line 280, in <module>
    main()
  File "/Users/xxx/Desktop/research/replay/iGibson/igibson/examples/behavior/behavior_demo_replay.py", line 268, in main
    bddl.set_backend("iGibson")
  File "/Users/xxx/Desktop/research/replay/bddl/bddl/__init__.py", line 10, in set_backend
    from igibson.task.bddl_backend import IGibsonBDDLBackend
ModuleNotFoundError: No module named 'igibson.task'

It seems that iGibson's behavior-replay branch is not compatible with this BDDL branch: BDDL tries to import igibson.task, which the behavior-replay branch does not provide. I do find the igibson.task code in the master branch, but I am not sure whether the problem can be solved by manually copying that code into the behavior-replay branch.

Something else: in the BDDL repo's behavior_refactor_verified_problems branch, pytest does not pass. As mentioned in https://github.com/StanfordVL/bddl/issues/10#issuecomment-1022554441, a working branch/commit of BDDL is really hard to find.

Exp 2

If I switch BDDL to the master branch, the behavior-replay branch of iGibson does not work because:

Traceback (most recent call last):
  File "/Users/xxx/opt/anaconda3/envs/replay/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/xxx/opt/anaconda3/envs/replay/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/xxx/Desktop/research/replay/iGibson/igibson/examples/behavior/behavior_demo_replay.py", line 15, in <module>
    from igibson.activity.activity_base import iGBEHAVIORActivityInstance
  File "/Users/xxx/Desktop/research/replay/iGibson/igibson/activity/activity_base.py", line 6, in <module>
    from bddl.activity_base import BEHAVIORActivityInstance
ModuleNotFoundError: No module named 'bddl.activity_base'

It seems that the master branches of iGibson and BDDL work better together (at least a window opens).
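Both tracebacks above are ModuleNotFoundError failures caused by a mismatch between the iGibson and BDDL branches. A quick pre-flight check can show which of the two modules the installed combination provides before the replay script is started. This is a minimal sketch using the standard library's `importlib.util.find_spec`; `module_available` is a hypothetical helper name.

```python
import importlib.util


def module_available(name):
    """Return True if the dotted module name can be located.

    find_spec imports parent packages (e.g. "igibson") but not the
    target module itself.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package is itself not installed.
        return False


# The two modules named in the tracebacks above.
for name in ("igibson.task", "bddl.activity_base"):
    print(name, "found" if module_available(name) else "missing")
```

If a parent package fails to import for some other reason (for example, a missing compiled extension), that error is deliberately left to propagate so that it is not mistaken for a branch mismatch.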

cgokmen commented 2 years ago

I have pushed a behavior-replay branch onto BDDL, can you try using that? Thanks.

Roadsong commented 2 years ago

Hi @cgokmen, thanks for your time and effort.

I managed to replay the demo on a Mac using the behavior-replay branch of iGibson and the behavior_verfied_problems branch of BDDL. I had to add the missing code and files and make a few modifications to get it working. I am glad to see a demo, e.g. cleaning_windows, running in a window, even though the replay does not end successfully. However, it seems that for deterministic replay (an exact match at every single step), I must get everything working on a Windows machine, since the pybullet-svl library may yield different results across platforms (as mentioned in https://github.com/StanfordVL/bddl/issues/10#issuecomment-1022562103).

So I tried to reproduce this in a Windows 10 virtual machine to see whether fully deterministic replay is possible. That was not easy: the host OS is macOS, the guest OS is Windows 10 running on Parallels Desktop, and the graphics card is from the AMD series. After fixing some issues and installing a working Visual Studio compiler, I could build and install iGibson, and BDDL seemed to work (I tried the behavior-replay branch this time).
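To make "exact match at every single step" concrete, here is a minimal sketch of a check that finds the first timestep at which a replayed state sequence departs from the recorded one. It is not how iGibson implements its determinism check; `first_divergence` and the sample positions are made up for illustration.

```python
def first_divergence(recorded, replayed):
    """Return the first timestep at which two state sequences differ, or None."""
    for timestep, (expected, actual) in enumerate(zip(recorded, replayed)):
        if expected != actual:
            return timestep
    if len(recorded) != len(replayed):
        # One sequence ended early; the divergence is where the shorter one stops.
        return min(len(recorded), len(replayed))
    return None


# Made-up positions for illustration only; these are not values from a real demo.
recorded = [(0.0, 0.0, 1.0), (0.0, 0.1, 1.0), (0.0, 0.2, 1.0)]
replayed = [(0.0, 0.0, 1.0), (0.0, 0.1, 1.0), (0.0, 0.2000001, 1.0)]
print(first_divergence(recorded, replayed))
```

Because the comparison is exact, even a tiny floating-point difference between platforms counts as a divergence, which is why cross-platform physics differences matter here.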

However, I got the following problem this time:

ERROR: Failed to create GLFW window.

I found that this error is triggered at https://github.com/StanfordVL/iGibson/blob/58ac14cf62949008b6851a5a95602cd5084edffd/igibson/render/cpp/glfw_mesh_renderer.cpp#L91 because glfwCreateWindow fails.

I searched a lot about this problem, paying particular attention to platform-specific questions and answers. The results were mixed, and I suspect that GLFW cannot (easily) work in a Windows 10 guest running on Parallels Desktop. It looks like the OpenGL support in Parallels Desktop is not good enough, but I am not sure about this.

My question is: can I at least do some testing on a virtual Windows machine without an Nvidia graphics card, CUDA, etc.? macOS without Nvidia and CUDA seems fine apart from the determinism problem. Or must I have a physical Windows 10 machine that satisfies the requirements at https://stanfordvl.github.io/iGibson/installation.html#system-requirements?

Edit: I tested some functions in the script https://github.com/StanfordVL/iGibson/blob/master/tests/test_render.py and they gave me the same GLFW error.

cgokmen commented 2 years ago

As far as I know you do need a Windows machine with a GPU, but let me defer to @fxia22 who will know these requirements better. Perhaps you could get one on Azure or something? Even a somewhat older GPU like a 1080 will work fine for this purpose. Sorry I can't be more helpful.

Roadsong commented 2 years ago

Hi @cgokmen, thanks for your time and effort again, you really helped a lot! You don't need to answer the following questions, I just want to update the issue in case anyone else has similar problems.

This time I used a physical Windows machine (without a GPU, but I don't think the GPU is the cause of the following problems). I debugged and read the source code carefully, and found that I still couldn't replay an activity deterministically.

Experiment

Logs, cannot replay deterministically

WARNING:root:Trying to set hidden state before vertices are merged, converted to no-op
----- Writing log data to hd5 on frame: 200 -----
WARNING:root:Trying to set hidden state before vertices are merged, converted to no-op
WARNING:root:Trying to set hidden state before vertices are merged, converted to no-op
----- Writing log data to hd5 on frame: 400 -----
----- Writing log data to hd5 on frame: 600 -----
----- Writing log data to hd5 on frame: 800 -----
----- Writing log data to hd5 on frame: 1000 -----
----- Writing log data to hd5 on frame: 1200 -----
----- Writing log data to hd5 on frame: 1400 -----
----- Writing log data to hd5 on frame: 1600 -----
----- Writing log data to hd5 on frame: 1800 -----
----- Writing log data to hd5 on frame: 2000 -----
WARNING:root:Trying to set hidden state before vertices are merged, converted to no-op
WARNING:root:Trying to set hidden state before vertices are merged, converted to no-op
WARNING:root:Trying to set hidden state before vertices are merged, converted to no-op
WARNING:root:Trying to set hidden state before vertices are merged, converted to no-op
----- Writing log data to hd5 on frame: 2200 -----
----- Writing log data to hd5 on frame: 2400 -----
----- Writing log data to hd5 on frame: 2600 -----
----- Writing log data to hd5 on frame: 2800 -----
----- Writing log data to hd5 on frame: 3000 -----
Demo was succesfully completed:  False
IG LOGGER INFO: Ending log writing session after 3000 frames
Mismatch for obj 120 with mismatched attribute joint_state starting at timestep 443
Mismatch for obj 163 with mismatched attribute joint_state starting at timestep 1
Mismatch for obj 163 with mismatched attribute orientation starting at timestep 1
Mismatch for obj 163 with mismatched attribute position starting at timestep 1538
Mismatch for obj 164 with mismatched attribute joint_state starting at timestep 2
Mismatch for obj 164 with mismatched attribute orientation starting at timestep 2
Mismatch for obj 164 with mismatched attribute position starting at timestep 1654
Mismatch for obj 165 with mismatched attribute orientation starting at timestep 2
Mismatch for obj 165 with mismatched attribute position starting at timestep 1515
Mismatch for obj 199 with mismatched attribute orientation starting at timestep 577
Mismatch for obj 199 with mismatched attribute position starting at timestep 606
Mismatch for obj 202 with mismatched attribute orientation starting at timestep 919
Mismatch for obj 202 with mismatched attribute position starting at timestep 919
Mismatch for obj 270 with mismatched attribute orientation starting at timestep 0
Mismatch for obj 270 with mismatched attribute position starting at timestep 0
Demo was deterministic:  False

I was especially concerned about obj 270, since its mismatch starts at timestep 0, and it turned out to be the Behavior Robot itself.
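With many mismatch lines, it can be easier to see which object diverges first by parsing the log. The following is a minimal sketch, assuming the message format stays exactly as printed above; `earliest_mismatches` is a hypothetical helper, and the sample text is a subset of the lines from the log.

```python
import re

MISMATCH = re.compile(
    r"Mismatch for obj (\d+) with mismatched attribute (\w+) starting at timestep (\d+)"
)


def earliest_mismatches(log_text):
    """Map each object id to (timestep, attribute) of its earliest reported mismatch."""
    earliest = {}
    for obj, attribute, timestep in MISMATCH.findall(log_text):
        candidate = (int(timestep), attribute)
        if int(obj) not in earliest or candidate < earliest[int(obj)]:
            earliest[int(obj)] = candidate
    return earliest


# A few lines copied from the log above.
log_text = """\
Mismatch for obj 120 with mismatched attribute joint_state starting at timestep 443
Mismatch for obj 163 with mismatched attribute joint_state starting at timestep 1
Mismatch for obj 163 with mismatched attribute position starting at timestep 1538
Mismatch for obj 270 with mismatched attribute orientation starting at timestep 0
Mismatch for obj 270 with mismatched attribute position starting at timestep 0
"""

# List objects in the order in which they first diverge.
ordered = sorted(earliest_mismatches(log_text).items(), key=lambda item: item[1])
for obj, (timestep, attribute) in ordered:
    print(f"obj {obj}: first mismatch at timestep {timestep} ({attribute})")
```

Sorting by first mismatch puts obj 270 at the top, matching the observation that the robot is already off at timestep 0.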

Information recorded in the HDF5 file

Information recorded in the urdf file

Solution