google-deepmind / mujoco

Multi-Joint dynamics with Contact. A general purpose physics simulator.
https://mujoco.org
Apache License 2.0

Python viewer.launch_passive() (v2.3.3) - Crash problem #790

Closed: cidxb closed this issue 1 year ago

cidxb commented 1 year ago

Hi, it's great that MuJoCo offers a passive viewer now, but I've run into another problem with it.

When I add viewer.launch_passive() to my test script, which loads an XML model, it crashes with this message: Segmentation fault (core dumped)

Since I haven't found any similar reports, I assume something is wrong with my setup?

saran-t commented 1 year ago

We need your script, model, and system config to troubleshoot.
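A small helper like the one below can gather most of the system config being asked for here. It is a hypothetical sketch, not part of MuJoCo: `collect_system_config` is an illustrative name, and the choice of fields is an assumption. It records the `glfw` Python package version because the backtraces in this thread load `libglfw.so` from that package (from both its `wayland` and `x11` directories), and `XDG_SESSION_TYPE` because it shows whether the desktop session is Wayland or X11.

```python
import os
import platform
from importlib import metadata


def collect_system_config() -> dict:
    """Gather details that are typically useful in a viewer crash report (hypothetical helper)."""

    def package_version(name: str) -> str:
        # Report "not installed" instead of failing if a package is missing.
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return "not installed"

    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "mujoco": package_version("mujoco"),
        # The pip package that ships the libglfw.so seen in the backtraces.
        "glfw (Python package)": package_version("glfw"),
        # Rendering backend the user selected, if any.
        "MUJOCO_GL": os.environ.get("MUJOCO_GL", "<unset>"),
        # "wayland" or "x11" on most Linux desktops.
        "XDG_SESSION_TYPE": os.environ.get("XDG_SESSION_TYPE", "<unset>"),
    }


if __name__ == "__main__":
    for key, value in collect_system_config().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue gives maintainers the OS, Python, MuJoCo and GLFW details in one place.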

cidxb commented 1 year ago

The model is a minimal XML taken from the open-source project MuJoCo_RL_UR5.

The mesh is available at https://github.com/PaulDanielML/MuJoCo_RL_UR5

Script:

from mujoco import viewer
import mujoco as mj

gripper_xml = '/home/jeffery/Workspace/MuJoCo_RL_UR5-master/UR5+gripper/UR5gripper_2_finger.xml'

model = mj.MjModel.from_xml_path(gripper_xml)
data = mj.MjData(model)
m = model
d = data
viewer.launch_passive(m, d)
for i in range(10000):
    # The module is imported as `mj`, so `mujoco.mj_forward` would raise a NameError here.
    mj.mj_forward(m, d)

Error message: Segmentation fault (core dumped)

System config:
- OS: Ubuntu 22.04.2 LTS, 64-bit
- Graphics: GeForce RTX 3060 Mobile/Max-Q, 6 GB
- Memory: 15.7 GB (16)
- Processor: 12th Gen Intel Core i7-12700H x 20
- Python env: 3.10.8
- MUJOCO_GL=GLFW (as you may already know, EGL isn't working for me)

saran-t commented 1 year ago

I think I've identified the responsible race condition. Fix incoming.

Geryyy commented 1 year ago

I'm encountering the same issue with mujoco 2.3.5: it crashes with a segmentation fault.

Minimal working example:

from mujoco import viewer
import mujoco as mj

gripper_xml = '/home/geryyy/repos/mujoco/model/tendon_arm/arm26.xml'

model = mj.MjModel.from_xml_path(gripper_xml)
data = mj.MjData(model)
m = model
d = data
viewer.launch_passive(m, d)
for i in range(10000):
    mj.mj_forward(m, d)

saran-t commented 1 year ago

@Geryyy I'll need a stack trace of your crash to troubleshoot the issue.

Geryyy commented 1 year ago

Stack trace:

(gdb) bt
#0  0x00007fffbdffb640 in ?? ()
#1  0x00007fffcd08bfce in ?? ()
   from /home/geryyy/catkin_ws/src/arc/venv/lib/python3.10/site-packages/mujoco/_simulate.cpython-310-x86_64-linux-gnu.so
#2  0x00007fffcd08ba22 in ?? ()
   from /home/geryyy/catkin_ws/src/arc/venv/lib/python3.10/site-packages/mujoco/_simulate.cpython-310-x86_64-linux-gnu.so
#3  0x00007fffcd08a7ae in ?? ()
   from /home/geryyy/catkin_ws/src/arc/venv/lib/python3.10/site-packages/mujoco/_simulate.cpython-310-x86_64-linux-gnu.so
#4  0x00007fffcd08a61a in ?? ()
   from /home/geryyy/catkin_ws/src/arc/venv/lib/python3.10/site-packages/mujoco/_simulate.cpython-310-x86_64-linux-gnu.so
#5  0x00007ffff60a5de6 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#6  0x00007ffff60a64e0 in _Unwind_ForcedUnwind () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7  0x00007ffff7c9d4c6 in __GI___pthread_unwind (buf=<optimized out>) at ./nptl/unwind.c:130
#8  0x00007ffff7c95d3a in __do_cancel () at ../sysdeps/nptl/pthreadP.h:280
#9  __GI___pthread_exit (value=0x0) at ./nptl/pthread_exit.c:36
#10 0x00005555556641bd in PyThread_exit_thread ()
#11 0x00005555555c7e8b in ?? ()
#12 0x00005555556a190c in _PyEval_EvalFrameDefault ()
#13 0x00005555556b11ec in _PyFunction_Vectorcall ()
#14 0x00007ffff6ab3d51 in ?? () from /usr/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so
#15 0x00007ffff7e317ec in ?? () from /lib/x86_64-linux-gnu/libffi.so.8
#16 0x00007ffff7e32050 in ?? () from /lib/x86_64-linux-gnu/libffi.so.8
#17 0x00007ffff675b52b in _glfwInputError () from /home/geryyy/catkin_ws/src/arc/venv/lib/python3.10/site-packages/glfw/wayland/libglfw.so
#18 0x00007ffff675a58f in glfwMakeContextCurrent ()
   from /home/geryyy/catkin_ws/src/arc/venv/lib/python3.10/site-packages/glfw/wayland/libglfw.so
#19 0x00007fffcd0e6bfa in ?? ()
   from /home/geryyy/catkin_ws/src/arc/venv/lib/python3.10/site-packages/mujoco/_simulate.cpython-310-x86_64-linux-gnu.so
#20 0x00007fffcd0e2e24 in ?? ()
   from /home/geryyy/catkin_ws/src/arc/venv/lib/python3.10/site-packages/mujoco/_simulate.cpython-310-x86_64-linux-gnu.so
#21 0x00007fffcd0e2ce6 in ?? ()
   from /home/geryyy/catkin_ws/src/arc/venv/lib/python3.10/site-packages/mujoco/_simulate.cpython-310-x86_64-linux-gnu.so
#22 0x00007ffff68da12d in ?? ()
   from /home/geryyy/catkin_ws/src/arc/venv/lib/python3.10/site-packages/mujoco/_callbacks.cpython-310-x86_64-linux-gnu.so
#23 0x00005555556d1227 in ?? ()
#24 0x00005555556a7090 in ?? ()
#25 0x00005555556b1223 in _PyFunction_Vectorcall ()
#26 0x00005555556bf8e2 in PyObject_Call ()
#27 0x000055555569baf0 in _PyEval_EvalFrameDefault ()
#28 0x00005555556b11ec in _PyFunction_Vectorcall ()
#29 0x00005555556998cb in _PyEval_EvalFrameDefault ()
#30 0x00005555556b11ec in _PyFunction_Vectorcall ()
#31 0x00005555556998cb in _PyEval_EvalFrameDefault ()
#32 0x00005555556bee91 in ?? ()
#33 0x00005555557eae5b in ?? ()
#34 0x00005555557e0f58 in ?? ()
#35 0x00007ffff7c94b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#36 0x00007ffff7d26a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
saran-t commented 1 year ago

That's really strange: it looks like you've installed a MuJoCo callback that somehow calls into simulate.

Can you share your Python code?

Geryyy commented 1 year ago

The code is the minimal example from above.

saran-t commented 1 year ago

Please update your code to follow the example in our documentation. The API changed in 2.3.5: you now need to hold a reference to the handle returned by viewer.launch_passive, and you also need to call sync() on that handle for updated physics to be reflected in the viewer.

Geryyy commented 1 year ago

Thanks for the update! Unfortunately, it still crashes with a segfault. The code:

import time

import mujoco
import mujoco.viewer

gripper_xml = '/home/geryyy/repos/mujoco/model/tendon_arm/arm26.xml'

model = mujoco.MjModel.from_xml_path(gripper_xml)
data = mujoco.MjData(model)
m = model
d = data
with mujoco.viewer.launch_passive(m, d) as viewer:
  # Close the viewer automatically after 30 wall-seconds.
  start = time.time()
  while viewer.is_running() and time.time() - start < 30:
    step_start = time.time()

    # mj_step can be replaced with code that also evaluates
    # a policy and applies a control signal before stepping the physics.
    mujoco.mj_step(m, d)

    # Example modification of a viewer option: toggle contact points every two seconds.
    with viewer.lock():
      viewer.opt.flags[mujoco.mjtVisFlag.mjVIS_CONTACTPOINT] = int(d.time % 2)

    # Pick up changes to the physics state, apply perturbations, update options from GUI.
    viewer.sync()

    # Rudimentary time keeping, will drift relative to wall clock.
    time_until_next_step = m.opt.timestep - (time.time() - step_start)
    if time_until_next_step > 0:
      time.sleep(time_until_next_step)

Backtrace:

Thread 25 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffbdffb640 (LWP 6621)]
0x00007ffff7e41627 in ?? () from /lib/x86_64-linux-gnu/libwayland-client.so.0
(gdb) bt
#0  0x00007ffff7e41627 in ?? () from /lib/x86_64-linux-gnu/libwayland-client.so.0
#1  0x00007ffff7e41705 in ?? () from /lib/x86_64-linux-gnu/libwayland-client.so.0
#2  0x00007ffff7e45902 in wl_proxy_marshal () from /lib/x86_64-linux-gnu/libwayland-client.so.0
#3  0x00007ffff6766312 in wl_buffer_destroy ()
   from /home/geryyy/catkin_ws/src/arc/venv/lib/python3.10/site-packages/glfw/wayland/libglfw.so
#4  0x00007ffff676b138 in _glfwPlatformDestroyWindow ()
   from /home/geryyy/catkin_ws/src/arc/venv/lib/python3.10/site-packages/glfw/wayland/libglfw.so
#5  0x00007ffff6761be0 in glfwDestroyWindow ()
   from /home/geryyy/catkin_ws/src/arc/venv/lib/python3.10/site-packages/glfw/wayland/libglfw.so
#6  0x00007fffcd0e6c0b in ?? ()
   from /home/geryyy/catkin_ws/src/arc/venv/lib/python3.10/site-packages/mujoco/_simulate.cpython-310-x86_64-linux-gnu.so
#7  0x00007fffcd0e2e24 in ?? ()
   from /home/geryyy/catkin_ws/src/arc/venv/lib/python3.10/site-packages/mujoco/_simulate.cpython-310-x86_64-linux-gnu.so
#8  0x00007fffcd0e2ce6 in ?? ()
   from /home/geryyy/catkin_ws/src/arc/venv/lib/python3.10/site-packages/mujoco/_simulate.cpython-310-x86_64-linux-gnu.so
#9  0x00007ffff68da12d in ?? ()
   from /home/geryyy/catkin_ws/src/arc/venv/lib/python3.10/site-packages/mujoco/_callbacks.cpython-310-x86_64-linux-gnu.so
#10 0x00005555556d1227 in ?? ()
#11 0x00005555556a7090 in ?? ()
#12 0x00005555556b1223 in _PyFunction_Vectorcall ()
#13 0x00005555556bf8e2 in PyObject_Call ()
#14 0x000055555569baf0 in _PyEval_EvalFrameDefault ()
#15 0x00005555556b11ec in _PyFunction_Vectorcall ()
#16 0x00005555556998cb in _PyEval_EvalFrameDefault ()
#17 0x00005555556b11ec in _PyFunction_Vectorcall ()
#18 0x00005555556998cb in _PyEval_EvalFrameDefault ()
#19 0x00005555556bee91 in ?? ()
#20 0x00005555557eae5b in ?? ()
saran-t commented 1 year ago

Looks like the crash occurs in glfwDestroyWindow. Does the viewer work before you try to close it?

Also, the ?? frames in the stack trace unfortunately aren't very informative. Would you be able to figure out which Python line is causing the crash?
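One standard-library way to answer "which Python line was running" for a segfault is the `faulthandler` module, which dumps the Python stack of every thread when the process receives a fatal signal such as SIGSEGV. This is not something suggested in the thread; the sketch below assumes the crash is in native code called from Python, and `enable_crash_tracebacks` is a hypothetical helper name. Running the unmodified script as `python -X faulthandler script.py` has the same effect without editing it.

```python
import faulthandler
import sys


def enable_crash_tracebacks(log_path=None):
    """Dump every thread's Python stack if the process dies on a fatal signal.

    On a segfault, faulthandler prints each thread's Python traceback before the
    process exits, so the viewer thread's active Python line becomes visible.
    """
    # faulthandler keeps only the file descriptor, so the file object must stay
    # open for as long as the handler is enabled; it is returned to the caller.
    log_file = open(log_path, "w") if log_path else sys.stderr
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    enable_crash_tracebacks()
    # ... the viewer script would follow here ...
```

With `all_threads=True`, the dump includes the thread started by the passive viewer, not just the main thread, which matters here because the native backtraces show the crash on a non-main thread.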

Geryyy commented 1 year ago

Based on the backtrace (see below), the crash happens when the GLFW window is destroyed as the application closes. In the minimal example above, the loop condition

while viewer.is_running() and time.time() - start < 30:

makes the program exit after 30 seconds, and that exit causes the segmentation fault. If the GLFW window is closed with the 'x' button instead, the application exits cleanly.

To get a backtrace with debug symbols, I built the Python bindings (and MuJoCo) from source (commit 42a16f65c2814249f4272aefc87e8821ccd4e9ae).

Backtrace:

#0  0x0000000000000000 in ?? ()
#1  0x00007ffff5211b00 in glfwMakeContextCurrent ()
   from /home/geryyy/catkin_ws/src/arc/venv/lib/python3.10/site-packages/glfw/x11/libglfw.so
#2  0x00007fffb81b3507 in mujoco::GlfwAdapter::~GlfwAdapter (this=0x7fff8ca25b50, 
    __in_chrg=<optimized out>) at /tmp/pip-req-build-2umg3kz2/mujoco/simulate/glfw_adapter.cc:113
#3  0x00007fffb81b3550 in mujoco::GlfwAdapter::~GlfwAdapter (this=0x7fff8ca25b50, 
    __in_chrg=<optimized out>) at /tmp/pip-req-build-2umg3kz2/mujoco/simulate/glfw_adapter.cc:115
#4  0x00007fffb817289a in std::default_delete<mujoco::PlatformUIAdapter>::operator() (
    this=0x7fff8d3f0d98, __ptr=0x7fff8ca25b50) at /usr/include/c++/9/bits/unique_ptr.h:81
#5  0x00007fffb816abc2 in std::unique_ptr<mujoco::PlatformUIAdapter, std::default_delete<mujoco::PlatformUIAdapter> >::~unique_ptr (this=0x7fff8d3f0d98, __in_chrg=<optimized out>)
    at /usr/include/c++/9/bits/unique_ptr.h:292
#6  0x00007fffb81641a2 in mujoco::Simulate::~Simulate (this=0x7fff8d00c550, 
    __in_chrg=<optimized out>) at /tmp/pip-req-build-2umg3kz2/mujoco/simulate/simulate.h:44
#7  0x00007fffb8173498 in mujoco::python::(anonymous namespace)::SimulateWrapper::~SimulateWrapper
    (this=0x7fff8d00c550, __in_chrg=<optimized out>)
    at /tmp/pip-req-build-2umg3kz2/mujoco/simulate.cc:38
#8  0x00007fffb81734be in std::default_delete<mujoco::python::(anonymous namespace)::SimulateWrapper>::operator() (this=0x7fffb6d5c908, __ptr=0x7fff8d00c550)
    at /usr/include/c++/9/bits/unique_ptr.h:81
#9  0x00007fffb816b770 in std::unique_ptr<mujoco::python::(anonymous namespace)::SimulateWrapper, std::default_delete<mujoco::python::(anonymous namespace)::SimulateWrapper> >::~unique_ptr (
    this=0x7fffb6d5c908, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/unique_ptr.h:292
#10 0x00007fffb81730e3 in pybind11::class_<mujoco::python::(anonymous namespace)::SimulateWrapper>::dealloc (v_h=...)
    at /tmp/pip-req-build-2umg3kz2/build/temp.linux-x86_64-cpython-310/_deps/pybind11-src/include/pybind11/pybind11.h:1872
#11 0x00007ffff70e7a43 in pybind11::detail::clear_instance (self=0x7fffb6d5c8f0)
    at /tmp/pip-req-build-2umg3kz2/build/temp.linux-x86_64-cpython-310/_deps/pybind11-src/include/pybind11/detail/class.h:424
#12 0x00007ffff70e7b6b in pybind11::detail::pybind11_object_dealloc (self=0x7fffb6d5c8f0)
    at /tmp/pip-req-build-2umg3kz2/build/temp.linux-x86_64-cpython-310/_deps/pybind11-src/include/pybind11/detail/class.h:457
#13 0x0000000000546772 in ?? ()
#14 0x000000000053e3e8 in ?? ()
#15 0x0000000000629945 in _PyFunction_Vectorcall ()
#16 0x00000000006294ec in PyObject_Call ()
#17 0x00000000005ac10b in _PyEval_EvalFrameDefault ()
#18 0x0000000000629910 in _PyFunction_Vectorcall ()
#19 0x00000000005a9c15 in _PyEval_EvalFrameDefault ()
#20 0x0000000000629910 in _PyFunction_Vectorcall ()
#21 0x00000000005a9c15 in _PyEval_EvalFrameDefault ()
#22 0x00000000005a87e1 in ?? ()
#23 0x0000000000548f44 in ?? ()
#24 0x00000000006295ea in PyObject_Call ()
#25 0x000000000068e04a in ?? ()
#26 0x00000000006b6798 in ?? ()
#27 0x00007ffff7d72609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#28 0x00007ffff7eac133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

This issue is minor, since the viewer can still be used in passive mode.

curtiscjohnson commented 1 year ago

I'm having the same minor issue as @Geryyy above. I'm using the MuJoCo Python bindings v2.3.5, installed via pip, on Ubuntu 20.04.

The passive viewer works well until the program exits on reaching its time limit.

Here's a minimal working example using one of the example humanoid MJCF files:

import time

import mujoco
import mujoco.viewer

gripper_xml = "/home/curtis/mujoco-2.3.5/model/humanoid/humanoid.xml"

model = mujoco.MjModel.from_xml_path(gripper_xml)
data = mujoco.MjData(model)
m = model
d = data
with mujoco.viewer.launch_passive(m, d) as viewer:
    # Close the viewer automatically after 10 wall-seconds.
    start = time.time()
    while viewer.is_running() and time.time() - start < 10:
        step_start = time.time()

        # mj_step can be replaced with code that also evaluates
        # a policy and applies a control signal before stepping the physics.
        mujoco.mj_step(m, d)

        # Example modification of a viewer option: toggle contact points every two seconds.
        with viewer.lock():
            viewer.opt.flags[mujoco.mjtVisFlag.mjVIS_CONTACTPOINT] = int(d.time % 2)

        # Pick up changes to the physics state, apply perturbations, update options from GUI.
        viewer.sync()

        # Rudimentary time keeping, will drift relative to wall clock.
        time_until_next_step = m.opt.timestep - (time.time() - step_start)
        if time_until_next_step > 0:
            time.sleep(time_until_next_step)
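The loop above notes that its time keeping "will drift relative to wall clock": each iteration sleeps for `timestep` minus that step's own duration, so any oversleep accumulates. One common alternative, sketched here under the assumption that one call to `step()` advances simulation time by exactly `timestep`, is to sleep until an absolute deadline `start + k * timestep`. `run_paced` and its parameters are hypothetical names, not MuJoCo API.

```python
import time


def run_paced(step, timestep, max_wall_time, keep_running=lambda: True):
    """Call step() roughly once per `timestep` wall-seconds until time runs out.

    Deadlines are measured from a fixed start time (start + k * timestep), so an
    oversleep or a slow step is absorbed by later, shorter sleeps instead of
    accumulating as drift. Returns the number of steps taken.
    """
    start = time.perf_counter()
    k = 0
    while keep_running() and time.perf_counter() - start < max_wall_time:
        step()
        k += 1
        # Sleep only until step k's absolute deadline; if already late,
        # skip sleeping so the loop catches up.
        remaining = start + k * timestep - time.perf_counter()
        if remaining > 0:
            time.sleep(remaining)
    return k
```

Inside the `with` block, `step` could be a small function that calls `mujoco.mj_step(m, d)` and then `viewer.sync()`, with `timestep=m.opt.timestep`, `max_wall_time=10` and `keep_running=viewer.is_running`. One trade-off of this design: if stepping is slower than real time, the loop runs steps back to back trying to catch up, rather than letting simulation time fall behind.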