google-deepmind / mujoco

Multi-Joint dynamics with Contact. A general purpose physics simulator.
https://mujoco.org
Apache License 2.0

Crash of the simulation due to signed distance field #1539

Closed RafaelsNeurons closed 6 months ago

RafaelsNeurons commented 7 months ago

Hello,

I am a PhD student and want to use MuJoCo to simulate assembly processes, so I am using the Signed Distance Field plugin.

Unfortunately, the simulation crashes. I thought it was due to the MaxGeom, so I increased the value from 20,000 to 5,000,000 in Simulate.h and built it from source (as mentioned in https://github.com/google-deepmind/mujoco/issues/1503#issuecomment-1994543143). That is why the MuJoCo version in the video is 3.1.4.

Unfortunately, the change did not have the desired effect and my simulation continues to crash. (see provided video)

The XML file can be used to reproduce the error. Issue_SDF.zip

quagla commented 7 months ago

So nice when there is a self-contained zip of the issue :)

Unfortunately, I can't reproduce the issue. I played around a little, dragging the objects and making them collide, but it doesn't crash for me. Do you get an error log file generated? What is the message?

Also, your geometry is very regular. If you write your own SDF plugin, it could be more precise and faster (and certainly much less memory-intensive) than using sdflib, but it may require some work.

RafaelsNeurons commented 7 months ago

Thank you very much for the quick answer!

It happens especially when I move one component as shown in the video. If I simulate it via the Simulation.exe in the bin folder, no error message is generated and no new message appears in the MUJOCO_LOG.txt.

When simulating via Python, I don't get an error message in PyCharm either.

I use MuJoCo on Windows with Python. Which system are you using?

What would be the procedure for writing your own sdf plugin? Sorry for the question.

quagla commented 7 months ago

I use simulate on Mac. I'm not sure how to help. Since you are building from source, you could try to compile in debug mode, but running this example will probably take ages.

There are no bad questions :) The procedure would be to implement your own distance function as done by the plugins, e.g. https://github.com/google-deepmind/mujoco/blob/main/plugin/sdf/bowl.cc#L28

This is a useful resource for all the operations between SDFs: https://iquilezles.org/articles/distfunctions/. In general, things like spheres, cylinders, boolean operations, rounding, extrusions etc. are easy in the SDF world.
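To make the suggestion above concrete, here is a minimal sketch of analytic distance functions in the style of the linked article. It is not MuJoCo plugin code: the names `Vec3`, `SdSphere`, `SdBox`, `OpUnion`, `OpSubtraction` and `SdPlateWithHole` are hypothetical, and the sketch assumes the common convention that the distance is negative inside the shape and positive outside.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

// Hypothetical helper type for this sketch (not a MuJoCo type).
using Vec3 = std::array<double, 3>;

// Signed distance from p to a sphere of radius r centred at the origin.
double SdSphere(const Vec3& p, double r) {
  return std::sqrt(p[0] * p[0] + p[1] * p[1] + p[2] * p[2]) - r;
}

// Signed distance from p to an axis-aligned box with half-extents b,
// centred at the origin.
double SdBox(const Vec3& p, const Vec3& b) {
  const Vec3 q = {std::abs(p[0]) - b[0], std::abs(p[1]) - b[1],
                  std::abs(p[2]) - b[2]};
  const double ox = std::max(q[0], 0.0);
  const double oy = std::max(q[1], 0.0);
  const double oz = std::max(q[2], 0.0);
  const double outside = std::sqrt(ox * ox + oy * oy + oz * oz);
  const double inside = std::min(std::max(q[0], std::max(q[1], q[2])), 0.0);
  return outside + inside;
}

// Boolean union of two shapes: the closer surface wins.
double OpUnion(double d1, double d2) { return std::min(d1, d2); }

// Boolean subtraction: removes the shape with distance d_cut from the
// shape with distance d_base.
double OpSubtraction(double d_base, double d_cut) {
  return std::max(d_base, -d_cut);
}

// Example part: a 2 x 2 x 0.2 plate with a spherical cut-out of radius 0.5
// at its centre.
double SdPlateWithHole(const Vec3& p) {
  return OpSubtraction(SdBox(p, {1.0, 1.0, 0.1}), SdSphere(p, 0.5));
}
```

A custom plugin would evaluate a function like `SdPlateWithHole` at the query point instead of sampling a precomputed grid, which is where the precision and memory savings mentioned above would come from.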

RafaelsNeurons commented 7 months ago

Unfortunately, I have to rely on a solution for more complex components, such as those in my zip file. My goal is to derive the assembly sequence of real assemblies from the 3D models. I therefore thought that the best way is to use the AABB as in the cow example.

I have also tried the normal pip install mujoco version, but I get the same error.

I will try again with Linux. :)

RafaelsNeurons commented 7 months ago

Test_Mac_M2.mov.zip

I have now also tested it on my Mac mini M2 and it works perfectly. But I was able to determine at what moment Windows crashes compared to the Mac version. If I display the "SDF Iters" on the Mac version, the components become invisible at the point where Windows would crash. When I move the component back, the components appear again. Maybe this will help you to fix the error in the Windows version or is at least a clue :) (see also video in the attachment)

I will now also test it on Linux and give feedback.

UPDATE: It also works on Linux and looks exactly like the video I recorded on the Mac Mini M2 :)

I have seen that whenever the geometries disappear as shown in the video, this warning appears in the terminal:

WARNING: The pre-allocated visual geometry buffer is full. Increase maxgeom above 20000. time = 2.6790.

WARNING: The pre-allocated visual geometry buffer is full. Increase maxgeom above 20000. time = 2.0610.

WARNING: Pre-assigned visual geometry buffer is full. Increase maxgeom above 20000. time = 0.4750.

So it seems to be a Windows-only bug.

RafaelsNeurons commented 7 months ago

Segmentation_fault_core_dumped_Linux_VMware.zip

After playing around a bit in Linux, I noticed that after a while the simulation crashes and I get the error "Segmentation fault (core dumped)". (see attached video)

I run Linux on a virtual machine on my Windows computer. Could that be the cause?

I tried to fix it by setting the memory in the XML to 1G, but the simulation still crashed, with this terminal message: double free or corruption (out) Aborted (core dumped)

quagla commented 7 months ago

Very interesting, so it seems that Windows is not able to cope with the full buffer warning. I'll note this down on our internal bug list.

For the Linux error, is there any way for you to compile in debug mode so we get perhaps a trace of where this happens? If not possible, I'll try at some point to load your files and have a look.

RafaelsNeurons commented 7 months ago

I will try to run it in debug mode on Linux ASAP. I have never done it before, but I will try :) Thanks for noting the bug!

RafaelsNeurons commented 7 months ago

Issue_SDF_Debug

We have debugged the simulation and reproduced the error in Linux. There seems to be an error in sdf.cc when adding another SDF point using the function SdfVisualizer::AddPoint. (see attached image of the terminal window)

My first guess would be that the variable niter may have to be increased (but I could not find a fixed defined value), or that an if check is needed that stops accepting new points once n_iter is full. But that's just my guess.

Ga3ta commented 7 months ago

We found that it is a visualization error: the plugin tries to add more visualization points than there is space for and ends up crashing. Adding this small if clause to the code seems to stop it from crashing, and the simulator keeps detecting collisions, but there are still some visualization points that remain like "sticky faults".

void SdfVisualizer::AddPoint(const mjtNum point[3]) {
  if (!npoints.empty()) {
    if (3*npoints.back()+2 < points.size()) {
      points[3*npoints.back()+0] = point[0];
      points[3*npoints.back()+1] = point[1];
      points[3*npoints.back()+2] = point[2];
      npoints.back()++;
    }
  }
}

HotFix.zip
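The bounds check in the hotfix can be exercised in isolation. The sketch below is a standalone stand-in, not the plugin's actual class: `PointBuffer`, `Next` and `Count` are hypothetical names, and the layout (a flat `x, y, z` array plus a running point count) is only assumed from the snippet above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the visualizer's storage: a fixed-size flat
// array of xyz triples plus a stack of running point counts.
class PointBuffer {
 public:
  explicit PointBuffer(std::size_t max_points)
      : points_(3 * max_points, 0.0) {}

  // Opens a new group of points, continuing from the current count.
  void Next() { npoints_.push_back(npoints_.empty() ? 0 : npoints_.back()); }

  // Appends a point if there is room. Returns false instead of writing
  // past the end of the buffer.
  bool AddPoint(const double point[3]) {
    if (npoints_.empty()) return false;  // no group opened yet
    const std::size_t n = npoints_.back();
    if (3 * n + 2 >= points_.size()) return false;  // buffer is full
    points_[3 * n + 0] = point[0];
    points_[3 * n + 1] = point[1];
    points_[3 * n + 2] = point[2];
    npoints_.back()++;
    return true;
  }

  // Number of points stored so far.
  std::size_t Count() const { return npoints_.empty() ? 0 : npoints_.back(); }

 private:
  std::vector<double> points_;
  std::vector<std::size_t> npoints_;
};
```

Returning a `bool` is a design choice of this sketch, so a caller can tell when points are being dropped. The hotfix above drops them silently.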

quagla commented 7 months ago

Thanks for finding the cause! I'll look into this asap.

RafaelsNeurons commented 6 months ago

Can you give me a rough estimate of whether it will already be integrated in the May release? :)

I hope that this will also solve the error in the Windows version, but we will see :D

quagla commented 6 months ago

Apologies, I haven't had time so far. The fix makes sense and I will commit it now, but I'm not sure why you get old points. Are you calling Reset() in your sdf class?
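One possible reading of the question above, sketched under assumptions: if the buffer of visualization points is never cleared between steps, points from earlier steps stay in it and keep being drawn. `TraceBuffer` and `PointsAfterSteps` are hypothetical names for illustration, and what the plugin's real `Reset()` does is not shown in this thread.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical fixed-size store of visualization points (xyz triples).
class TraceBuffer {
 public:
  explicit TraceBuffer(std::size_t max_points)
      : points_(3 * max_points, 0.0) {}

  // Forgets all stored points so the next step starts from an empty buffer.
  void Reset() { count_ = 0; }

  // Appends a point if there is room; returns false when the buffer is full.
  bool Add(double x, double y, double z) {
    if (3 * count_ + 2 >= points_.size()) return false;
    points_[3 * count_ + 0] = x;
    points_[3 * count_ + 1] = y;
    points_[3 * count_ + 2] = z;
    ++count_;
    return true;
  }

  std::size_t Count() const { return count_; }

 private:
  std::vector<double> points_;
  std::size_t count_ = 0;
};

// Runs n_steps steps that each add points_per_step points and returns how
// many points the buffer holds (and would draw) after the last step.
std::size_t PointsAfterSteps(TraceBuffer& buf, int n_steps,
                             int points_per_step, bool reset_each_step) {
  for (int step = 0; step < n_steps; ++step) {
    if (reset_each_step) buf.Reset();
    for (int i = 0; i < points_per_step; ++i) {
      buf.Add(static_cast<double>(step), static_cast<double>(i), 0.0);
    }
  }
  return buf.Count();
}
```

Without the per-step reset, the buffer accumulates points from every step, which would look like points that never go away. With it, only the latest step's points remain.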