google-deepmind / mujoco

Multi-Joint dynamics with Contact. A general purpose physics simulator.
https://mujoco.org
Apache License 2.0

Improved MuJoCo Simulation Scaling #1671

Open bd-mdelasa opened 2 months ago

bd-mdelasa commented 2 months ago

Hello MuJoCo Team,

We are currently exploring the creation of complex simulation environments where robots interact with numerous contact objects. A representative example is a robot navigating a space filled with many objects (e.g., boxes) that may be moved or interacted with. These objects remain mostly static unless the robot interacts with them while performing specific tasks.

We've observed that the real-time rate (the ratio of simulation time to wall-clock time) decreases sharply as the number of contact boxes increases; with a couple hundred contact objects, the drop is steep. Since we noticed the MuJoCo team was working on the lock joint effort (#7), we've been experimenting with adding welded boxes to the world for static objects. In the scheme we explored, objects remained locked unless they were close to the robot. This improved performance, but we still noticed some overhead that increased linearly with the number of objects (possibly from traversing some parts of the "scene graph"). Both issues represent significant challenges as we scale our use of MuJoCo to different use cases.
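A minimal sketch of the proximity rule described above, assuming object positions are available as (x, y, z) tuples. The function name `partition_by_distance` and the `radius` threshold are hypothetical and not part of MuJoCo; the sketch only decides which objects would stay locked.

```python
import math


def partition_by_distance(robot_pos, object_positions, radius):
    """Split object indices into (active, frozen) by distance to the robot.

    Objects within `radius` of the robot are treated as active (free to
    move); all others are treated as frozen (locked in place).
    """
    active, frozen = [], []
    for i, pos in enumerate(object_positions):
        if math.dist(robot_pos, pos) <= radius:
            active.append(i)
        else:
            frozen.append(i)
    return active, frozen


# Hypothetical scene: three boxes, robot at the origin.
boxes = [(0.5, 0.0, 0.1), (3.0, 4.0, 0.1), (0.0, 1.0, 0.1)]
active, frozen = partition_by_distance((0.0, 0.0, 0.0), boxes, radius=1.5)
```

The loop itself is linear in the number of objects, so for very large scenes a spatial index would likely be needed to avoid reintroducing per-object overhead.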

Are there any efforts in the area of scaling MuJoCo to larger scenes? This feels like a critical area of investment that would further strengthen MuJoCo. In the meantime, any guidance you can provide on how to better manage or optimize larger worlds would be greatly appreciated.

Thanks in advance - and keep up the great work!

yuvaltassa commented 1 month ago

Hi @bd-mdelasa,

First a meta-comment: this sort of detailed request from valued users is exactly what we need in order to correctly prioritize our roadmap.

Addressing your high-level question:

"Are there any efforts in the area of scaling MuJoCo to larger scenes?"

The answer is a resounding "yes", as follows.

  1. At the compiler level, we are in late stages of designing a programmatic API to manipulate models. You can inspect the (unfinished, currently private) API here. The gist of the API is the introduction of a new high-level struct called mjSpec which is essentially a programmatic equivalent of an XML model description. Instead of the current workflow XML → mjModel, the new workflow will look like XML → mjSpec → (edit, add, remove elements) → mjModel. So we still have all the benefits of model compilation, but we keep the mjSpec object as a live, editable thing that can always be recompiled. By not touching things that have not changed (incremental compilation), making a new mjModel will be very fast, after the first compile. This should allow you to manually "freeze" subsets of dofs that are no longer relevant for the scene in progress. We are currently finalizing this API, which will of course also be accessible via the Python bindings.
  2. At the engine level, the main items are:
    • Fix some memory allocation issues with the Newton solver. This will give us some speed-up from better cache friendliness and, most importantly, allow the Newton solver to handle tens of thousands of contacts/dofs.
    • Enable contact islands for the Newton solver, which will allow multithreading within a single scene. After this is done, contact islands will likely be on by default.
  3. Once both 1 and 2 are finished, we will add dof freezing functionality at the island level. We are not yet sure if this will be enabled as a runtime option (without changing the mjModel) or at the mjSpec API level, effectively "recompile the model, removing all dofs of island X".

The workstream in item 1 is already in active development. Item 2 was recently up-prioritized following your feature request. We expect 1 and 2 to be complete some time during Q3 '24.

Regarding your specific question about welding boxes, I'm a bit surprised that this is not working as expected. First of all, by "welding" I hope you mean removing the DoFs, not adding a "weld" constraint, yes? Assuming the answer is "yes", please try replacing the enclosing bodies with frames. These are similar to bodies but cheaper. Basically, all static (dofless) bodies should be frames rather than bodies. If you have follow-up questions regarding this specific workflow, please open a new issue, since I'd like to keep this issue, along with #7, to track the high-level large-scene feature set.
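As a rough illustration of the frames suggestion, the sketch below generates an MJCF string in which static boxes sit inside `<frame>` elements and only the boxes expected to move get a `<body>` with a `<freejoint>`. The helper `build_scene`, the box size, and the naming scheme are assumptions made for illustration; only the MJCF elements themselves come from MuJoCo.

```python
import xml.etree.ElementTree as ET


def build_scene(boxes, active):
    """Return an MJCF string for a list of box positions.

    `boxes` is a list of (x, y, z) tuples; `active` is a set of indices
    that should remain dynamic. Every other box is emitted as a geom
    inside a <frame>, so it adds no body and no degrees of freedom.
    """
    root = ET.Element("mujoco", model="boxes")
    world = ET.SubElement(root, "worldbody")
    for i, (x, y, z) in enumerate(boxes):
        pos = f"{x} {y} {z}"
        if i in active:
            # Dynamic box: a body with a free joint (6 dofs).
            parent = ET.SubElement(world, "body", name=f"box{i}", pos=pos)
            ET.SubElement(parent, "freejoint")
        else:
            # Static box: a frame only positions its children.
            parent = ET.SubElement(world, "frame", pos=pos)
        ET.SubElement(parent, "geom", name=f"box{i}_geom",
                      type="box", size="0.1 0.1 0.1")
    return ET.tostring(root, encoding="unicode")


# Hypothetical example: only box 0 is near the robot and stays dynamic.
scene_xml = build_scene(
    [(0.5, 0.0, 0.1), (3.0, 4.0, 0.1), (0.0, 1.0, 0.1)], active={0}
)
```

The resulting string can be passed to whatever model-loading path is already in use; changing which boxes are dynamic means regenerating the string and recompiling the model.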

Cheers!

bd-mdelasa commented 1 month ago

Hi @yuvaltassa

Thanks for the prompt response and for considering our request.

All the items you mention feel like steps in the right direction.

We'll definitely be able to make use of the mjSpec API you're describing. At the moment, we jump through a few hoops to achieve something similar (with some speed penalties). The Newton solver optimizations you mention also sound like huge wins, especially if they allow improved simulator scaling. Let us know if you're looking for some early adopters/testers.

Re: your suggestion to replace bodies with frames - this turned out to be a great tip and fixed the performance issues I described. Since we are creating the mjModel from a string representation that we manage ourselves, we used the geom type (vs. the frame) - but this effectively achieved a similar result, with nice performance gains.

Thanks again.