google-deepmind / mujoco

Multi-Joint dynamics with Contact. A general purpose physics simulator.
https://mujoco.org
Apache License 2.0

Best practices for fast rendering with MJX #1604

Closed c0g closed 2 months ago

c0g commented 7 months ago

Hi,

I'm an engineer and I'm using MuJoCo for fun.

I'm looking for some help with rendering simulations in MJX. My model is extremely simple, so that might be part of the issue, e.g. I'm just not doing enough compute per step (model at the bottom of this post).

I've verified that I'm definitely using EGL. The time to render a single scene is fine; it's my method for rendering all 4096 scenes that's slow.

My current sim code:

import mujoco
from mujoco import mjx

class Render:
    def __init__(self, model):
        self.model = model
        self.renderer = mujoco.Renderer(model)

    def __call__(self, rng, batch):
        # Copy the batched MJX data from device back to host as a list of mjData.
        data = mjx.get_data(self.model, batch)
        for i, d in enumerate(data):
            self.renderer.update_scene(d)
            pixels = self.renderer.render()
            dataset.image(i, d.time, pixels)  # log the frame
        return rng, batch

When I have a lot of parallel sims, the copy and the for loop take a good amount of time. I can mitigate this slightly with a background thread, especially since I'm simulating at 1 kHz and only rendering at 10 Hz. Is there a way I can avoid the copy and loop entirely? It feels like it should be possible, since the data is already on the GPU that is doing the rendering.
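For reference, the background-thread mitigation mentioned above can be sketched without any MuJoCo at all: the sim loop enqueues a snapshot at the render rate and a worker thread does the slow rendering off the hot path. `make_background_renderer`, `render_fn`, and the frame-dropping policy are all illustrative names, not part of any MuJoCo API.

```python
import queue
import threading

def make_background_renderer(render_fn, maxsize=8):
    """Run render_fn(item) on a worker thread; the sim loop only enqueues."""
    work = queue.Queue(maxsize=maxsize)
    frames = []

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: shut down
                break
            frames.append(render_fn(item))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()

    def submit(item):
        # Drop the frame rather than stall the 1 kHz sim loop
        # if the renderer has fallen behind.
        try:
            work.put_nowait(item)
        except queue.Full:
            pass

    def close():
        work.put(None)
        thread.join()
        return frames

    return submit, close
```

In the sim loop you would call `submit(state)` only on render ticks (e.g. every 100th step at 1 kHz sim / 10 Hz render), then `close()` once at the end to drain the queue and collect the frames.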

Model:

<mujoco>
  <option timestep=".001"/>
  <asset>
    <texture name="grid" type="2d" builtin="checker" rgb1=".1 .2 .3" rgb2=".2 .3 .4" width="640" height="480"/>
    <material name="grid" texture="grid" texrepeat="8 8" reflectance=".2"/>
  </asset>
  <worldbody>
    <geom name="floor" type="plane" pos="0 0 -5" size="20 20 .1" material="grid"/>
    <light pos="0 -.4 1"/>
    <body name="oleg" pos="0 0 0">
      <body name="thigh" pos="0 0 0">
        <joint type="slide" axis="1 0 0" pos="0 0 0" limited="true" range="-10 10"/>
        <joint type="slide" axis="0 0 1" pos="0 0 0" limited="true" range="-10 0"/>
        <joint name="thigh_joint" type="hinge" axis="0 -1 0" pos="0 0 6" limited="true" range="-60 60"/>
        <joint type="hinge" axis="-1 0 0" pos="0 0 6" limited="true" range="-5 5"/>
        <geom pos="0 0 4" type="box" size="0.2 0.1 2.0" rgba="1 1 0 1" mass="0.5"/>
        <body name="calf" pos="0 0 0">
          <joint name="knee_joint" axis="0 -1 0" pos="0 0 2"/>
          <geom pos="0 0 0" type="box" size="0.2 0.1 2.0" rgba="1 0 0 1" mass="0.3"/>
        </body>
      </body>
    </body>
  </worldbody>
  <actuator>
    <motor name="thigh_motor" ctrllimited="true" ctrlrange="-100.0 100.0" joint="thigh_joint"/>
    <motor name="knee_motor" ctrllimited="true" ctrlrange="-100.0 100.0" joint="knee_joint"/>
  </actuator>
</mujoco>
erikfrey commented 7 months ago

This is a good question. You're right: the culprit here is mjx.get_data. Any scenario where you copy data to the host and then back to the GPU to render is going to kill throughput.

Doing this right requires building interfaces that let MJX and a renderer share the same GPU buffers. The most promising avenue for this is Madrona, and we are experimenting with them on ways to integrate it with MJX, but it's early days.

We are also working on a simple depth renderer native to MJX using mjx.ray, though I suspect the Madrona integration will be both faster and more powerful. Stay tuned; this is actively under development.
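To give a sense of what a ray-based depth renderer computes, here is a minimal plain-Python sketch (no MJX involved): cast one ray per pixel through a pinhole camera and record the hit distance, here against a single ground plane. In an MJX version, the per-ray scene intersection would be done by mjx.ray; `depth_image` and all of its parameters are made up for illustration.

```python
import math

def depth_image(width, height, fov_y_deg, cam_z, plane_z=0.0):
    """Pinhole camera at (0, 0, cam_z) looking straight down at the plane
    z = plane_z; returns a row-major list of hit distances (inf = miss)."""
    # Focal length in pixels from the vertical field of view.
    f = 0.5 * height / math.tan(math.radians(fov_y_deg) / 2.0)
    depth = []
    for r in range(height):
        for c in range(width):
            # Unit-free ray direction through pixel (r, c); camera looks along -z.
            dx = (c - 0.5 * (width - 1)) / f
            dy = (r - 0.5 * (height - 1)) / f
            dz = -1.0
            norm = math.sqrt(dx * dx + dy * dy + dz * dz)
            # Solve cam_z + t * (dz / norm) == plane_z for the ray length t.
            t = (plane_z - cam_z) * norm / dz
            depth.append(t if t > 0 else math.inf)
    return depth
```

The same structure maps naturally onto MJX: one ray cast per pixel, trivially parallel across pixels and across batched environments.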

c0g commented 6 months ago

Thanks for the answer :-) Glad to hear it's being worked on. Is there anything early I can play with?

Andrew-Luo1 commented 6 months ago

I have a depth camera implementation using mjx.ray from a personal project, which I discuss here. It renders quite quickly, and I've successfully used it for vision-based policy learning (not quite RL; it uses Analytical Policy Gradients, so it only requires <100 envs).