google-deepmind / mujoco

Multi-Joint dynamics with Contact. A general purpose physics simulator.
https://mujoco.org
Apache License 2.0
7.91k stars · 791 forks

Will there be support for GPU execution? #85

Closed im-Kitsch closed 1 year ago

im-Kitsch commented 2 years ago

Hi,

It's amazing that MuJoCo is open-source now. I want to ask: will there be GPU support in the future? It would be pretty nice and exciting.

Thanks in advance.

MotorCityCobra commented 2 years ago

Using the GPU for what?
When I open an XML in the simulate program, I can see that it is using the GPU.
Are you using Nvidia? Start a MuJoCo program and type `nvidia-smi` in any command line. Do you see the path to the MuJoCo process?

[Screenshot, 2021-12-02: nvidia-smi output listing the MuJoCo process]

aseembits93 commented 2 years ago

I think what the OP means is that the simulations should happen on the GPU. Not sure if that's a priority for the development team.

aditya-shirwatkar commented 2 years ago

Yeah, it will be awesome if the devs develop something like Nvidia's Isaac Gym

joehays commented 2 years ago

> Yeah, it will be awesome if the devs develop something like Nvidia's Isaac Gym

Our group has seen a 20×+ speedup using Isaac Gym versus MuJoCo for RL training. The main reasons are GPU acceleration and an implementation that minimizes CPU-GPU communication. By pushing "everything" to the GPU, the CPU-GPU communication goes away and total system performance jumps. This is all enabled by Isaac Gym providing tensor interfaces to PhysX (which runs on the GPU).

If the MuJoCo devs move MuJoCo to the GPU, make sure to expose a tensor interface so that PyTorch-based RL can communicate seamlessly with the dynamics engine on the GPU, with no CPU round-trips needed.
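To illustrate the idea, here is a toy sketch using NumPy arrays as a stand-in for GPU tensors (the dynamics function below is a placeholder, not MuJoCo's API): when the entire batch of environment states lives in one array, the policy and the dynamics read and write the same buffer, and there are no per-environment copies or host round-trips.

```python
import numpy as np

def batched_step(states, actions, dt=0.01):
    """Placeholder dynamics: advance all environments in one vectorised call.

    In a real GPU pipeline, `states` and `actions` would be device tensors
    and this call would stay on the device end to end.
    """
    pos, vel = states[:, 0], states[:, 1]
    vel = vel + actions * dt           # apply "forces" to every env at once
    pos = pos + vel * dt
    return np.stack([pos, vel], axis=1)

num_envs = 4096
states = np.zeros((num_envs, 2))       # one row of (position, velocity) per env
actions = np.ones(num_envs)

for _ in range(100):                   # whole rollout: no per-env Python loop
    states = batched_step(states, actions)
```

The design point is that the batch dimension, not a Python loop over environments, carries the parallelism, which is exactly what a tensor interface to the physics engine would preserve.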

nik7273 commented 2 years ago

+1! I'd rather not switch to Isaac, but with the performance boost that Isaac promises, it's hard to say no

nimrod-gileadi commented 2 years ago

How are you running MuJoCo?

The MuJoCo C library is fast, but unfortunately, stepping environments through Python is indeed very slow. The way to get performance out of MuJoCo is to:

  1. Run many environments in parallel, to max out your CPU cores.
  2. Do as much of your run loop as possible in C/C++ rather than Python.
  3. If you can evaluate your policies on CPU in the same thread as stepping the physics, run many environments independently without batching.
  4. If you need to run your policy on an accelerator, make sure to batch and use a large number of environments.

Python threads do not give you real parallelism for CPU-bound work, because of the Global Interpreter Lock; parallelism has to come from multiple processes or from native code.
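A minimal sketch of point 1 above, using the standard library's multiprocessing module (the rollout function here is a placeholder for a loop that would build an mjModel/mjData pair and call mj_step; the names are illustrative, not MuJoCo's API):

```python
import multiprocessing as mp

def rollout(seed, num_steps=1000):
    """Stand-in for a worker that steps one environment.

    A real worker would call mj_step in this loop; here we just do
    some CPU-bound arithmetic converging towards a fixed point of 100.
    """
    x = float(seed)
    for _ in range(num_steps):
        x = 0.99 * x + 1.0             # pretend physics step
    return x

if __name__ == "__main__":
    # One process per worker sidesteps the GIL: each process has its own
    # interpreter, so the physics steps run truly in parallel across cores.
    with mp.Pool(processes=4) as pool:
        returns = pool.map(rollout, range(16))
    print(len(returns))  # 16
```

Sizing the pool to the number of physical cores, with one or more environments per process, is the usual way to "max out your CPU cores" from Python.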

envpool is one project offering C++ implementations of common RL environments with a batched Python API. Their benchmark results show that MuJoCo performance is in the same ballpark as Isaac Gym.

We also posted some numbers, which you should be able to reproduce with the testspeed binary we released, in our blog post: https://www.deepmind.com/blog/open-sourcing-mujoco

Regarding efficiently running neural networks on CPU, I don't have an open-source project to point you at, but that's the direction to go for small networks (e.g. an MLP with layer sizes 512, 256, 128).
