Closed breakds closed 1 year ago
You indicated that you built from source. As with all questions relating to performance, I'd strongly urge you to get baseline numbers from our official binary build first.
EDIT: Looks like @yuvaltassa just posted an actual answer, so please ignore me. (Although I just ran the official build of testspeed on my own 5950x machine and got a bit higher throughput than what you posted. I suspect you're missing the INTERPROCEDURAL_OPTIMIZATION flag.)
Hi, thanks for your questions!
Question 1: I apologize, I believe the line in the paper is a (rather serious) mistake. "on a single CPU thread" should read "on a single CPU". Note, however, that real-time ratios scale with the timestep. The humanoid model in the MuJoCo repository has a conservative 5ms timestep, but the one used by the MJPC planner has a timestep of 15ms, which leads to a total real-time ratio of 4000x on an M1 Mac and 5400x on your machine. Of course, large timesteps are "dangerous" in the sense that the physics can sometimes diverge, but when used in the context of short planner trajectories (that handle divergence gracefully), this is not a problem. The humanoid environment controlled by the MJPC planner actually has the default timestep of 2ms, for high stability and fidelity. Such are the benefits of asynchronous MPC: you can choose timesteps to suit your needs 🙂
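To make the "real-time ratios scale with the timestep" point concrete, here is a minimal sketch. It assumes the wall-clock cost of one step does not depend on the timestep (only approximately true, since solver effort can vary), and the throughput figure is a made-up illustration, not a measurement from this thread. The function name `realtime_factor` is hypothetical.

```python
def realtime_factor(steps_per_second: float, timestep: float) -> float:
    """Simulated seconds advanced per wall-clock second."""
    return steps_per_second * timestep


# Hypothetical throughput, for illustration only (not a measured number).
STEPS_PER_SECOND = 100_000.0

# The three timesteps mentioned in the reply: 2 ms, 5 ms and 15 ms.
for dt in (0.002, 0.005, 0.015):
    factor = realtime_factor(STEPS_PER_SECOND, dt)
    print(f"timestep {dt * 1000:.0f} ms -> {factor:.0f}x real time")
```

Under that assumption, going from a 5ms to a 15ms timestep triples the real-time factor without the simulator doing any more steps per second, which is why numbers quoted at different timesteps are not directly comparable.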
Question 2: I think I can see multiple issues in the timing breakdown.
Very generally, MuJoCo has a built-in profiler that is visible in the `simulate` app GUI (F3 key), that will show you how different choices affect compute speed (this is printed out in `testspeed`). Use it!
Allow me to apologize again for the typo in the paper, it is rather embarrassing 😬
EDIT: Just realised you did attach a model. I will look more carefully tomorrow, but your 3.7MB height-field data file leads me to think that it is much denser than it needs to be.
Really appreciate your fast response!
> I suspect you're missing the INTERPROCEDURAL_OPTIMIZATION flag.
Thanks! This did boost the realtime factor from 1833 to 2200. Is this close to what you have @saran-t ?
@yuvaltassa Thanks a lot for the clarification on the MuJoCo MPC paper. The paper is very nice, and with its various benefits, MuJoCo looks like a good fit for planner rollouts.
On Question 2:
> I find it curious that you have so many contacts. You say you have a quadruped on some terrain? How do you get to an average of 40 contacts / step? You have lots of geoms on the feet or something? Are you using a very dense height-field? (don't do that).
Unfortunately yes, we do have a very dense height field, with 1600 x 1600 vertices on a 12m x 12m ground. The reason is that we are training a policy with model-free reinforcement learning, and we need a relatively dense terrain to encourage the learned locomotion policy to lift the feet high enough.
> I find the constraint/contact ratio curious. I think you must be using 6D contacts? These are not free...
Yes, we set `condim="6"` on all feet, based on code from mujoco-menagerie. When we lower this to "3", the simulation is indeed 20% faster. I am not sure at this moment whether this will affect sim2real.
> I find the 77% time spent in constraint resolution too high. Try switching solver and/or setting jacobian = "sparse".
Using "CG" solver did help double the simulation performance. Setting "jacobian" to "sparse" does not seem to further improve the performance.
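For readers who want to try the two changes reported above (the CG solver and `condim="3"` on the feet), here is a minimal sketch that patches an MJCF string with the Python standard library. The helper name `apply_speed_settings`, the toy model, and the convention that foot geoms have "foot" in their name are all assumptions for illustration. In a real model `condim` may be set through default classes rather than on the geoms themselves, in which case that is where it would need changing.

```python
import xml.etree.ElementTree as ET


def apply_speed_settings(mjcf_xml, solver="CG", foot_condim=3, foot_tag="foot"):
    """Return a copy of the MJCF string with a different solver and foot condim.

    foot_tag is a hypothetical naming convention: any geom whose name
    contains it is treated as a foot geom.
    """
    root = ET.fromstring(mjcf_xml.strip())

    # <option> is a top-level MJCF element; create it if the model has none.
    option = root.find("option")
    if option is None:
        option = ET.Element("option")
        root.insert(0, option)
    option.set("solver", solver)

    # Lower the contact dimensionality only on the geoms tagged as feet.
    for geom in root.iter("geom"):
        if foot_tag in geom.get("name", ""):
            geom.set("condim", str(foot_condim))

    return ET.tostring(root, encoding="unicode")


# Toy model for illustration only; it is not the Go1 model from this thread.
TOY_MODEL = """
<mujoco>
  <worldbody>
    <body name="leg">
      <geom name="FR_calf" type="capsule" size="0.01 0.1"/>
      <geom name="FR_foot" type="sphere" size="0.02" condim="6"/>
    </body>
  </worldbody>
</mujoco>
"""

patched = apply_speed_settings(TOY_MODEL)
print(patched)
```

Whether either change is acceptable for a given task (for example, for sim2real transfer) still has to be checked separately; the sketch only shows where the settings live in the model file.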
> In version 2.3.3 we introduced a new mid-phase pruning stage that significantly enhanced collision detection speed (credit to @quagla!).
I noticed this when 2.3.3 was released (really appreciate your effort on actively developing it and regularly making new releases!). The algorithm was clever. I understand that currently it does not seem to work on heightfields - but in the end we just want some terrain with irregular ups and downs. Do you think it would be better if instead we create the terrain with a bunch of cubes on the ground which cannot move?
> Very generally, MuJoCo has a built in profiler that is visible in the simulate app GUI (F3 key)

Nice. This is very useful. Thank you!
Thanks again for your time and for patiently answering the questions!
So the only unresolved problem (and the biggest) is that we have a rather dense `hfield`. We need this to allow the training to get a policy where the quadruped can lift its legs higher.
In terms of performance, instead of using `hfield`, is it going to be faster if we replace the terrain with (a huge amount of) blocks, so that the binary tree search trick can kick in?
Another trick I can think of is to place 20 robots on a single terrain (i.e. a model with 20 robots) for training. I can carefully assign the `contype` and `conaffinity` to make those robots "disjoint" for collision detection purposes. This seems to increase the "per-robot" performance a lot, but I think this might not be recommended and could even have harmful side effects that I am not aware of.
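One way the `contype` / `conaffinity` assignment could look is sketched below. It relies on MuJoCo's documented filter rule (two geoms are candidates for collision if the `contype` of one shares a set bit with the `conaffinity` of the other, or vice versa) and models only that rule; other filters, such as parent-child exclusion, are ignored. The helper names `collides` and `assign_masks` and the one-bit-per-robot scheme are assumptions for illustration, not something taken from the model in this thread.

```python
def collides(a, b):
    """Bitmask filter: a pair is kept if either contype meets the other's conaffinity."""
    return bool((a["contype"] & b["conaffinity"]) or (b["contype"] & a["conaffinity"]))


def assign_masks(num_robots):
    """Give each robot its own bit; the terrain accepts every robot bit."""
    # Stay within 31 bits so the masks remain positive 32-bit integers.
    if not 1 <= num_robots <= 31:
        raise ValueError("this scheme supports between 1 and 31 robots")
    robots = []
    for i in range(num_robots):
        bit = 1 << i
        # Same bit in both masks: geoms of one robot can still hit each other.
        robots.append({"contype": bit, "conaffinity": bit})
    # The terrain initiates nothing itself but matches every robot's contype.
    terrain = {"contype": 0, "conaffinity": (1 << num_robots) - 1}
    return robots, terrain


robots, terrain = assign_masks(20)
print(all(collides(r, terrain) for r in robots))
print(any(collides(robots[i], robots[j])
          for i in range(len(robots)) for j in range(i + 1, len(robots))))
```

With this scheme every robot is still tested against the terrain and against its own geoms, while robot-to-robot pairs are filtered out before any geometric test is run.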
Replacing the hfield with primitives should help, though I suspect that you might be able to get away with a somewhat less dense hfield.
I can't think of any side effects of making a multi-robot scene; this should be fine.
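On the "somewhat less dense hfield" suggestion: 1600 samples across 12m is a spacing of 12 / 1600 = 7.5mm per cell. A minimal sketch of coarsening such a grid is shown below. It keeps the maximum height in each block so that bumps are not flattened away, which is an assumed design choice, and the helper name `downsample_max` is hypothetical. Whether a coarser grid (for example a factor of 4, giving 400 x 400 samples at 30mm spacing) is still adequate for training is something only experiments can tell.

```python
def downsample_max(heights, factor):
    """Coarsen a 2D height grid by taking the max over factor x factor blocks."""
    rows, cols = len(heights), len(heights[0])
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            # Collect the block, clipping at the grid edges.
            block = [
                heights[rr][cc]
                for rr in range(r, min(r + factor, rows))
                for cc in range(c, min(c + factor, cols))
            ]
            coarse_row.append(max(block))
        coarse.append(coarse_row)
    return coarse


# Tiny illustrative grid; a real terrain would be loaded from the hfield data file.
grid = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
print(downsample_max(grid, 2))
```

The coarsened grid would then be written back in whatever format the model's height-field file uses, with the element size in the model left unchanged so the terrain still covers the same area.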
Closing for now but will keep following this thread.
Hi,
Greetings! We have recently been utilizing MuJoCo for our project involving the simulation of a Unitree Go1 quadrupedal robot navigating across varied terrains. However, we're facing some performance bottlenecks during our training cycles. Consequently, we're seeking your insights and advice on potential ways to enhance the simulation speed.
Below, I've provided some context and specifics from our experiment conducted on an AMD Ryzen 9 5950x CPU, which has 16 cores and 32 threads.
Question 1: Discrepancies in Simulation Speed Factors
The MuJoCo MPC paper indicates that a single CPU thread can simulate a 27-DoF humanoid system 4000 times faster than real-time.
To replicate this performance, we ran `testspeed` on the humanoid model and achieved a real:sim factor of roughly 150 times for a single thread. Although this surpasses the speed reported in post #897, it falls far short of the 4000x speed cited in the paper. Can anyone guide us on how we can approach this 4000x magnitude? Are there any specific configurations, compilation flags, or model modifications that we need to consider? We are building MuJoCo from source, and any advice regarding this matter would be highly appreciated!
Below is the build log when I build MuJoCo from source. AVX should be enabled by default.
Build Log for MuJoCo
```
mujoco> unpacking sources
mujoco> unpacking source archive /nix/store/wxivcwsqz6zylr31hnfx1cckbwl8wr3i-source
mujoco> source root is source
mujoco> patching sources
mujoco> applying patch /nix/store/fpka17kjcczv7zc1cmq4j7p7dpf49pcp-dependencies.patch
mujoco> patching file cmake/MujocoDependencies.cmake
mujoco> patching file simulate/cmake/SimulateDependencies.cmake
mujoco> configuring
mujoco> fixing cmake files...
mujoco> cmake flags: -DCMAKE_FIND_USE_SYSTEM_PACKAGE_REGISTRY=OFF -DCMAKE_FIND_USE_PACKAGE_REGISTRY=OFF -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF -DCMAKE_INSTALL_LOCALEDIR=/nix/store/hj2d8zyd3yjb5n6s5m1m1k22i5rbniwh-mujoco-2.3.6/share/locale -DCMAKE_INSTALL_LIBEXECDIR=/nix/store/hj2d8zyd3yjb5n6s5m1m1k22i5rbniwh-mujoco-2.3.6/libexec -DCMAKE_INSTALL_LIBDIR=/nix/store/hj2d8zyd3yjb5n6s5m1m1k22i5rbniwh-mujoco-2.3.6/lib -DCMAKE_INSTALL_DOCDIR=/nix/store/hj2d8zyd3yjb5n6s5m1m1k22i5rbniwh-mujoco-2.3.6/share/doc/mujoco -DCMAKE_INSTALL_INFODIR=/nix/store/hj2d8zyd3yjb5n6s5m1m1k22i5rbniwh-mujoco-2.3.6/share/info -DCMAKE_INSTALL_MANDIR=/nix/store/hj2d8zyd3yjb5n6s5m1m1k22i5rbniwh-mujoco-2.3.6/share/man -DCMAKE_INSTALL_OLDINCLUDEDIR=/nix/store/hj2d8zyd3yjb5n6s5m1m1k22i5rbniwh-mujoco-2.3.6/include -DCMAKE_INSTALL_INCLUDEDIR=/nix/store/hj2d8zyd3yjb5n6s5m1m1k22i5rbniwh-mujoco-2.3.6/include -DCMAKE_INSTALL_SBINDIR=/nix/store/hj2d8zyd3yjb5n6s5m1m1k22i5rbniwh-mujoco-2.3.6/sbin -DCMAKE_INSTALL_BINDIR=/nix/store/hj2d8zyd3yjb5n6s5m1m1k22i5rbniwh-mujoco-2.3.6/bin -DCMAKE_INSTALL_NAME_DIR=/nix/store/hj2d8zyd3yjb5n6s5m1m1k22i5rbniwh-mujoco-2.3.6/lib -DCMAKE_POLICY_DEFAULT_CMP0025=NEW -DCMAKE_OSX_SYSROOT= -DCMAKE_FIND_FRAMEWORK=LAST -DCMAKE_STRIP=/nix/store/d9fndiing52fkalp5knfalrvlb3isi6w-gcc-wrapper-12.2.0/bin/strip -DCMAKE_RANLIB=/nix/store/d9fndiing52fkalp5knfalrvlb3isi6w-gcc-wrapper-12.2.0/bin/ranlib -DCMAKE_AR=/nix/store/d9fndiing52fkalp5knfalrvlb3isi6w-gcc-wrapper-12.2.0/bin/ar -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DCMAKE_INSTALL_PREFIX=/nix/store/hj2d8zyd3yjb5n6s5m1m1k22i5rbniwh-mujoco-2.3.6
mujoco> -- The C compiler identification is GNU 12.2.0
mujoco> -- The CXX compiler identification is GNU 12.2.0
mujoco> -- Detecting C compiler ABI info
mujoco> -- Detecting C compiler ABI info - done
mujoco> -- Check for working C compiler: /nix/store/d9fndiing52fkalp5knfalrvlb3isi6w-gcc-wrapper-12.2.0/bin/gcc - skipped
mujoco> -- Detecting C compile features
mujoco> -- Detecting C compile features - done
mujoco> -- Detecting CXX compiler ABI info
mujoco> -- Detecting CXX compiler ABI info - done
mujoco> -- Check for working CXX compiler: /nix/store/d9fndiing52fkalp5knfalrvlb3isi6w-gcc-wrapper-12.2.0/bin/g++ - skipped
mujoco> -- Detecting CXX compile features
mujoco> -- Detecting CXX compile features - done
mujoco> -- Performing Test CAN_BUILD_AVX
mujoco> -- Performing Test CAN_BUILD_AVX - Success
mujoco> -- Performing Test SUPPORTS_LLD
mujoco> -- Performing Test SUPPORTS_LLD - Failed
mujoco> -- Performing Test SUPPORTS_GOLD
mujoco> -- Performing Test SUPPORTS_GOLD - Success
mujoco> -- Performing Test SUPPORTS_GC_SECTIONS
mujoco> -- Performing Test SUPPORTS_GC_SECTIONS - Success
mujoco> -- mujoco::FindOrFetch: checking for targets in package `qhull`
mujoco> -- mujoco::FindOrFetch: checking for targets in package `qhull` - target `qhull` not defined.
mujoco> -- mujoco::FindOrFetch: Using FetchContent to retrieve `qhull`
mujoco> -- Looking for sys/types.h
mujoco> -- Looking for sys/types.h - found
mujoco> -- Looking for inttypes.h
mujoco> -- Looking for inttypes.h - found
mujoco> -- Looking for stddef.h
mujoco> -- Looking for stddef.h - found
mujoco> -- Looking for stdint.h
mujoco> -- Looking for stdint.h - found
mujoco> -- Check size of off_t
mujoco> -- Check size of off_t - done
mujoco> -- Looking for fseeko
mujoco> -- Looking for fseeko - found
mujoco> -- Looking for ftello
mujoco> -- Looking for ftello - found
mujoco> -- Looking for PRIdMAX
mujoco> -- Looking for PRIdMAX - found
mujoco> --
mujoco> -- ========== qhull Build Information ==========
mujoco> -- Build Version: 8.1-alpha1
mujoco> -- Install Prefix (CMAKE_INSTALL_PREFIX): /nix/store/hj2d8zyd3yjb5n6s5m1m1k22i5rbniwh-mujoco-2.3.6
mujoco> -- Binary Directory (BIN_INSTALL_DIR): bin
mujoco> -- Library Directory (LIB_INSTALL_DIR): lib
mujoco> -- Include Directory (INCLUDE_INSTALL_DIR): include
mujoco> -- Documentation Directory (DOC_INSTALL_DIR): share/doc/qhull
mujoco> -- Man Pages Directory (MAN_INSTALL_DIR): share/man/man1
mujoco> -- CMake Directory (CMAKE_INSTALL_DIR): lib/cmake/QHull
mujoco> -- PkgConfig Directory (PKGCONFIG_INSTALL_DIR):lib/pkgconfig
mujoco> -- Build Type (CMAKE_BUILD_TYPE): Release
mujoco> -- Build static libraries: ON
mujoco> -- Build shared library: OFF
mujoco> -- Use shared library for linking apps: OFF
mujoco> -- Build tests: OFF
mujoco> -- To override these options, add -D{OPTION_NAME}=... to the cmake command
mujoco> -- Build the debug targets -DCMAKE_BUILD_TYPE=Debug
mujoco> --
mujoco> -- To build and install qhull, enter "make" and "make install"
mujoco> -- To smoketest qhull, enter "ctest"
mujoco> --
mujoco> -- mujoco::FindOrFetch: Using FetchContent to retrieve `qhull` - Done
mujoco> -- mujoco::FindOrFetch: checking for targets in package `tinyxml2`
mujoco> -- mujoco::FindOrFetch: checking for targets in package `tinyxml2` - target `tinyxml2` not defined.
mujoco> -- mujoco::FindOrFetch: Using FetchContent to retrieve `tinyxml2`
mujoco> -- mujoco::FindOrFetch: Using FetchContent to retrieve `tinyxml2` - Done
mujoco> -- mujoco::FindOrFetch: checking for targets in package `tinyobjloader`
mujoco> -- mujoco::FindOrFetch: checking for targets in package `tinyobjloader` - target `tinyobjloader` not defined.
mujoco> -- mujoco::FindOrFetch: Using FetchContent to retrieve `tinyobjloader`
mujoco> CMake Deprecation Warning at build/_deps/tinyobjloader-src/CMakeLists.txt:5 (cmake_minimum_required):
mujoco> Compatibility with CMake < 2.8.12 will be removed from a future version of
mujoco> CMake.
mujoco> Update the VERSION argument
```
Question 2:
For our specific Go1 robot and terrain map setting (see the model here), the simulation speed dramatically decreases compared to the humanoid model.
This MJCF model works adequately for policy training, but the simulation speed is considerably slower, which hinders our process. The terrain map consists of 1600 x 1600 points. Are there any suggestions on improving the performance for this specific scenario?
Thank you for your time and consideration. Your insights and assistance are greatly valued!