add a benchmark script for simple predictive sampling example
add tensorboard-related dependencies to pyproject.toml
Profiling results:
Takeaways from the above results:
The solver takes about 52% of the total runtime.
About 19% of the total runtime is spent on the 4 line searches per solver iteration. The biggest "bang for buck" for speeding up step is simply to use fewer solver iterations and fewer line search iterations.
About 5.9% of the total runtime is in Cholesky factorizations and solves.
The _position subroutine of the forward dynamics takes about 26% of the total runtime!
About 4.5% of the total runtime is in this line, which is an example of inefficient multiplications of structured matrices.
The _acceleration subroutine takes about 11% of the total runtime.
About 4.5% of the total runtime is in this function, which seems harder to optimize.
About 11% of the total runtime is in semi-implicit Euler integration.
Essentially 100% of the runtime of forward is captured by the above operations, which should give some insight into code optimization priorities on the mjx side if we want to give some feedback. Keep in mind that the contact engine is DISABLED for this benchmark.
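To put rough numbers on the solver takeaway, here is a back-of-the-envelope sketch. It assumes (this was not measured in this PR) that the solver's share and the line search share of the runtime scale linearly with the number of iterations, and that everything else stays unchanged. estimated_speedup is a hypothetical helper for illustration, not part of the benchmark script. In MuJoCo the two knobs correspond to the iterations and ls_iterations options.

```python
def estimated_speedup(fraction: float, scale: float) -> float:
    """Amdahl-style estimate of the overall speedup.

    fraction: share of total runtime taken by the part being changed.
    scale: new cost of that part relative to its old cost (0.5 = half).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if scale < 0.0:
        raise ValueError("scale must be non-negative")
    return 1.0 / ((1.0 - fraction) + fraction * scale)


# Shares taken from the profiling takeaways above.
SOLVER_SHARE = 0.52
LINE_SEARCH_SHARE = 0.19

# ASSUMPTION: cost scales linearly with the iteration count.
half_ls = estimated_speedup(LINE_SEARCH_SHARE, 0.5)
half_solver = estimated_speedup(SOLVER_SHARE, 0.5)

print(f"half the line search iterations: ~{half_ls:.2f}x overall")
print(f"half the solver iterations:      ~{half_solver:.2f}x overall")
```

Under these assumptions, halving the line search iterations alone gives roughly a 1.10x overall speedup, while halving the solver iterations gives roughly 1.35x. Real gains will differ if the solver has fixed per-call overhead, and fewer iterations may also change solution quality.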
Tips on profiling
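As a minimal sketch of how a trace like the one above can be captured, the snippet below profiles a jitted function with jax.profiler.trace and writes the result to a log directory that TensorBoard can read. toy_step is a stand-in workload, not the benchmark from this PR, and the temporary log directory is an arbitrary choice. Viewing the trace requires TensorBoard with its profiler plugin installed.

```python
import tempfile

import jax
import jax.numpy as jnp


@jax.jit
def toy_step(x):
    # Stand-in workload (hypothetical), NOT the benchmark from this PR.
    return jnp.tanh(x @ x.T).sum()


x = jnp.ones((256, 256))

# Warm-up call so JIT compilation time is kept out of the trace.
toy_step(x).block_until_ready()

log_dir = tempfile.mkdtemp(prefix="jax-trace-")
with jax.profiler.trace(log_dir):
    result = toy_step(x)
    # JAX dispatches asynchronously; block so the work lands inside the trace.
    result.block_until_ready()

print(f"trace written to {log_dir}")
print(f"view with: tensorboard --logdir {log_dir}")
```

The warm-up call and block_until_ready matter: without them the trace is dominated by compilation, or ends before the device work has finished.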