We present HumanoidBench, a simulated humanoid robot benchmark consisting of $15$ whole-body manipulation and $12$ locomotion tasks. This repo contains the code for environments and training.
Structure of the repository:
data: Weights of the low-level skill policies
dreamerv3: Training code for dreamerv3
humanoid_bench: Core benchmark code
    assets: Simulation assets
    envs: Environment files
    mjx: MuJoCo MJX training code
jaxrl_m: Training code for SAC
ppo: Training code for PPO
tdmpc2: Training code for TD-MPC2

Create a clean conda environment:
conda create -n humanoidbench python=3.11
conda activate humanoidbench
Then, install the required packages:
# Install HumanoidBench
pip install -e .
# jax GPU version
pip install "jax[cuda12]==0.4.28"
# Or, jax CPU version
pip install "jax[cpu]==0.4.28"
# Install jaxrl
pip install -r requirements_jaxrl.txt
# Install dreamer
pip install -r requirements_dreamer.txt
# Install td-mpc2
pip install -r requirements_tdmpc.txt
h1hand-walk-v0
h1hand-reach-v0
h1hand-hurdle-v0
h1hand-crawl-v0
h1hand-maze-v0
h1hand-push-v0
h1hand-cabinet-v0
h1strong-highbar_hard-v0  # Make hands stronger to be able to hang from the high bar
h1hand-door-v0
h1hand-truck-v0
h1hand-cube-v0
h1hand-bookshelf_simple-v0
h1hand-bookshelf_hard-v0
h1hand-basketball-v0
h1hand-window-v0
h1hand-spoon-v0
h1hand-kitchen-v0
h1hand-package-v0
h1hand-powerlift-v0
h1hand-room-v0
h1hand-stand-v0
h1hand-run-v0
h1hand-sit_simple-v0
h1hand-sit_hard-v0
h1hand-balance_simple-v0
h1hand-balance_hard-v0
h1hand-stair-v0
h1hand-slide-v0
h1hand-pole-v0
h1hand-insert_normal-v0
h1hand-insert_small-v0
python -m humanoid_bench.test_env --env h1hand-walk-v0
# Define checkpoints to pre-trained low-level policy and obs normalization
export POLICY_PATH="data/reach_two_hands/torch_model.pt"
export MEAN_PATH="data/reach_two_hands/mean.npy"
export VAR_PATH="data/reach_two_hands/var.npy"
# Test the environment
python -m humanoid_bench.test_env --env h1hand-push-v0 --policy_path ${POLICY_PATH} --mean_path ${MEAN_PATH} --var_path ${VAR_PATH} --policy_type "reach_double_relative"
# One-hand reaching
python -m humanoid_bench.mjx.mjx_test --with_full_model
# Two-hand reaching
python -m humanoid_bench.mjx.mjx_test --with_full_model --task=reach_two_hands --folder=./data/reach_two_hands
By default, the environment returns a privileged state (e.g., robot state + environment state). To get proprioceptive, visual, and tactile sensing, set obs_wrapper=True and select the required sensors accordingly, e.g. sensors="proprio,image,tactile". When using tactile sensing, make sure to use h1touch in place of h1hand.
Full test instruction:
python -m humanoid_bench.test_env --env h1touch-stand-v0 --obs_wrapper True --sensors "proprio,image,tactile"
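The same command line can be assembled programmatically, for example when testing several sensor combinations. The following is a minimal sketch and not part of the repository: `build_test_command` is a hypothetical helper that only builds the argument list from the flags shown above (`--env`, `--obs_wrapper`, `--sensors`). It assumes the robot name is the part of the environment ID before the first hyphen, and it rejects tactile sensing on `h1hand` environments, following the note above.

```python
def build_test_command(env_id, sensors=None):
    """Build an argv list for humanoid_bench.test_env (hypothetical helper)."""
    cmd = ["python", "-m", "humanoid_bench.test_env", "--env", env_id]
    if sensors:
        # Assumption: the robot name is everything before the first hyphen.
        robot = env_id.split("-", 1)[0]
        if "tactile" in sensors and robot == "h1hand":
            raise ValueError("tactile sensing: use h1touch in place of h1hand")
        # The observation wrapper is needed whenever sensors are selected.
        cmd += ["--obs_wrapper", "True", "--sensors", ",".join(sensors)]
    return cmd


# Print the command as it would be typed in a shell.
print(" ".join(build_test_command("h1touch-stand-v0", ["proprio", "image", "tactile"])))
```

The returned list can be passed to `subprocess.run` once HumanoidBench is installed.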
In addition to the main benchmark tasks listed above, you can run the following environments that feature the robot without hands:
h1-walk-v0
h1-reach-v0
h1-hurdle-v0
h1-crawl-v0
h1-maze-v0
h1-push-v0
h1-highbar_simple-v0
h1-door-v0
h1-truck-v0
h1-basketball-v0
h1-package-v0
h1-stand-v0
h1-run-v0
h1-sit_simple-v0
h1-sit_hard-v0
h1-balance_simple-v0
h1-balance_hard-v0
h1-stair-v0
h1-slide-v0
h1-pole-v0
The robot with low-dimensional hands:
h1simplehand-pole-v0
And the Unitree G1 robot with three-finger hands:
g1-walk-v0
g1-reach-v0
g1-hurdle-v0
g1-crawl-v0
g1-maze-v0
g1-push-v0
g1-cabinet-v0
g1-door-v0
g1-truck-v0
g1-cube-v0
g1-bookshelf_simple-v0
g1-bookshelf_hard-v0
g1-basketball-v0
g1-window-v0
g1-spoon-v0
g1-kitchen-v0
g1-package-v0
g1-powerlift-v0
g1-room-v0
g1-stand-v0
g1-run-v0
g1-sit_simple-v0
g1-sit_hard-v0
g1-balance_simple-v0
g1-balance_hard-v0
g1-stair-v0
g1-slide-v0
g1-pole-v0
g1-insert_normal-v0
g1-insert_small-v0
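The environment IDs listed above all appear to follow a robot-task-version pattern (for example `h1hand-bookshelf_simple-v0`). As an illustration only, the sketch below splits an ID into those three parts. `EnvId`, `parse_env_id`, and `swap_robot` are hypothetical names, not part of HumanoidBench, and the pattern itself is inferred from the lists rather than documented.

```python
from typing import NamedTuple


class EnvId(NamedTuple):
    robot: str    # e.g. "h1hand", "h1", "g1"
    task: str     # e.g. "walk", "bookshelf_simple"
    version: str  # e.g. "v0"


def parse_env_id(env_id: str) -> EnvId:
    """Split an ID such as 'h1hand-bookshelf_simple-v0' into its three parts."""
    parts = env_id.split("-")
    if len(parts) != 3 or not parts[2].startswith("v"):
        raise ValueError(f"unexpected environment ID: {env_id!r}")
    return EnvId(*parts)


def swap_robot(env_id: str, robot: str) -> str:
    """Return the ID with the same task and version but another robot prefix."""
    parsed = parse_env_id(env_id)
    return f"{robot}-{parsed.task}-{parsed.version}"


print(parse_env_id("h1strong-highbar_hard-v0"))
print(swap_robot("h1hand-walk-v0", "g1"))
```

Note that `swap_robot` only rewrites the string. Not every task exists for every robot, so the result should be checked against the lists above.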
# Define TASK
export TASK="h1hand-sit_simple-v0"
# Train TD-MPC2
python -m tdmpc2.train disable_wandb=False wandb_entity=[WANDB_ENTITY] exp_name=tdmpc task=humanoid_${TASK} seed=0
# Train DreamerV3
python -m embodied.agents.dreamerv3.train --configs humanoid_benchmark --run.wandb True --run.wandb_entity [WANDB_ENTITY] --method dreamer --logdir logs --task humanoid_${TASK} --seed 0
# Train SAC
python ./jaxrl_m/examples/mujoco/run_mujoco_sac.py --env_name ${TASK} --wandb_entity [WANDB_ENTITY] --seed 0
# Train PPO (not using MJX)
python ./ppo/run_sb3_ppo.py --env_name ${TASK} --wandb_entity [WANDB_ENTITY] --seed 0
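In the commands above, TD-MPC2 and DreamerV3 receive the task as `humanoid_${TASK}`, while the SAC and PPO scripts take `${TASK}` unchanged through `--env_name`. The sketch below captures that difference for scripts that launch several methods. The method labels and the `task_argument` helper are hypothetical and chosen for illustration.

```python
# Methods whose commands above pass the task as humanoid_${TASK}.
PREFIXED_METHODS = {"tdmpc2", "dreamerv3"}
# Methods whose commands above pass ${TASK} unchanged via --env_name.
PLAIN_METHODS = {"sac", "ppo"}


def task_argument(method: str, env_id: str) -> str:
    """Return the task string in the form each training command expects."""
    key = method.lower()
    if key in PREFIXED_METHODS:
        return f"humanoid_{env_id}"
    if key in PLAIN_METHODS:
        return env_id
    raise ValueError(f"unknown method label: {method!r}")


for method in ("tdmpc2", "dreamerv3", "sac", "ppo"):
    print(method, task_argument(method, "h1hand-sit_simple-v0"))
```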
# Define TASK
export TASK="h1hand-push-v0"
# Define checkpoints to pre-trained low-level policy and obs normalization
export POLICY_PATH="data/reach_one_hand/torch_model.pt"
export MEAN_PATH="data/reach_one_hand/mean.npy"
export VAR_PATH="data/reach_one_hand/var.npy"
# Train TD-MPC2 with pre-trained low-level policy
python -m tdmpc2.train disable_wandb=False wandb_entity=[WANDB_ENTITY] exp_name=tdmpc task=humanoid_${TASK} seed=0 policy_path=${POLICY_PATH} mean_path=${MEAN_PATH} var_path=${VAR_PATH} policy_type="reach_single"
# Train DreamerV3 with pre-trained low-level policy
python -m embodied.agents.dreamerv3.train --configs humanoid_benchmark --run.wandb True --run.wandb_entity [WANDB_ENTITY] --method dreamer_${TASK}_hierarchical --logdir logs --env.humanoid.policy_path ${POLICY_PATH} --env.humanoid.mean_path ${MEAN_PATH} --env.humanoid.var_path ${VAR_PATH} --env.humanoid.policy_type="reach_single" --task humanoid_${TASK} --seed 0
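Both low-level skills referenced above (`reach_one_hand` and `reach_two_hands`) keep their files under `data/<skill>/` with the names `torch_model.pt`, `mean.npy`, and `var.npy`. Assuming that layout holds, the three paths can be derived from the skill name. The helpers below are a hypothetical sketch, not repository code. `tdmpc2_overrides` mirrors the `key=value` arguments of the TD-MPC2 command shown above.

```python
from pathlib import Path


def skill_checkpoint_paths(skill: str, data_dir: str = "data") -> dict:
    """Return policy, mean, and variance paths for one low-level skill."""
    folder = Path(data_dir) / skill
    return {
        "policy_path": (folder / "torch_model.pt").as_posix(),
        "mean_path": (folder / "mean.npy").as_posix(),
        "var_path": (folder / "var.npy").as_posix(),
    }


def tdmpc2_overrides(skill: str, policy_type: str) -> list:
    """Format the paths as key=value overrides, as in the TD-MPC2 command."""
    paths = skill_checkpoint_paths(skill)
    overrides = [f"{key}={value}" for key, value in paths.items()]
    overrides.append(f"policy_type={policy_type}")
    return overrides


print(" ".join(tdmpc2_overrides("reach_one_hand", "reach_single")))
```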
JSON files with all the training curves are provided, so that comparing against our baselines does not require re-running them.
The json files follow this key structure: task -> method -> seed_X -> (million_steps or return). As an example to access the return sequence for one seed of the SAC run for the walk task, you can query the json data as data['walk']['SAC']['seed_0']['return'].
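The key structure described above can be traversed with the standard `json` module. The sketch below uses a tiny synthetic stand-in for a curve file: the numbers are made up for illustration and are not benchmark results, and the helper names are hypothetical. With a real file, `json.load` on the opened file would replace the `json.loads` call.

```python
import json

# Synthetic stand-in for a curve file; values are invented for illustration.
raw = json.dumps({
    "walk": {
        "SAC": {
            "seed_0": {"million_steps": [0.1, 0.2, 0.3], "return": [5.0, 12.0, 20.0]},
            "seed_1": {"million_steps": [0.1, 0.2, 0.3], "return": [7.0, 10.0, 24.0]},
        }
    }
})
data = json.loads(raw)


def final_returns(data, task, method):
    """Map each seed to the last logged return for one task and method."""
    return {seed: run["return"][-1] for seed, run in data[task][method].items()}


def mean_final_return(data, task, method):
    """Average the last logged return over all seeds."""
    finals = final_returns(data, task, method)
    return sum(finals.values()) / len(finals)


print(data["walk"]["SAC"]["seed_0"]["return"])
print(final_returns(data, "walk", "SAC"))
print(mean_final_return(data, "walk", "SAC"))
```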
If you find HumanoidBench useful for your research, please cite this work:
@article{sferrazza2024humanoidbench,
title={HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation},
author={Carmelo Sferrazza and Dun-Ming Huang and Xingyu Lin and Youngwoon Lee and Pieter Abbeel},
journal={arXiv Preprint arxiv:2403.10506},
year={2024}
}
This codebase contains some files adapted from other sources: