carlosferrazza / humanoid-bench


Providing policy parameters or learning curves #2

Closed feracero closed 5 months ago

feracero commented 6 months ago

Hi, first of all, thank you very much for such a relevant contribution to the community.

I am interested in building on top of this work, and so I wanted to ask whether it would be possible to get access to:

- the trained policy parameters
- the learning curves for the baseline policies reported in the paper

Having access to these would make it much easier for researchers to build on this benchmark; at the moment, one has to spend significant computational resources retraining all the policies in the paper just to have a baseline to build on. It would also be useful to hear the wall-clock times you would expect for training these policies on a desktop GPU.

Thanks!

carlosferrazza commented 6 months ago

Hi, and thanks for the great suggestion!

We just uploaded JSON files including all the runs, so that comparing against our baselines will no longer require re-running them. You will find them here.

The JSON files follow this key structure: task -> method -> seed_X -> (million_steps or return). As an example, to access the return sequence for one seed of the SAC run on the walk task, you can query the JSON data as data['walk']['SAC']['seed_0']['return'].
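To illustrate the key structure above, here is a minimal sketch of loading and querying one of the released files. The file name `walk.json` and the exact value layout inside each seed are assumptions for illustration; only the `task -> method -> seed_X -> (million_steps or return)` nesting is taken from the description above.

```python
import json

# Hypothetical sample mimicking the described nesting:
# task -> method -> seed_X -> (million_steps or return).
# The values below are made up for illustration only.
sample = {
    "walk": {
        "SAC": {
            "seed_0": {
                "million_steps": [0.1, 0.2, 0.3],
                "return": [12.5, 34.0, 56.8],
            }
        }
    }
}

# In practice, you would load one of the released files instead
# (file name assumed here):
# with open("walk.json") as f:
#     data = json.load(f)
data = sample

# Return sequence and step counts for one seed of the SAC run
# on the walk task, queried exactly as described above.
returns = data["walk"]["SAC"]["seed_0"]["return"]
steps = data["walk"]["SAC"]["seed_0"]["million_steps"]
print(list(zip(steps, returns)))
```

Pairing `million_steps` with `return` this way gives the (x, y) points of a learning curve, ready to plot or compare against your own runs.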

As for the wall-clock times, we could train DreamerV3 for 10M steps and TD-MPC2 for 2M steps in 48 hours on a single NVIDIA Quadro RTX 6000 or A5000.

Hope this helps!