carlosferrazza / humanoid-bench


Providing policy parameters or learning curves #2

Closed feracero closed 5 months ago

feracero commented 6 months ago

Hi, first of all, thank you very much for such a relevant contribution to the community.

I am interested in building on top of this work, and so I wanted to ask whether it would be possible to get access to:

- the trained policy parameters
- the learning curves for the baseline policies reported in the paper

Having access to these would make it much easier for researchers to build on this benchmark; at the moment, one has to spend significant computational resources retraining all the policies in the paper just to have a baseline to build on. It would also be useful to hear the wall-clock times you would expect for training these policies on a desktop GPU.

Thanks!

carlosferrazza commented 6 months ago

Hi, and thanks for the great suggestion!

We just uploaded JSON files including all the runs, so that comparing against our baselines will no longer require re-running them. You will find them here.

The JSON files follow this key structure: task -> method -> seed_X -> (million_steps or return). As an example, to access the return sequence for one seed of the SAC run on the walk task, you can query the JSON data as data['walk']['SAC']['seed_0']['return'].
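To illustrate the key structure above, here is a minimal sketch of loading and querying one of the released files. The file name `walk.json` and the exact value layout inside each seed are assumptions for illustration; only the `task -> method -> seed_X -> (million_steps or return)` nesting is taken from the description above.

```python
import json

# Hypothetical sample mimicking the described nesting:
# task -> method -> seed_X -> (million_steps or return).
# The values below are made up for illustration only.
sample = {
    "walk": {
        "SAC": {
            "seed_0": {
                "million_steps": [0.1, 0.2, 0.3],
                "return": [12.5, 34.0, 56.8],
            }
        }
    }
}

# In practice, you would load one of the released files instead
# (file name assumed here):
# with open("walk.json") as f:
#     data = json.load(f)
data = sample

# Return sequence and step counts for one seed of the SAC run
# on the walk task, queried exactly as described above.
returns = data["walk"]["SAC"]["seed_0"]["return"]
steps = data["walk"]["SAC"]["seed_0"]["million_steps"]
print(list(zip(steps, returns)))
```

Pairing `million_steps` with `return` this way gives the (x, y) points of a learning curve, ready to plot or compare against your own runs.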

As for the wall-clock times, we could train DreamerV3 for 10M steps and TD-MPC2 for 2M steps in 48 hours on a single NVIDIA Quadro RTX 6000 or A5000.

Hope this helps!