google-deepmind / dm_control

Google DeepMind's software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.
Apache License 2.0

Humanoid_CMU stand, walk, run with 2020 parameter values #334

Open dyth opened 2 years ago

dyth commented 2 years ago

How could I use the 2020 humanoid_CMU torques, kp and damping values for the Humanoid_CMU stand, walk and run tasks?

Is there any way to do this that wouldn't require me manually changing the parameters in humanoid_CMU.xml?
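(For reference, a minimal sketch of one way to do this without editing humanoid_CMU.xml, assuming you are happy to override the values in memory after loading the suite task: dm_control exposes the compiled MuJoCo model through `env.physics.model`, whose fields are writable arrays. The numbers below are placeholders, not the actual 2020 parameters, and `actuator_gainprm` is where a position actuator's kp would live; the stock suite model uses torque motors.)

```python
from dm_control import suite

env = suite.load(domain_name='humanoid_CMU', task_name='stand')
physics = env.physics

# dof_damping holds per-DoF damping; actuator_gainprm holds the actuator gain
# parameters. Both are writable numpy views into the MuJoCo model, so they can
# be overridden in place after loading instead of editing the XML.
physics.model.dof_damping[:] = 5.0            # hypothetical value, not the 2020 one
physics.model.actuator_gainprm[:, 0] = 100.0  # hypothetical value, not the 2020 one

timestep = env.reset()
```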

saran-t commented 2 years ago

Sorry for the late response. Can you explain why you want to do this? The 2020 version of the CMU Humanoid is only intended for use with the motion capture dataset for the Catch & Carry SIGGRAPH paper.

dyth commented 2 years ago

I was hoping to do RL from scratch on the 56D humanoid from state observations. I saw that the 2019 parameters weren't intended for training from scratch, so I thought I'd use the 2020 parameters instead, since the 2020 version appears to have been trained from scratch on humanoid-gaps and humanoid-parkour with V-MPO (Figure 4b, page 8) and Sampled MuZero (Figure 6, page 8). The stand, walk and run tasks seemed simpler than the gaps and parkour tasks, so I was hoping to test my current algorithm on those first.

saran-t commented 2 years ago

@yuvaltassa's comment about the difficulty of training the CMU humanoid with RL from scratch applies to all variations of the model. The point is that even if you can get high RL reward by training from scratch with this model, you'd probably end up with strange looking gaits (if you can consider them gaits at all).

I'd say if you're interested in training stand/walk/run from state, just use the stock dm_control.suite task. The main difference is that the "v2019" and "v2020" CMU humanoids expose rescaled actuators, so the agent sends control signals in the range [-1, 1] and the walker automatically scales them up to the full actuation range. In the Control Suite this isn't done for you, so you might need to put an action preprocessor between your agent and the environment.
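For concreteness, a minimal sketch of such a preprocessor, assuming a dm_control-style environment whose `action_spec()` returns a `BoundedArray`; the wrapper name and structure here are illustrative, not part of dm_control:

```python
import numpy as np


class RescaleActionWrapper:
  """Maps agent actions in [-1, 1] onto the env's action_spec bounds."""

  def __init__(self, env):
    self._env = env
    spec = env.action_spec()
    self._minimum = spec.minimum
    self._maximum = spec.maximum

  def step(self, action):
    # Clip to the agent's nominal range, then linearly rescale to [min, max].
    action = np.clip(action, -1.0, 1.0)
    scaled = self._minimum + (action + 1.0) * 0.5 * (self._maximum - self._minimum)
    return self._env.step(scaled)

  def __getattr__(self, name):
    # Delegate everything else (reset, observation_spec, ...) to the wrapped env.
    return getattr(self._env, name)
```

With something like this in place, an agent that emits tanh-bounded actions in [-1, 1] can be run against the stock suite tasks unchanged.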

dyth commented 2 years ago

Thanks for the information -- is this action preprocessor different from action normalization (i.e., just linearly rescaling the action space to [-1, 1])?