Improbable-AI / walk-these-ways

Sim-to-real RL training and deployment tools for the Unitree Go1 robot.
https://gmargo11.github.io/walk-these-ways/

Reward functions #11

Closed by yihangyao 1 year ago

yihangyao commented 1 year ago

Hi, thank you for the great work.

I encountered some problems while using this repo, and I am wondering whether I am computing the reward the right way. I found that in the training process, the reward is calculated by these 18 functions: 'tracking_lin_vel', 'tracking_ang_vel', 'lin_vel_z', 'ang_vel_xy', 'orientation', 'torques', 'dof_vel', 'dof_acc', 'collision', 'action_rate', 'tracking_contacts_shaped_force', 'tracking_contacts_shaped_vel', 'jump', 'dof_pos_limits', 'feet_slip', 'dof_pos', 'action_smoothness_1', 'action_smoothness_2'. However, some reward functions described in the paper, such as 'raibert_heuristic_footswing_tracking', are not used in the reward calculation (although it exists in the CoRLReward file).

Could you please help clarify this? Did I miss some points?

gmargo11 commented 1 year ago

Hi @LiousYao,

Thanks for trying out the code. I'm guessing you might be looking at the computed reward when running test.py? That file uses the default configuration, which has the terms you listed above. However, in the example training script, and for the pretrained model, the coefficients are set as in the paper; see https://github.com/Improbable-AI/walk-these-ways/blob/master/scripts/train.py#L117 . When those configuration parameters are assigned as in train.py before stepping the environment, they should activate the raibert heuristic and footswing tracking rewards.

Gabe