Ericonaldo / visual_wholebody

Train a loco-manipulation dog with RL
https://wholebody-b1.github.io/

Reward sign #9

Closed: 42jaylonw closed this issue 4 months ago

42jaylonw commented 4 months ago

Shouldn't the reward scales for tracking_contacts_shaped_force and tracking_contacts_shaped_vel be positive?

https://github.com/Ericonaldo/visual_wholebody/blob/33fc1d90a1b5b10408f6814a020dc32ed4c99d13/low-level/legged_gym/envs/manip_loco/b1z1_config.py#L130

https://github.com/Ericonaldo/visual_wholebody/blob/33fc1d90a1b5b10408f6814a020dc32ed4c99d13/low-level/legged_gym/envs/manip_loco/b1z1_config.py#L131

Ericonaldo commented 4 months ago

If you take a close look at the reward function defined in the paper or the code, you will find that the sign makes little difference.

[image]

In short, C is a factor in [0,1] that denotes the gait phase. For our trotting case, it controls when the two diagonal legs are up or down. With a positive scale, the first reward means: when C is close to zero, encourage the force (contact with the floor); for the velocity reward, vice versa. Changing the sign therefore only changes the meaning to "when C is close to zero, penalize the force (when C is close to one, there is less penalty)". This is just the difference between a penalty and a reward (the phase represented by C is of course inverted, but that does not matter), and our penalty choice still helps in learning a phase behavior.
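To make the sign argument concrete, here is a minimal sketch. The functional form is an assumption for illustration, not the repository's code: the force term is taken to be a weight (1 - C) times a saturating function of the foot force, and `shaped_force_term`, `scaled_reward`, and `sigma` are hypothetical names.

```python
import numpy as np

def shaped_force_term(c, force, sigma=100.0):
    """Hypothetical shaped contact-force term (assumed form).

    c     : phase factor in [0, 1]
    force : foot contact force magnitude
    The weight (1 - c) is largest when c is close to zero.
    """
    return (1.0 - c) * (1.0 - np.exp(-force**2 / sigma))

def scaled_reward(c, force, scale):
    """Apply the configured reward scale (positive = reward, negative = penalty)."""
    return scale * shaped_force_term(c, force)

# Compare both signs at the two extremes of the phase factor.
for scale in (+1.0, -1.0):
    for c in (0.0, 1.0):
        r = scaled_reward(c, force=50.0, scale=scale)
        print(f"scale={scale:+.0f}  C={c:.0f}  force=50  ->  {r:+.3f}")
```

Under this assumed form, a positive scale rewards force when C is close to zero, a negative scale penalizes force when C is close to zero, and in both cases the term vanishes when C is close to one. The sign only decides which half of the phase the policy learns to treat as stance.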

42jaylonw commented 4 months ago

I see. It probably only works when the gait phase can be inverted (stance takes 50% of the cycle). I think it would be better to make it positive.
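The point about the 50% cycle can be checked with a small sketch. It assumes a binary contact schedule (1 during stance, 0 during swing) as a stand-in for C. The actual C in the code may be smoothed, and `contact_schedule` and `inverted_equals_half_shift` are hypothetical helpers.

```python
import numpy as np

def contact_schedule(phase, duty):
    """Binary stand-in for C: 1.0 for the first `duty` fraction of each cycle, else 0.0."""
    return (np.mod(phase, 1.0) < duty).astype(float)

# One gait cycle sampled on a uniform grid of phases in [0, 1).
phase = np.arange(1000) / 1000.0

def inverted_equals_half_shift(duty):
    """Check whether inverting the schedule (1 - C) matches a half-cycle shift of it."""
    c = contact_schedule(phase, duty)
    shifted = contact_schedule(phase + 0.5, duty)
    return bool(np.array_equal(1.0 - c, shifted))

print("duty 0.5:", inverted_equals_half_shift(0.5))
print("duty 0.7:", inverted_equals_half_shift(0.7))
```

With a 50% duty cycle the inverted schedule is the same gait shifted by half a period, so flipping the sign is harmless. With any other duty cycle the inverted schedule swaps the stance and swing durations, which is a different gait.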

Ericonaldo commented 4 months ago

You can tune a better one on your own ;)