42jaylonw closed this issue 4 months ago.
If you take a close look at the reward functions defined in the paper or the code, you will find that the sign makes little difference. In short, C is a factor in [0, 1] that denotes a phase; in our trotting case, it controls when the two diagonal leg pairs are up or down. For the first reward (force), it means that when C is close to zero, contact force is encouraged (the foot touches the floor); the velocity reward works the other way around. Changing the sign therefore only changes the meaning to "when C is close to zero, penalize the force (when C is close to one, there is less penalty)". This is just the difference between a penalty and a reward (of course, the phase represented by C is then inverted, but that does not matter), and our penalty choice still helps the policy learn a phase-dependent behavior.
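To make the sign argument concrete, here is a minimal Python sketch. It assumes shaped terms in the style of walk-these-ways gait rewards: the force term is penalized in proportion to C and the velocity term in proportion to 1 - C. The formulas, the sigma values, and the two idealized foot states are assumptions for illustration; the actual functions in visual_wholebody may differ.

```python
import math

# Hypothetical sigma values; the real config values may differ.
FORCE_SIGMA = 100.0
VEL_SIGMA = 10.0


def shaped_force_term(c: float, force: float) -> float:
    """Penalize contact force in proportion to c (assumed form)."""
    return -c * (1.0 - math.exp(-force ** 2 / FORCE_SIGMA))


def shaped_vel_term(c: float, vel: float) -> float:
    """Penalize foot velocity in proportion to (1 - c) (assumed form)."""
    return -(1.0 - c) * (1.0 - math.exp(-vel ** 2 / VEL_SIGMA))


def total_reward(c: float, force: float, vel: float, scale: float) -> float:
    """Both terms weighted by the same reward scale (positive or negative)."""
    return scale * (shaped_force_term(c, force) + shaped_vel_term(c, vel))


# Two idealized foot states (hypothetical numbers): (force, velocity).
STANCE = (80.0, 0.0)  # large contact force, foot not moving
SWING = (0.0, 1.5)    # no contact force, foot moving


def preferred(c: float, scale: float) -> str:
    """Return which foot state earns more reward at phase value c."""
    r_stance = total_reward(c, *STANCE, scale)
    r_swing = total_reward(c, *SWING, scale)
    return "stance" if r_stance > r_swing else "swing"


for scale in (1.0, -1.0):
    print(scale, [preferred(c, scale) for c in (0.0, 1.0)])
# prints:
# 1.0 ['stance', 'swing']
# -1.0 ['swing', 'stance']
```

With a positive scale, C close to 0 favors stance and C close to 1 favors swing; negating the scale swaps the two, which is the inverted phase meaning described in the reply. Under these assumed formulas the swap is not an exact relabeling: the negated sum equals the positive-scale sum with C replaced by 1 - C, plus an unconditional (1 - exp(-force²/σ_f)) + (1 - exp(-vel²/σ_v)) bonus.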
I see. It probably only works when the gait phase can be inverted (i.e., a 50% cycle). I think it would be better to make the scales positive.
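The point about inverting the phase can be checked with a small sketch. It assumes a common periodic-gait convention (hypothetical here, not taken from the repository): foot i is in stance when (t + offset_i) mod 1 < duty. For a trot with a 0.5 duty factor, swapping stance and swing for every foot yields the same schedule shifted by half a cycle, whereas for a 0.75 duty factor (a hypothetical walk) the inverted schedule keeps each foot down only 25% of the time, so it is a different gait.

```python
from typing import List, Sequence, Tuple

Schedule = List[Tuple[bool, ...]]


def contact_schedule(offsets: Sequence[float], duty: float, steps: int = 8) -> Schedule:
    """Per-foot stance flags at `steps` evenly spaced times in one gait cycle."""
    schedule = []
    for k in range(steps):
        t = k / steps
        schedule.append(tuple(((t + o) % 1.0) < duty for o in offsets))
    return schedule


def invert(schedule: Schedule) -> Schedule:
    """Swap stance and swing for every foot, like replacing C with 1 - C."""
    return [tuple(not s for s in row) for row in schedule]


def is_time_shift(a: Schedule, b: Schedule) -> bool:
    """True if b equals a shifted cyclically by some whole number of steps."""
    return any(b == a[k:] + a[:k] for k in range(len(a)))


# Feet ordered FL, FR, RL, RR (hypothetical offsets).
trot = contact_schedule((0.0, 0.5, 0.5, 0.0), duty=0.5)
walk = contact_schedule((0.0, 0.5, 0.75, 0.25), duty=0.75)

print(is_time_shift(trot, invert(trot)))  # True: the inverted trot is still a trot
print(is_time_shift(walk, invert(walk)))  # False: stance and swing durations swap
```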
You can tune a better one on your own ;)
Shouldn't the reward scales for `tracking_contacts_shaped_force` and `tracking_contacts_shaped_vel` be positive?
https://github.com/Ericonaldo/visual_wholebody/blob/33fc1d90a1b5b10408f6814a020dc32ed4c99d13/low-level/legged_gym/envs/manip_loco/b1z1_config.py#L130
https://github.com/Ericonaldo/visual_wholebody/blob/33fc1d90a1b5b10408f6814a020dc32ed4c99d13/low-level/legged_gym/envs/manip_loco/b1z1_config.py#L131