Closed — tomasruizt closed this 5 years ago
@BarisYazici yes! The idea is to encourage the agent to stand still around the target joint angles. For that we introduce a `_GOAL_JOINT_VEL = [0, 0, 0]` as an objective on top of the `goal_joint_angle`, and augment the reward with a `joint_vel_penalty = _l2_distance(current_state.joint_vel, goal_state.joint_vel)`. The meat of this change is in the `compute_reward()` method.
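A minimal sketch of what the augmented reward could look like. The helper `_l2_distance` is assumed to be a plain Euclidean distance, and the exact signature of `compute_reward()` here is illustrative, not the actual one from this PR:

```python
import numpy as np

# Hypothetical target joint velocity: the agent should stand still at the goal.
_GOAL_JOINT_VEL = np.zeros(3)

def _l2_distance(a, b):
    """Euclidean distance between two vectors (assumed helper)."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def compute_reward(current_joint_angle, current_joint_vel, goal_joint_angle,
                   joint_vel_penalty=True):
    """Sketch: negative distance to the goal angles, optionally reduced
    by a penalty on the residual joint velocity."""
    reward = -_l2_distance(current_joint_angle, goal_joint_angle)
    if joint_vel_penalty:
        reward -= _l2_distance(current_joint_vel, _GOAL_JOINT_VEL)
    return reward
```

With the penalty enabled, a state that matches the goal angles but still moves gets a strictly lower reward than one that stands still.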
I think we will have a couple of approaches to the reward function. Can we make the reward interchangeable?
I like that idea a lot. What do you propose? Maybe flags in the constructor? `MsjEnv(...., joint_vel_penalty=True, tendon_vel_penalty=True, quadratic_loss=True, ...)`
@BarisYazici The `joint_vel_penalty` can be turned on and off at construction time now.
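The construction-time toggle could look roughly like this. This is a sketch under assumed names; the real `MsjEnv` constructor takes more arguments:

```python
class MsjEnv:
    """Sketch of a constructor flag toggling the joint velocity penalty.
    Argument names mirror the discussion above but are illustrative."""

    def __init__(self, joint_vel_penalty: bool = False):
        self._joint_vel_penalty = joint_vel_penalty

    def compute_reward(self, angle_dist: float, vel_dist: float) -> float:
        # Base reward: negative distance to the goal joint angles.
        reward = -angle_dist
        # Only subtract the velocity term when the flag was set at construction.
        if self._joint_vel_penalty:
            reward -= vel_dist
        return reward
```

Keeping the switch in the constructor (rather than per-step) means a single env instance has one fixed reward definition, which keeps logged training runs comparable.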
Can you elaborate on the objective of this pull request?