Healthcare-Robotics / assistive-gym

Assistive Gym, a physics-based simulation framework for physical human-robot interaction and robotic assistance.

Clarification about robot_obs in custom Gym environments #4

Closed: PierreExeter closed this issue 4 years ago

PierreExeter commented 4 years ago

Hello,

Thanks a lot for sharing this amazing code!

I am looking at the instructions for creating a custom environment here, and I noticed that the robot observations are generated as relative position vectors (for example, gripper_pos - torso_pos).

robot_obs = np.concatenate([gripper_pos-torso_pos, gripper_pos-self.target_pos, robot_right_joint_positions, gripper_orient, head_orient, forces]).ravel()

Why are the robot observations constructed this way, rather than as a list of joint angles and velocities, which seems more intuitive to me? Is it perhaps more effective for training?
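For concreteness, here is the kind of joint-space observation I have in mind (a minimal sketch; robot and controllable_joints are placeholder PyBullet handles, not names from Assistive Gym):

```python
import numpy as np
import pybullet as p

def joint_space_obs(robot, controllable_joints):
    # Read joint angles and velocities straight from the simulator.
    states = p.getJointStates(robot, controllable_joints)
    positions = np.array([s[0] for s in states])   # joint angles (rad)
    velocities = np.array([s[1] for s in states])  # joint velocities (rad/s)
    return np.concatenate([positions, velocities])
```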

Also, why is self.target_pos used in the robot observation? The robot state does not depend on the target position. Shouldn't self.target_pos be used only in the reward computation?

This seems to be the case in most Assistive Gym environments, so there must be a good reason, but I can't figure it out.

Thanks

Zackory commented 4 years ago

Hi there,

Good question! The observation does include the robot's joint angles (for the controllable actuators), which are stored in the variable robot_right_joint_positions in the example above. As for the other elements of the observation, they help the robot infer the state of the world more easily. Technically, all the robot needs to know are its joint angles and a target goal, but that would require the control model (in this case a neural network) to also learn forward kinematics in order to compute how far the robot's end effector is from the goal. Instead, we compute the forward kinematics directly to get the gripper position/orientation and provide that to the model, which makes learning a bit easier. For a quick validation of this, you could train simple controllers with and without this information in the observation and compare how long they take to learn the task.
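Here is a rough sketch of that idea (the function and variable names are illustrative, not the exact Assistive Gym code): PyBullet does the forward kinematics for us, and we hand the resulting gripper pose to the policy alongside the joint angles.

```python
import numpy as np
import pybullet as p

def robot_obs(robot, controllable_joints, gripper_link, target_pos):
    # Joint angles of the controllable actuators.
    joint_positions = np.array([s[0] for s in p.getJointStates(robot, controllable_joints)])
    # Let the simulator compute forward kinematics: world pose of the gripper link.
    link_state = p.getLinkState(robot, gripper_link, computeForwardKinematics=True)
    gripper_pos = np.array(link_state[0])     # world position
    gripper_orient = np.array(link_state[1])  # world orientation (quaternion)
    # Provide the goal-relative vector directly, so the policy does not
    # have to learn forward kinematics on its own.
    return np.concatenate([gripper_pos - target_pos, joint_positions, gripper_orient])
```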

self.target_pos is important to give the robot as an observation, since it is the target goal the robot should reach towards. If that information is not given to the robot, then the robot has no idea where it should move. Of course, the reward is also based on this target goal, but the robot only receives rewards during training in simulation, not at test time. In practice, it is quite common for robots to know where their target goal is (otherwise they would just be moving around randomly!). For example, when helping someone eat, the robot should know where the person's mouth is at each time step, which can be provided by off-the-shelf face detection software.
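As an illustrative sketch (not the actual Assistive Gym code), the target position feeds into both the observation and the reward, but only the observation is available to the policy at test time:

```python
import numpy as np

def step_outputs(gripper_pos, target_pos, other_obs):
    # The policy sees the goal-relative vector at both train and test time.
    obs = np.concatenate([gripper_pos - target_pos, other_obs])
    # The dense distance reward is only computed during training in simulation.
    reward = -np.linalg.norm(gripper_pos - target_pos)
    return obs, reward
```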

Hope this helps!

PierreExeter commented 4 years ago

Thanks for the clarifications! I actually realised that MuJoCo's Reacher environment uses the same forward kinematics trick. And yes, obviously the robot must know the target position in order to learn... Thanks!