[Clarification] Action and observation definition.

We should have defined the action and observation of our RL problem in the assignment description (it will be updated soon), although this is not very relevant to the exercises of A2.

The action $\mathbf{a}$ is the joint angle target we send to our characters. This target is sent to the PD controller of each character (already implemented in our simulator) and converted into a torque command. (TMI: technically this is much more complicated than that, but for now you can just assume we have a PD controller that converts joint angle targets into joint torques.)
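To make the PD step concrete, here is a minimal sketch of a textbook per-joint PD law. This is not the simulator's actual controller (which, as noted, is more involved); the function name `pd_torque` and the gains `kp` and `kd` are hypothetical and chosen only for illustration.

```python
def pd_torque(q_target, q, q_dot, kp, kd):
    """Textbook PD law per joint: tau = kp * (q_target - q) - kd * q_dot.

    q_target: desired joint angles (the action after de-normalization)
    q, q_dot: current joint angles and joint velocities
    kp, kd:   proportional and derivative gains (hypothetical values below)
    """
    return [kp * (qt - qi) - kd * qdi for qt, qi, qdi in zip(q_target, q, q_dot)]


# Two joints, with made-up states and gains.
torques = pd_torque(
    q_target=[0.5, -0.2],
    q=[0.3, 0.0],
    q_dot=[0.1, -0.4],
    kp=100.0,
    kd=2.0,
)
```

The proportional term pulls each joint toward its target, while the derivative term damps the joint velocity.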

The dog character has 12 joints and the humanoid character has 43 joints. We normalize the joint angle targets into the range $[-1, +1]$, where $-1$ maps to the minimum joint angle and $+1$ maps to the maximum joint angle.
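As a rough illustration of this normalization, the sketch below maps a normalized action back to joint angles. It assumes the mapping between the two endpoints is linear and that out-of-range values are clipped; neither detail is stated above, and the name `denormalize_action` and the joint limits are hypothetical.

```python
def denormalize_action(a, q_min, q_max):
    """Map normalized targets in [-1, +1] to joint angles in [q_min, q_max].

    Assumes a linear map: -1 -> q_min, +1 -> q_max.
    """
    targets = []
    for ai, lo, hi in zip(a, q_min, q_max):
        ai = max(-1.0, min(1.0, ai))  # clipping is an assumption
        targets.append(lo + 0.5 * (ai + 1.0) * (hi - lo))
    return targets


# Three joints sharing the same made-up limits of [-2, 4] rad.
targets = denormalize_action([-1.0, 0.0, 1.0], [-2.0] * 3, [4.0] * 3)
```

Under these assumptions an action of 0 lands at the midpoint of the joint range.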

The observation $\mathbf{o}$ is slightly more complicated, but you can consider the RL agent's observation to be a concatenated vector of the generalized coordinates and generalized velocities of the character, excluding the (x, z) coordinates of the base position and the yaw angle of the base orientation.
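A minimal sketch of how such an observation vector could be assembled is shown below. The layout of the generalized coordinates (base x, y, z, yaw, pitch, roll, then joint angles) and the index set `EXCLUDED` are hypothetical, and the sketch assumes the generalized velocities are kept in full; the simulator's real layout may differ.

```python
# Hypothetical layout of q: [x, y, z, yaw, pitch, roll, joint angles...]
EXCLUDED = {0, 2, 3}  # assumed indices of base x, base z, and base yaw


def build_observation(q, q_dot, excluded=EXCLUDED):
    """Concatenate generalized coordinates (minus excluded entries) and velocities."""
    kept_q = [v for i, v in enumerate(q) if i not in excluded]
    return kept_q + list(q_dot)


# One joint only, to keep the example short; all values are made up.
q = [1.0, 0.9, 2.0, 0.3, 0.01, 0.02, 0.5]
q_dot = [0.1] * 7
obs = build_observation(q, q_dot)
```

With this layout the first entry of `obs` is the base height, since the x coordinate has been dropped.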

Quiz: can you guess why we exclude the (x, z) coordinates of the base position and the yaw angle of the base orientation from the observation? :)
