Observations in Cartpole: two_poles

google-deepmind / dm_control

Google DeepMind's software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.

Apache License 2.0


Observations in Cartpole: two_poles #102

Closed camontblanc closed 5 years ago

camontblanc commented 5 years ago

Hi!

According to the tech report, the Cart-k-pole has:

dim(S) = 2k+2
dim(A) = 1
dim(O) = 3k+2

My question is: what do the states and observations physically mean? I'm using the non-benchmarking two_poles task, and I imagine the state corresponds to the position and velocity of the cart and of each of the two poles (which gives dim(S) = 6). However, why do we have 2 more observations?

Thanks by the way!

alimuldal commented 5 years ago

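That's because the observations represent the angle of each pole by its sine and cosine, so each pole contributes two angle entries instead of one. That adds k entries in total, i.e. dim(O) = dim(S) + k = 3k+2, which is 8 rather than 6 for two poles. Sine and cosine are more convenient as inputs to neural networks than the angle itself because they are bounded between -1 and 1, and there is no discontinuity when the pole spins through more than 360° in either direction.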

Here are the relevant bits of code:

https://github.com/deepmind/dm_control/blob/master/dm_control/suite/cartpole.py#L202-L207
https://github.com/deepmind/dm_control/blob/master/dm_control/suite/cartpole.py#L150-L153
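For anyone who wants to check the dimension count by hand, here is a minimal sketch in plain Python. It is not the dm_control implementation: the helper name `encode_observation`, its arguments, and the ordering of the entries are assumptions made purely for illustration. It only shows how replacing each pole angle by its cosine and sine turns a state of size 2k+2 into an observation of size 3k+2, and why two angles a full turn apart give the same observation.

```python
import math


def encode_observation(cart_pos, pole_angles, cart_vel, pole_vels):
    """Hypothetical helper (not part of dm_control).

    Maps a Cart-k-pole state to an observation in which every pole
    angle is replaced by its (cosine, sine) pair. The entry ordering
    here is an assumption chosen for readability.
    """
    obs = [cart_pos]
    for angle in pole_angles:
        # Two bounded values in [-1, 1] instead of one unbounded angle.
        obs.extend((math.cos(angle), math.sin(angle)))
    obs.append(cart_vel)
    obs.extend(pole_vels)
    return obs


k = 2
state_dim = 2 * k + 2  # cart position + k angles + cart velocity + k angular velocities

# The second pole's angle is one full turn ahead of the first pole's angle.
obs = encode_observation(
    cart_pos=0.0,
    pole_angles=[0.1, 0.1 + 2 * math.pi],
    cart_vel=0.0,
    pole_vels=[0.0, 0.0],
)

print(state_dim, len(obs))
```

With k = 2 the state has 6 entries and the observation has 8. The two poles in this example end up with matching cosine and sine values (up to floating-point error) even though their raw angles differ by 2π, which is the "no discontinuity" property described above.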