Open jankrepl opened 3 years ago
I guess I found the answer myself:
- `shuffle observations`: the order of the features is permuted, but the hidden states are not reset, so it takes some time for the algorithm to adjust
- `restart environment`: the hidden states are reset to 0
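
To make the difference between the two buttons concrete, here is a minimal toy sketch in plain Python. It is not the code from the repository: the shared `tanh` update, the weights `W_X` and `W_H`, the sum pooling, and the two-feature observation are all assumptions made for illustration, standing in for the real `LSTMCell` and attention. The idea it tries to show is that the output is invariant only when the per-feature hidden states are permuted together with the observations.

```python
import math

W_X, W_H = 0.8, 0.5  # hypothetical weights, shared by every feature


def step(obs, hidden):
    """Update one hidden state per feature, then pool them with a sum."""
    new_hidden = [math.tanh(W_X * x + W_H * h) for x, h in zip(obs, hidden)]
    return sum(new_hidden), new_hidden


def permute(values, perm):
    """Reorder values so that position i holds values[perm[i]]."""
    return [values[i] for i in perm]


obs = [1.0, 0.0]  # toy two-feature observation (assumption)
perm = [1, 0]     # the "shuffle": swap the two features

# Warm up so that the hidden states carry some history.
hidden = [0.0, 0.0]
for _ in range(5):
    _, hidden = step(obs, hidden)

# Reference output with no shuffle.
baseline, _ = step(obs, hidden)

# Observations and hidden states permuted together: output is unchanged.
aligned, _ = step(permute(obs, perm), permute(hidden, perm))

# "shuffle observations": inputs permuted, hidden states left in the old order.
stale, stale_hidden = step(permute(obs, perm), hidden)

# Keep feeding the shuffled observation: the stale states gradually wash out.
adapted = stale
for _ in range(30):
    adapted, stale_hidden = step(permute(obs, perm), stale_hidden)

# "restart environment": hidden states zeroed, so the history is lost.
reset, _ = step(permute(obs, perm), [0.0, 0.0])

print(f"baseline={baseline:.4f} aligned={aligned:.4f} stale={stale:.4f} "
      f"adapted={adapted:.4f} reset={reset:.4f}")
```

In this toy setup `aligned` matches `baseline`, `stale` does not, and `adapted` drifts back toward `baseline` after enough steps, which is consistent with the short disruption seen after clicking the button. Whether the real model recovers in the same way depends on its trained weights.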
First of all, great job on the paper, this visualization page and also the source code!
I was playing around with your interactive visualization of the CartPole task, and I don't really understand why clicking the `shuffle observations` button should have any effect on the cart pole (I am referring to the case with no additional noise). Clicking the button clearly disrupts the performance.

AFAIK, the policy network should be permutation invariant, and the latent code `m_t` should not change if we shuffle the observed features. My only guess is that you restart the `self.hx` hidden state of the `LSTMCell`; however, what is the difference from the `restart environment` button?

Thank you in advance for your response!