Shuffling observations should not have any effect

First of all, great job on the paper, this visualization page and also the source code!

I was playing around with your interactive visualization of the CartPole task, and I don't really understand why clicking the "shuffle observations" button should have any effect on the cart (I am referring to the case without additional noise). Clicking the button clearly disrupts the performance.

AFAIK, the policy network should be permutation invariant, so the latent code m_t should not change if we shuffle the observed features. My only guess is that you reset the self.hx hidden state of the LSTMCell; however, what would then be the difference from the "restart environment" button?
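A minimal sketch of the invariance argument, in plain Python. It assumes (hypothetically) that every observation entry is processed by the same stateless "sensory neuron" that produces a key and a value, and that m_t is an attention readout with fixed queries. The names `sensory_neuron`, `latent_code`, `w_k`, `w_v` and `queries` are illustrative only and are not taken from the attentionneuron code. Under these assumptions, permuting the observations only permutes the key/value pairs, and the softmax-weighted sum does not depend on their order:

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(queries, keys, values):
    # For each query, take a softmax-weighted sum of the values over all N inputs.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def sensory_neuron(obs_i, prev_action, w_k, w_v):
    # Hypothetical stateless neuron: identical weights for every observation entry.
    feat = [obs_i, prev_action]
    key = [sum(a * b for a, b in zip(row, feat)) for row in w_k]
    value = [math.tanh(sum(a * b for a, b in zip(row, feat))) for row in w_v]
    return key, value


def latent_code(obs, prev_action, queries, w_k, w_v):
    # m_t: one output row per query, computed from the unordered set of (key, value) pairs.
    pairs = [sensory_neuron(o, prev_action, w_k, w_v) for o in obs]
    keys = [k for k, _ in pairs]
    values = [v for _, v in pairs]
    return attention_pool(queries, keys, values)


rng = random.Random(0)
d = 4
w_k = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(d)]
w_v = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(d)]
queries = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(3)]

obs = [0.1, -0.5, 0.03, 0.7, 1.2]               # toy observation vector, arbitrary values
shuffled = [obs[i] for i in (3, 0, 4, 1, 2)]    # a fixed permutation of the same entries

m = latent_code(obs, 0.0, queries, w_k, w_v)
m_perm = latent_code(shuffled, 0.0, queries, w_k, w_v)
# m and m_perm agree up to floating-point rounding.
```

The sketch deliberately leaves out any recurrent state such as self.hx; whether a per-input hidden state is carried across a shuffle is exactly the open point of this question.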

Thank you in advance for your response!
