attentionneuron / attentionneuron.github.io

Shuffling observations should not have any effect #2

Open jankrepl opened 3 years ago

jankrepl commented 3 years ago

First of all, great job on the paper, this visualization page and also the source code!

I was playing around with your interactive visualization of the CartPole task, and I don't understand why clicking the "shuffle observations" button should have any effect on the cart pole (I am referring to the case without additional noise). Clicking the button clearly disrupts performance.

AFAIK, the policy network should be permutation invariant, so the latent code m_t should not change if we shuffle the observed features. My only guess is that you reset the self.hx hidden state of the LSTMCell; but then, what is the difference from the "restart environment" button?
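The permutation-invariance argument can be illustrated with a minimal pure-Python sketch, assuming a simplified attention pooling in which each input feature contributes one key and one value and the queries are fixed. Everything here (`attention_pool`, `latent_code`, the key/value construction, and the per-feature `state` list standing in for a recurrent hidden state) is hypothetical and is not the project's code. It shows that the pooled output is unchanged when features are shuffled together with their own state, but changes when the per-feature state is left in its old order.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(queries, keys, values):
    """Pool any number of input features into len(queries) outputs.

    keys[i] and values[i] both come from input feature i, so reordering the
    features reorders keys and values together and leaves the result unchanged
    (up to floating-point summation order).
    """
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append(sum(w * v for w, v in zip(weights, values)))
    return out


def latent_code(queries, obs, state):
    # Hypothetical per-feature key: the raw observation plus that feature's
    # own state (a stand-in for a per-feature recurrent hidden state).
    keys = [[o, s] for o, s in zip(obs, state)]
    # Hypothetical value: the observation itself.
    return attention_pool(queries, keys, obs)


if __name__ == "__main__":
    queries = [[1.0, 1.0]]
    obs, state = [1.0, 0.0], [2.0, 0.0]
    perm = [1, 0]
    shuffled_obs = [obs[p] for p in perm]

    print(latent_code(queries, obs, state))                              # ≈ [0.9526]
    print(latent_code(queries, shuffled_obs, [state[p] for p in perm]))  # ≈ [0.9526]: state moved with its feature
    print(latent_code(queries, shuffled_obs, state))                     # ≈ [0.2689]: stale state, output changes
```

The output length depends only on the number of queries, not on how many features arrive, which is what lets this kind of pooling accept inputs in any order. The last line illustrates the hidden-state guess above: any per-feature state that is not shuffled along with its observation is enough to change the pooled code.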

Thank you in advance for your response!

jankrepl commented 2 years ago

I guess I found the answer myself: