YeWR / EfficientZero

Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.
GNU General Public License v3.0

Question about the effect of the state encoding identity connection in the dynamics network #35

Open puyuan1996 opened 1 year ago

puyuan1996 commented 1 year ago

Thank you very much for open-sourcing the code.

I'm a little confused about the reason for the identity connection of state encoding in DynamicsNetwork in model.py:

Why is the identity connection applied to the state encoding rather than the action encoding, and what is its empirical impact on the Atari results?

Looking forward to your reply!

YeWR commented 1 year ago

Thank you for your comments.

The identity connection here follows the same design as ResNets. The residual path provides richer and better gradients when the network is deep. Because the dynamics network is unrolled recurrently for 5 steps, the gradient path from the final unroll step can be quite deep (over 10 layers). Consequently, we add the identity connection here.
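To make the depth argument concrete, here is a minimal scalar sketch, not the code in model.py. It chains 10 toy layers, with and without an identity connection, and tracks the derivative of the final state with respect to the initial state. The function name `unrolled_gradient`, the tanh layer, the weight 0.1, and the split of 5 unroll steps into 2 layers each are all assumptions chosen for illustration.

```python
import math


def unrolled_gradient(s0, w, num_layers, use_identity):
    """Return (final state, d final state / d s0) for a scalar toy chain.

    Each hypothetical layer computes tanh(w * s); with use_identity=True
    the input is added back, as in a ResNet-style residual connection.
    """
    s, grad = s0, 1.0
    for _ in range(num_layers):
        h = math.tanh(w * s)
        local = w * (1.0 - h * h)  # derivative of tanh(w * s) w.r.t. s
        if use_identity:
            s = s + h              # identity path: s_next = s + f(s)
            local += 1.0           # so the local derivative is 1 + f'(s)
        else:
            s = h                  # plain path: s_next = f(s)
        grad *= local              # chain rule across layers
    return s, grad


# Assumed depth: 5 unroll steps x 2 layers per step = 10 layers.
plain_state, plain_grad = unrolled_gradient(1.0, 0.1, 10, use_identity=False)
skip_state, skip_grad = unrolled_gradient(1.0, 0.1, 10, use_identity=True)
print(f"plain: {plain_grad:.3e}, with identity: {skip_grad:.3e}")
```

In this toy setting each plain layer scales the gradient by at most 0.1, so it shrinks geometrically with depth, while the identity path keeps every per-layer factor at or above 1. The real network uses convolutions and normalization, so this only illustrates the mechanism and not the actual magnitudes.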

As for empirical results, we find that the identity connection yields a better reward model. We collected some datasets and trained the model to predict the reward on this data through supervised learning. The model with the identity connection has a lower test error on reward prediction.

Hope this addresses your concerns.