Question about the effect of state encoding identity connection in dynamics network #35

YeWR / EfficientZero

Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.

GNU General Public License v3.0


Thank you for your comments.

The identity connection here follows the same design as ResNets: the skip path gives gradients a direct route back through the network, which matters more as the network gets deeper. Because the dynamics network is unrolled recurrently for 5 steps, the gradient path for the final unroll step is much deeper than a single forward pass (over 10 layers). That is why we add the identity connection here.
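To make the gradient argument concrete, here is a toy sketch. It is not the EfficientZero code: the scalar state, the `tanh` step, the weight of 0.1, and the function name `unroll_gradient` are all assumptions chosen for illustration. It computes the derivative of the final state with respect to the initial state after 5 recurrent steps, with and without an identity connection.

```python
import math


def unroll_gradient(weight, state, steps=5, identity=True):
    """Return d(final state) / d(initial state) for a scalar toy dynamics step.

    Hypothetical toy model: each step computes tanh(weight * state), and
    optionally adds the input state back (the identity connection).
    """
    grad = 1.0
    for _ in range(steps):
        out = math.tanh(weight * state)
        # Derivative of tanh(weight * state) with respect to state.
        local = weight * (1.0 - out * out)
        if identity:
            # Residual step: s_next = s + f(s), so ds_next/ds = 1 + f'(s).
            state = state + out
            grad *= 1.0 + local
        else:
            # Plain step: s_next = f(s), so ds_next/ds = f'(s).
            state = out
            grad *= local
    return grad


with_skip = unroll_gradient(0.1, 0.5, identity=True)
without_skip = unroll_gradient(0.1, 0.5, identity=False)
print("with identity connection:   ", with_skip)
print("without identity connection:", without_skip)
```

In this toy setting each plain step multiplies the gradient by a factor of at most 0.1, so it shrinks geometrically over the 5 unroll steps, while each residual step contributes a factor of at least 1. A real dynamics network has convolutional layers and normalization, so the numbers will differ, but the mechanism is the same.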

As for empirical results, we find that the identity connection yields a better reward model. We collected some datasets and trained models to predict the reward from this data with supervised learning. The model with the identity connection had a lower test error on reward prediction.

Hope this addresses your concerns.
