YeWR / EfficientZero

Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.

Question: Why not reanalyze 100% policy targets? #14

Open · Hwhitetooth opened this issue 2 years ago

Hwhitetooth commented 2 years ago

Hi there,

First of all, great work and thank you for open-sourcing your code!

I have a question regarding reanalyze: you chose to reanalyze 99% of the policy targets and 100% of the value targets. I am curious about the reason behind this choice. Did you try reanalyzing 100% of the policy targets? Did it hurt performance?

Thank you!

YeWR commented 2 years ago

A larger reanalyze ratio can make training more efficient, since a larger fraction of the targets is recomputed with the latest model and is therefore fresher.

In practice, there is no significant difference between reanalyzing 99% and reanalyzing 100% of the policy targets, since the two ratios are close enough.
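
For intuition, here is a minimal, self-contained sketch (not the actual EfficientZero code; the names and batch size are illustrative) of how a per-target reanalyze ratio can be applied when a training batch is assembled: each sampled transition is independently flagged, flagged targets get recomputed with the latest model, and the rest keep the stale targets stored at collection time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fractions of targets recomputed with the latest model ("reanalyzed").
policy_reanalyze_ratio = 0.99  # 99% of policy targets
value_reanalyze_ratio = 1.0    # 100% of value targets

batch_size = 256  # illustrative

# True  -> recompute this target (e.g. re-run MCTS / re-evaluate the value)
# False -> reuse the stale target stored when the trajectory was collected
policy_mask = rng.random(batch_size) < policy_reanalyze_ratio
value_mask = rng.random(batch_size) < value_reanalyze_ratio

print(f"reanalyzed policy targets: {policy_mask.sum()} / {batch_size}")
print(f"reanalyzed value targets:  {value_mask.sum()} / {batch_size}")
```

At a 99% ratio, only about `0.01 * batch_size` (here roughly 2 or 3) policy targets per batch keep their stale values, which is why the gap between 99% and 100% is negligible.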

In DeepMind's paper "Online and Offline Reinforcement Learning by Planning with a Learned Model" (MuZero Unplugged), the mechanism and efficiency of Reanalyse are discussed in detail. If you are interested, please refer to that work.

Hope this helps :)