Hwhitetooth opened this issue 2 years ago
Hi there,
First of all, great work, and thank you for open-sourcing your code!
I have a question regarding reanalyze: you chose to reanalyze 99% of the policy targets but 100% of the value targets. I am just curious about the reason behind this choice. Did you try reanalyzing 100% of the policy targets? Did it hurt performance?
Thank you!
A larger reanalyze ratio generally makes training more sample-efficient.
That said, there is no significant difference between reanalyzing 99% and 100% of the policy targets: going from 99% to 100% only changes the fraction of stale, stored targets from 1% to 0%, which is too small to affect training noticeably. A minimal sketch of how the ratio is applied follows.
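For intuition, here is a minimal, hypothetical sketch (not the repository's actual implementation) of how a per-sample reanalyze ratio is commonly applied when building training targets. The `Transition` fields and the `rerun_mcts` / `estimate_value` callables are assumed names for illustration only:

```python
import random
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Transition:
    observation: object          # environment observation stored in the replay buffer
    stored_policy: List[float]   # MCTS visit distribution saved when the data was collected

REANALYZE_POLICY_RATIO = 0.99    # 99% of policy targets are recomputed with fresh MCTS
# Value targets use a ratio of 1.0, i.e. they are always recomputed, so no branch is needed.

def make_targets(t: Transition,
                 rerun_mcts: Callable[[object], List[float]],
                 estimate_value: Callable[[object], float]):
    """Build (policy_target, value_target) for one sampled transition."""
    # Policy target: with probability 0.99, re-run MCTS with the current
    # network to get a fresh target; otherwise (1% of the time) reuse the
    # stale target stored at collection time.
    if random.random() < REANALYZE_POLICY_RATIO:
        policy_target = rerun_mcts(t.observation)
    else:
        policy_target = t.stored_policy
    # Value target: always re-estimated with the current network.
    value_target = estimate_value(t.observation)
    return policy_target, value_target
```

Setting `REANALYZE_POLICY_RATIO` to 1.0 simply makes the `else` branch unreachable, which is why the 99% and 100% settings behave almost identically.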
DeepMind's MuZero Unplugged paper, "Online and Offline Reinforcement Learning by Planning with a Learned Model", discusses the mechanism and efficiency of Reanalyse in detail. If you are interested, please refer to that work.
Hope this helps :)