Question about the effect of discount factor and done mask when calculating the target value?

YeWR / EfficientZero

Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.

GNU General Public License v3.0


Question about the effect of discount factor and done mask when calculating the target value? #42

Open puyuan1996 opened 1 year ago

puyuan1996 commented 1 year ago

Thank you very much for open-sourcing your code.

This is a common definition of a target value in classical RL:
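One common form of this n-step target is shown below. It is written with the question's own names (γ = discount_factor, n = td_steps, v = the bootstrap value). It assumes a done indicator d that equals 1 once the episode has terminated at or before step t + n, and rewards r that are 0 for steps after termination:

```latex
z_t = \sum_{i=0}^{n-1} \gamma^{i}\, r_{t+i} \;+\; (1 - d_{t+n})\, \gamma^{n}\, v(s_{t+n})
```

The two factors the question asks about are the γ^n on the bootstrap term and the (1 − d_{t+n}) mask, which zeroes the bootstrap when s_{t+n} is a terminal state.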

I'm a little confused about how the target value is calculated here in reanalyze_worker.py:

Why don't we multiply the bootstrap value (here, value_lst) by discount_factor^td_steps, and why don't we mask the bootstrap value when the target observation is a done (terminal) state?
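For reference, here is a minimal per-episode sketch of the classical n-step target with both operations applied. This is not EfficientZero's reanalyze code. The function `n_step_target` and its indexing convention are hypothetical. The assumed convention is that `rewards[i]` is the reward received after acting at step i, and `values[i]` is the predicted value of observation i. The sketch also assumes the episode truly terminates after its last step, so it is not truncated by a time limit.

```python
from typing import Sequence


def n_step_target(
    rewards: Sequence[float],
    values: Sequence[float],
    t: int,
    td_steps: int,
    discount: float,
) -> float:
    """Hypothetical n-step TD target for step t of one terminated episode.

    Assumes len(rewards) == len(values) == episode length T, and that the
    observation at index T (just past the last step) is terminal.
    """
    episode_len = len(rewards)
    bootstrap_index = t + td_steps

    # Discounted sum of at most td_steps real rewards; stops at the episode end,
    # which is equivalent to treating rewards after termination as 0.
    target = 0.0
    for i, r in enumerate(rewards[t:min(bootstrap_index, episode_len)]):
        target += (discount ** i) * r

    # Done mask: only bootstrap if observation t + td_steps is non-terminal.
    if bootstrap_index < episode_len:
        # The bootstrap value is discounted by discount ** td_steps.
        target += (discount ** td_steps) * values[bootstrap_index]
    return target


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0, 1.0]
    values = [10.0, 10.0, 10.0, 10.0]
    # t=0: 1 + 0.5*1 + 0.5**2 * 10 = 4.0 (bootstrap discounted and kept)
    print(n_step_target(rewards, values, t=0, td_steps=2, discount=0.5))  # 4.0
    # t=2: target observation index 4 is terminal, so only 1 + 0.5*1 = 1.5
    print(n_step_target(rewards, values, t=2, td_steps=2, discount=0.5))  # 1.5
```

Without the γ^td_steps factor, the t=0 case would add the full bootstrap value of 10.0 instead of 2.5. Without the mask, the t=2 case would bootstrap from a terminal observation, whose true value is 0.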

Looking forward to your reply!