This is a common definition of a target value in classical RL:
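For reference, the n-step bootstrapped target is usually written along these lines. The symbols are assumed notation, not taken from the repository: $n$ stands for `td_steps`, $\gamma$ for `discount_factor`, $v_{t+n}$ for the bootstrap value, and $d_{t+n}$ is 1 if the state at step $t+n$ is terminal and 0 otherwise. Reward indexing conventions differ between papers.

$$
z_t = \sum_{i=0}^{n-1} \gamma^{i} \, r_{t+i} \;+\; \gamma^{n} \, (1 - d_{t+n}) \, v_{t+n}
$$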
I'm a little confused about how the target value is calculated in reanalyze_worker.py:
Why is the bootstrap value (here `value_lst`) not multiplied by `discount_factor^td_steps`, and why is the bootstrap value not masked when the target observation is a done (terminal) state?
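To make the question concrete, here is a minimal sketch of the n-step target as it is commonly defined, with the bootstrap value both discounted and masked. It is not the code from reanalyze_worker.py: the function name `n_step_target` and its arguments are hypothetical and only illustrate the expected behaviour.

```python
from typing import Sequence


def n_step_target(
    rewards: Sequence[float],
    bootstrap_value: float,
    discount: float,
    done: bool,
) -> float:
    """Textbook n-step target (hypothetical helper, not the repository's API).

    rewards:         the td_steps rewards collected from step t onwards
    bootstrap_value: value estimate of the state reached after td_steps steps
    discount:        discount factor in [0, 1]
    done:            True if the state reached after td_steps steps is terminal
    """
    td_steps = len(rewards)

    # Discounted sum of the observed rewards: sum_i discount^i * r_{t+i}
    target = sum((discount ** i) * r for i, r in enumerate(rewards))

    # Drop the bootstrap term for terminal states, otherwise discount it
    # by discount^td_steps before adding it.
    mask = 0.0 if done else 1.0
    target += mask * (discount ** td_steps) * bootstrap_value
    return target


# Two rewards of 1.0, bootstrap value 10.0, discount 0.5
print(n_step_target([1.0, 1.0], 10.0, 0.5, done=False))  # 1 + 0.5 + 0.25 * 10
print(n_step_target([1.0, 1.0], 10.0, 0.5, done=True))   # 1 + 0.5
```

With `done=True` the bootstrap term disappears entirely, and with `done=False` it is scaled by `discount ** td_steps`; those are the two steps the question asks about.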
Thank you very much for open-sourcing your code.
Looking forward to your reply!