In Chapter08, Implementing the n-step DQN, the default value of gamma is 0.96059601 (0.99 ** 4). The same value is passed to the calc_loss_dqn function.
If we use steps_count = 4, shouldn't the Bellman update (after unrolling) be something like:
bellman_vals = next_state_vals.detach() * (gamma ** 4) + 3rd_next_state_vals.detach() * (gamma ** 3) + 2nd... + rewards_v
but instead, as per the calc_loss_dqn function in the common.py module, it is:
bellman_vals = next_state_vals.detach() * gamma + rewards_v
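For reference, here is a minimal sketch of the unrolled n-step target I have in mind; the function and variable names (n_step_target, rewards, last_state_val) are my own and not taken from the book's code. It only illustrates the algebra: if the subtrajectory rewards were accumulated with the gamma ** k weights already applied, then multiplying the last state value by gamma ** n would be the only discounting left to do.

def n_step_target(rewards, last_state_val, gamma, done):
    # Illustrative n-step Bellman target, not the book's code.
    # rewards: the n raw rewards r_t, r_{t+1}, ..., r_{t+n-1}
    # last_state_val: max_a Q(s_{t+n}, a) from the target net (ignored if done)
    discounted_sum = sum(r * gamma ** k for k, r in enumerate(rewards))
    bootstrap = 0.0 if done else (gamma ** len(rewards)) * last_state_val
    return discounted_sum + bootstrap

# With steps_count = 4 and gamma = 0.99 the bootstrap factor is 0.99 ** 4 = 0.96059601
print(n_step_target([1.0, 0.0, 1.0, 0.5], last_state_val=2.0, gamma=0.99, done=False))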
In Chapter07, it is stated that "there is an implementation of subtrajectory rollouts with accumulation of the reward." But is this accumulation of the reward discounted? I don't think so, since no gamma parameter is passed.
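To make the question concrete, here is what I mean by discounted vs. undiscounted accumulation over a subtrajectory; this is purely illustrative code with made-up function names, not the ptan or common.py implementation.

def accumulate_undiscounted(rewards):
    # Plain sum of the subtrajectory rewards
    return sum(rewards)

def accumulate_discounted(rewards, gamma):
    # Each reward r_{t+k} is weighted by gamma ** k before summing
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

rewards = [1.0, 0.0, 1.0, 0.5]
print(accumulate_undiscounted(rewards))            # 2.5
print(accumulate_discounted(rewards, gamma=0.99))  # 1.0 + 0.99**2 + 0.5 * 0.99**3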