PacktPublishing / Deep-Reinforcement-Learning-Hands-On-Second-Edition


Value of Discount Factor in ch8/02_dqn_n_steps.py #36

Closed Hrushikesh-github closed 3 years ago

Hrushikesh-github commented 3 years ago

In Chapter 8, in the n-step DQN implementation (02_dqn_n_steps.py), the default value of gamma is 0.96059601 (0.99 ** 4). This same value is passed to the calc_loss_dqn function.

If we use steps_count = 4, shouldn't the Bellman update (after unrolling) be something like:

bellman_vals = next_state_vals.detach() * (gamma ** 4) + 3rd_next_state_vals.detach() * (gamma ** 3) + 2nd_next_state_vals.detach() * (gamma ** 2) + ... + rewards_v

But instead, as per the calc_loss_dqn function in the common.py module, it is:

bellman_vals = next_state_vals.detach() * gamma + rewards_v
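
To spell out what I would expect for an n-step update, here is a minimal sketch of the standard unrolled n-step target (my own code and names, not the book's; I am assuming per-step reward tensors and a bootstrapped value of the state n steps ahead):

```python
import torch

# Sketch of an unrolled n-step Bellman target (hypothetical names, not the book's code).
# step_rewards: list of n tensors with the per-step rewards r_t, ..., r_{t+n-1}
# next_q_max:   max_a Q_tgt(s_{t+n}, a), the value of the state n steps ahead
# gamma:        the per-step discount factor (e.g. 0.99), not gamma ** n
# done:         bool tensor, True where the episode ended inside the rollout
def n_step_target(step_rewards, next_q_max, gamma, done):
    target = torch.zeros_like(next_q_max)
    # Discounted sum of the intermediate rewards:
    # r_t + gamma * r_{t+1} + ... + gamma^(n-1) * r_{t+n-1}
    for i, r in enumerate(step_rewards):
        target += (gamma ** i) * r
    # Bootstrap from the n-th next state, discounted by gamma ** n,
    # unless the episode terminated inside the rollout
    target += (gamma ** len(step_rewards)) * next_q_max * (1.0 - done.float())
    return target
```

If the experience source already sums the discounted rewards over the 4 steps (so that rewards_v holds the whole discounted sum), then only the bootstrap term is left, and passing gamma ** 4 to calc_loss_dqn would give exactly this target. Is that what is happening here?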

In Chapter 7, it is stated that "there is an implementation of subtrajectory rollouts with accumulation of the reward." But is this accumulation of the reward discounted? I don't think so, since no gamma parameter is passed.
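
To illustrate what I mean by discounted accumulation over a subtrajectory, here is a hypothetical sketch (not ptan's actual implementation; the function and variable names are mine):

```python
# Hypothetical sketch of discounted reward accumulation over one subtrajectory
# rollout (not ptan's actual code; the function and variable names are mine).
def accumulate_discounted(rewards, gamma):
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for one subtrajectory."""
    total = 0.0
    for step, r in enumerate(rewards):
        total += (gamma ** step) * r
    return total

# A 4-step rollout with reward 1.0 per step and gamma = 0.99:
# 1.0 + 0.99 + 0.9801 + 0.970299 = 3.940399
print(accumulate_discounted([1.0, 1.0, 1.0, 1.0], 0.99))
```

If no gamma is involved, the accumulation would simply be the undiscounted sum (4.0 in this example), which is what I suspect the Chapter 7 example does.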