Closed parasnaren closed 4 years ago
Fix for issue #26, regarding the formula for calculating the TD error for PER.
delta(j) = Reward(j) + gamma(j) * Q_target(S_j, argmax_a Q(S_j, a)) - Q(S_{j-1}, A_{j-1})
Here, Q(S_j, a) is the Q value predicted by the model on the new state, not the old state.
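A minimal NumPy sketch of the formula above, for illustration only. The function name `td_errors` and all parameter names are hypothetical and are not taken from this repository. It assumes the Q values have already been computed as arrays of shape (batch, n_actions), and that `gammas` is set to 0 for terminal transitions.

```python
import numpy as np

def td_errors(q_online_old, q_online_new, q_target_new, actions, rewards, gammas):
    """TD error delta(j) for a batch of transitions (hypothetical helper).

    q_online_old: model Q values on the old states S_{j-1}
    q_online_new: model Q values on the new states S_j
    q_target_new: target-network Q values on the new states S_j
    actions:      integer actions A_{j-1} taken in the old states
    rewards:      Reward(j)
    gammas:       gamma(j), scalar or per-transition array (assumed 0 at terminals)
    """
    rows = np.arange(len(actions))
    # argmax_a Q(S_j, a): the model chooses the action on the NEW state.
    best_next = np.argmax(q_online_new, axis=1)
    # Q_target(S_j, argmax_a Q(S_j, a)): the target network evaluates that action.
    next_value = q_target_new[rows, best_next]
    # Subtract Q(S_{j-1}, A_{j-1}), the model's value for the action actually taken.
    return rewards + gammas * next_value - q_online_old[rows, actions]
```

In PER, the absolute value of this error is typically what gets stored as the transition's priority.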