Shmuma / ptan

PyTorch Agent Net: reinforcement learning toolkit for pytorch
MIT License

Weights should not affect probabilities in PrioReplayBuffer #36

Open dimaga opened 4 years ago

dimaga commented 4 years ago

In samples/rainbow/05_dqn_prio_replay.py, weights are propagated to batch_weights_v and multiplied by (state_action_values - expected_state_action_values) ** 2 to calculate losses_v.

(losses_v + 1e-5) is then used to calculate the new sampling priorities.

However, according to https://arxiv.org/pdf/1511.05952.pdf (the Prioritized Experience Replay paper, see Algorithm 1), the raw TD-error is used as the priority, before it is multiplied by the importance-sampling weight.
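To illustrate the separation Algorithm 1 describes, here is a minimal sketch (with hypothetical tensor values, not the actual code from 05_dqn_prio_replay.py): the importance-sampling weights scale only the loss used for the gradient step, while the new buffer priorities come from the unweighted TD-error.

```python
import torch

# Hypothetical per-sample values standing in for a training batch.
state_action_values = torch.tensor([1.0, 2.0, 3.0])
expected_state_action_values = torch.tensor([1.5, 1.0, 3.5])
batch_weights_v = torch.tensor([0.5, 1.0, 0.8])  # IS weights from the buffer

td_errors = state_action_values - expected_state_action_values

# Weights affect only the loss that is backpropagated ...
losses_v = batch_weights_v * td_errors ** 2
loss = losses_v.mean()

# ... while the new priorities use the raw |TD-error| plus a small
# epsilon (so no transition gets zero probability), NOT losses_v.
new_priorities = (td_errors.abs() + 1e-5).detach().cpu().numpy()
```

Under this reading, feeding (losses_v + 1e-5) back as priorities would make the importance-sampling correction reduce a sample's future selection probability twice.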

Is it a mistake?