Implementation of Prioritized Experience Replay (PER) for the Deep Q-Network algorithm, following the publication by Tom Schaul, John Quan, Ioannis Antonoglou and David Silver: https://arxiv.org/pdf/1511.05952.pdf
Dependencies:
This implementation uses the rank-based prioritization described in the paper rather than the proportional prioritization. This means that the priority associated with each stored transition is kept in a conventional container (here, a dictionary) rather than in a sum tree.
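As a rough illustration, rank-based sampling with a dictionary can look like the sketch below. This is a minimal, hypothetical example, not the repo's actual API: each entry maps a transition index to its last absolute TD error, ranks are obtained by sorting, and sampling follows P(i) = p_i^alpha / sum_k p_k^alpha with p_i = 1 / rank(i), as in the paper.

```python
import random

# Hypothetical container: {transition_idx: |TD error|}
priorities = {}

def sample(batch_size, alpha=0.7):
    """Draw indices with P(i) = p_i^alpha / sum_k p_k^alpha,
    where p_i = 1 / rank(i) and rank 1 has the largest |TD error|."""
    # Sort indices by decreasing TD error; position 0 is rank 1.
    ranked = sorted(priorities, key=priorities.get, reverse=True)
    probs = [(1.0 / rank) ** alpha for rank in range(1, len(ranked) + 1)]
    total = sum(probs)
    probs = [p / total for p in probs]
    idx = random.choices(ranked, weights=probs, k=batch_size)
    # Also return each sampled index's P(i); it is needed later
    # to compute the importance-sampling weights.
    prob_of = dict(zip(ranked, probs))
    return idx, [prob_of[i] for i in idx]
```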
With PER alone, this implementation can solve the lunar-lander environment in about 1200 episodes. It could be further improved by adding a dueling Q-Network; the computation of the weights necessary to combine the dueling Q-Network with PER is already implemented here (see the sketch below).
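Assuming "the weights" above refers to the paper's importance-sampling correction, a minimal sketch is shown here. Each sampled transition gets w_i = (N * P(i))^(-beta), normalized by the maximum weight in the batch; beta is annealed from its start value toward 1 over training. The function name and signature are illustrative, not the repo's actual code.

```python
import numpy as np

def importance_weights(sample_probs, memory_size, beta):
    """sample_probs: the P(i) of each sampled transition;
    memory_size: N, the number of transitions in the replay memory."""
    w = (memory_size * np.asarray(sample_probs)) ** (-beta)
    return w / w.max()  # normalize so all weights are <= 1

# Each transition's TD error is scaled by its weight before the gradient
# step, correcting the bias introduced by non-uniform sampling.
```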
To run the training and observe the trained agent: python launch.py
To visualize the agent from WSL (Windows Subsystem for Linux) or WSL 2:
sudo apt install ubuntu-desktop mesa-utils
In your X server's Extra Settings (e.g. VcXsrv), un-tick Native OpenGL and tick Disable access control, then point the display at the X server:
export DISPLAY=localhost:0