Closed by MillionIntegrals 6 years ago
By commit 2dea2f65610776baa847c04753eae4557612ca26, both the A2C and PPO policy gradient algorithms are implemented. I'll spend some time testing them, and afterwards I'll implement ACKTR.
By commit cdd039eda056b706f49f4d45b2f8c65f0752099e, the very first DQN implementation seems to be working. I'll spend some more time debugging it and adding variants that improve performance.
By commit 37de03ad74aa64fa9f02874f86409e6be0390472, DQN seems to be stable and working, although it seems to be quite slow. I've implemented Double DQN, Dueling DQN and Prioritized Experience Replay.
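For anyone following along who hasn't looked at the variants yet, here is a minimal sketch of how the Double DQN bootstrap target differs from the vanilla DQN one. This is not the code from the commit above; the function names and the plain-list representation of Q-values are assumptions made purely for illustration.

```python
def dqn_target(reward, done, next_q_target, gamma=0.99):
    """Vanilla DQN: the target network both selects and evaluates the next action."""
    if done:
        # No bootstrapping past a terminal state.
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    # Greedy action according to the online network (hypothetical list of Q-values).
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # The value of that action is read from the target network instead.
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    online = [1.0, 3.0, 2.0]   # Q-values for the next state from the online network
    target = [5.0, 0.5, 4.0]   # Q-values for the next state from the target network
    print(dqn_target(1.0, False, target, gamma=0.5))
    print(double_dqn_target(1.0, False, online, target, gamma=0.5))
```

Decoupling action selection from evaluation is what is usually credited with reducing the overestimation bias of the plain max operator.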
By commit d44423b15223ced77fa00f04163b6b990f489ce6, I've implemented an off-policy policy gradient algorithm, ACER. It still needs debugging to be sure that the implementation is correct. I'm still missing the trust region implementation from the paper, so there is still no feature parity with OpenAI baselines in that respect.
By commit 3535ad054e1b4f043f1f530565a4216d3469c3f9, I've implemented the Trust Region Policy Optimization algorithm. Some debugging is still needed; however, I'm getting closer to the final goal for this ticket.
What still needs to be done before this ticket is closed:
My goal is to have that done by the end of this month.
By commit 534766927bdde115094653344b1c9b083fc6ab32, I've implemented the Deep Deterministic Policy Gradient algorithm. That is the last one I wanted to implement in this batch. I'll do a bit more debugging to make sure that all algorithms work correctly and then close this ticket as finished.
All algorithms have been debugged and tested to work properly.
My next step is to have clean, working, and benchmarked policy gradient reinforcement learning algorithms.