Implement policy gradient reinforcement learning algorithms

MillionIntegrals / vel

Velocity in deep-learning research

MIT License


Implement policy gradient reinforcement learning algorithms #1

Closed by MillionIntegrals 6 years ago

MillionIntegrals commented 6 years ago

My next step is to have clean, working, and benchmarked policy gradient reinforcement learning algorithms.

MillionIntegrals commented 6 years ago

By commit 2dea2f65610776baa847c04753eae4557612ca26 both the A2C and PPO policy gradient algorithms are implemented. I'll spend some time testing them, and afterwards I'll implement ACKTR.

MillionIntegrals commented 6 years ago

By commit cdd039eda056b706f49f4d45b2f8c65f0752099e a very first DQN implementation seems to be working. I'll spend some more time debugging it and adding variants that improve performance.

MillionIntegrals commented 6 years ago

By commit 37de03ad74aa64fa9f02874f86409e6be0390472 DQN seems to be stable and working, although it seems to be quite slow. I've implemented Double DQN, Dueling DQN and Prioritized Experience Replay.

MillionIntegrals commented 6 years ago

By commit d44423b15223ced77fa00f04163b6b990f489ce6 I've implemented an off-policy policy gradient algorithm, ACER. It still needs debugging to be sure that the implementation is correct. I'm still missing the trust region implementation from the paper, so there is still no feature parity with OpenAI baselines in that respect.

MillionIntegrals commented 6 years ago

By commit 3535ad054e1b4f043f1f530565a4216d3469c3f9 I've implemented the Trust Region Policy Optimization (TRPO) algorithm. Some debugging is still needed, but I'm getting closer to the final goal for this ticket.

What still needs to be done before this ticket is closed:

- First-order trust-region implementation for ACER
- Deep Deterministic Policy Gradients (DDPG)
- A significant amount of debugging of these algorithms on Gym environments to make sure my implementations are correct

My goal is to have that done by the end of this month.

MillionIntegrals commented 6 years ago

By commit 534766927bdde115094653344b1c9b083fc6ab32 I've implemented the Deep Deterministic Policy Gradient (DDPG) algorithm. That is the last one I wanted to implement in this batch. I'll do a bit more debugging to make sure that all algorithms work correctly and then close this ticket as finished.

MillionIntegrals commented 6 years ago

All algorithms have been debugged and tested to work properly.