Lightning-Universe / lightning-bolts

Toolbox of models, callbacks, and datasets for AI/ML researchers.
https://lightning-bolts.readthedocs.io
Apache License 2.0

Add A2C, ACER, and TRPO for Reinforcement Learning #596

Closed: blahBlahhhJ closed this issue 11 months ago

blahBlahhhJ commented 3 years ago

🚀 Feature

Implement more actor-critic-based RL algorithms (models), such as A2C, ACER, and TRPO.

Motivation

The RL section of this project covers very few popular algorithms and is especially short on policy-based and actor-critic-based methods. The only policy-based algorithms available now are policy gradient and REINFORCE, both of which are very old, as #186 also pointed out. I would like to contribute to the RL section by adding more modern algorithms such as Advantage Actor-Critic (A2C), Soft Actor-Critic (SAC), Actor-Critic with Experience Replay (ACER), and Trust Region Policy Optimization (TRPO).

Pitch

I will implement several RL algorithms to make experiments easier and more convenient. The implementation will follow roughly the same structure as the existing policy gradient model and will add a new Agent class (an actor-critic agent); everything else will be added within each specific new algorithm.
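To make the proposed agent a bit more concrete, here is a minimal, framework-agnostic sketch of what an actor-critic agent could look like. It is not the Bolts API: the names `ActorCriticAgent`, `ActorCriticNet`, and `toy_net` are hypothetical, and a real version would presumably wrap a PyTorch module rather than a plain Python callable.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: maps a state to (action logits, state-value estimate).
ActorCriticNet = Callable[[Sequence[float]], Tuple[List[float], float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ActorCriticAgent:
    """Hypothetical agent: samples an action and reports its log-prob and value."""

    def __init__(self, net: ActorCriticNet, seed: int = 0) -> None:
        self.net = net
        self.rng = random.Random(seed)

    def __call__(self, state: Sequence[float]) -> Tuple[int, float, float]:
        logits, value = self.net(state)  # actor head and critic head
        probs = softmax(logits)
        # Sample from the categorical policy defined by the actor head.
        action = self.rng.choices(range(len(probs)), weights=probs, k=1)[0]
        return action, math.log(probs[action]), value


def toy_net(state: Sequence[float]) -> Tuple[List[float], float]:
    # Stand-in for a real network (assumption): prefers action 1.
    return [0.0, 5.0], sum(state)


agent = ActorCriticAgent(toy_net)
action, log_prob, value = agent([0.5, 0.25])
```

Returning the log-probability and the value estimate together with the action is one possible design choice; it keeps everything the loss needs in a single call.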

Additional context

If there is no strong preference about which algorithm to start with, I will work on A2C first. It is the simplest one, so its code will be the easiest for people to understand compared with the other two, more sophisticated methods. I chose A2C because it is the basis of the actor-critic methods and is definitely worth having in this project. The other two build on it, with better convergence properties and better sample efficiency.
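For readers unfamiliar with A2C, the core of its objective can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: a single finished episode, Monte Carlo returns, no entropy bonus, and no autograd. The function names and the `value_coef` default are hypothetical and not taken from Bolts.

```python
from typing import List, Sequence


def discounted_returns(
    rewards: Sequence[float], gamma: float = 0.99, bootstrap_value: float = 0.0
) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1}, working backwards through the episode."""
    returns: List[float] = []
    running = bootstrap_value  # value of the state after the last step (0 if terminal)
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def a2c_loss(
    log_probs: Sequence[float],
    values: Sequence[float],
    rewards: Sequence[float],
    gamma: float = 0.99,
    value_coef: float = 0.5,
) -> float:
    """Actor loss plus weighted critic loss for one episode (entropy term omitted)."""
    returns = discounted_returns(rewards, gamma)
    # Advantage: how much better the return was than the critic predicted.
    advantages = [g - v for g, v in zip(returns, values)]
    n = len(rewards)
    # Actor: raise the log-prob of actions with positive advantage.
    # With autograd, the advantage would be detached here.
    actor_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / n
    # Critic: mean squared error between predicted values and returns.
    critic_loss = sum(adv ** 2 for adv in advantages) / n
    return actor_loss + value_coef * critic_loss


loss = a2c_loss(log_probs=[-1.0], values=[0.0], rewards=[2.0])
```

In the one-step usage example, the return is 2.0 and the advantage is 2.0, so the actor term is 2.0, the critic term is 4.0, and the combined loss is 4.0.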

A2C/A3C: https://arxiv.org/abs/1602.01783
SAC: https://arxiv.org/abs/1801.01290
TRPO: https://arxiv.org/abs/1502.05477
ACER: https://arxiv.org/abs/1611.01224

github-actions[bot] commented 3 years ago

Hi! Thanks for your contribution! Great first issue!

akihironitta commented 3 years ago

@blahBlahhhJ Hi, thank you for suggesting your ideas! Feel free to submit PRs!

plutasnyy commented 2 years ago

Hello! I would like to ask @blahBlahhhJ whether you have already started working on, for example, TRPO. If not, @NaIwo and I would be happy to try working on it. Let us know!

blahBlahhhJ commented 2 years ago

Hi. I'm kind of busy right now, so I have no plans to implement TRPO. Feel free to work on it. I guess you can simply submit a draft pull request that follows the format and mentions this issue, and you should be good to go. Good luck coding!