DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Feature Request] discrete soft actor-critic #505

Closed lingweizhu closed 3 years ago

lingweizhu commented 3 years ago

🚀 Feature

A discrete version of Soft actor-critic.

Motivation

I have been using SB3 quite heavily recently and found that there is no discrete off-policy actor-critic algorithm (correct me if I am mistaken) that could serve as a prototype for implementing various algorithms of research interest.

I have been researching entropy-regularized RL algorithms, and I believe that with such a prototype many more algorithms could be developed simply by replacing the Shannon entropy with another regularizer (e.g. KL divergence or alpha divergence). For example, all of the algorithms (more than 10) in this paper could be implemented on top of such a prototype.
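To illustrate why one prototype could cover many regularizers: with discrete actions, the regularized soft value of a state is a weighted sum over actions, so swapping Shannon entropy for a KL divergence only changes the per-action penalty term. Below is a minimal pure-Python sketch for a single state. The helper names (`soft_value`, `shannon`, `kl_to`) are hypothetical and not part of SB3; the KL reference policy is assumed to have full support, and alpha divergence is not covered.

```python
import math
from typing import Callable, Sequence

# Hypothetical interface: a penalty maps (pi, action index) -> per-action penalty term.
Penalty = Callable[[Sequence[float], int], float]


def shannon(pi: Sequence[float], a: int) -> float:
    # Shannon entropy H = -sum_a pi log pi, so the per-action penalty is log pi(a|s).
    return math.log(pi[a])


def kl_to(reference: Sequence[float]) -> Penalty:
    # KL(pi || mu) = sum_a pi (log pi - log mu); assumes mu(a|s) > 0 for every action.
    def penalty(pi: Sequence[float], a: int) -> float:
        return math.log(pi[a]) - math.log(reference[a])
    return penalty


def soft_value(q: Sequence[float], pi: Sequence[float], alpha: float, penalty: Penalty) -> float:
    """V(s) = sum_a pi(a|s) * (Q(s, a) - alpha * penalty(a)).

    Actions with pi(a|s) == 0 contribute nothing (0 * log 0 is taken as 0).
    """
    return sum(
        p * (qa - alpha * penalty(pi, a))
        for a, (p, qa) in enumerate(zip(pi, q))
        if p > 0
    )
```

For example, a KL penalty toward a uniform reference differs from the Shannon case only by the constant `alpha * log(|A|)`, which is a quick sanity check that the regularizer is wired in correctly.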

I also found that this repo has a good implementation, and its author mentioned the possibility of contributing it in #157. I have also talked with the author, @ku2482, who still thinks it is important and is willing to contribute.

Pitch

A discrete version of soft actor-critic could serve as a prototype for implementing various algorithms of research interest, especially policy-iteration-style algorithms.

Alternatives

I have thought about several alternatives, but currently I have no better way to avoid the need for such a prototype: an entropy-regularized, discrete, off-policy actor-critic algorithm.


Miffyli commented 3 years ago

Hey. We would indeed be interested in having those algos! Less-known or newer algorithms like these should go to the contrib repository, to avoid bloating or overcomplicating the core code. If you or @ku2482 want to implement discrete SAC, feel free to open a PR in that repo and we can discuss further :)

Edit: If discrete-action support for SAC is trivial (i.e. easy to add alongside continuous actions) and there is a good, established paper that details an implementation, we may consider adding it here to the main repository, but I feel it would require changes all around the code.

lingweizhu commented 3 years ago

Hi, thanks for the reply Miffyli!

I think there is indeed such a paper detailing the structure. Apart from discrete action support, there seem to be no big changes from the original SAC. But after tinkering with the code a little myself, I don't have a clear idea whether it would be trivial to integrate into SB3.
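One way to see what "discrete action support" changes: with a finite action set the policy outputs a categorical distribution, so the expectations over actions in the SAC actor loss and soft critic target can be computed exactly as probability-weighted sums instead of through reparameterized action samples. Below is a minimal pure-Python sketch for a single transition, assuming twin critics that each output one Q-value per action and taking their minimum, as in common SAC implementations. The function names are hypothetical and are not SB3 or SB3-contrib API; the temperature update is omitted.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax: turns policy logits into pi(a|s).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def actor_loss(logits: Sequence[float], q1: Sequence[float], q2: Sequence[float], alpha: float) -> float:
    """J_pi(s) = sum_a pi(a|s) * (alpha * log pi(a|s) - min(Q1, Q2)(s, a)), exact over actions."""
    pi = softmax(logits)
    return sum(
        p * (alpha * math.log(p) - min(a1, a2))
        for p, a1, a2 in zip(pi, q1, q2)
        if p > 0
    )


def critic_target(
    reward: float,
    done: bool,
    gamma: float,
    next_logits: Sequence[float],
    next_q1_targ: Sequence[float],
    next_q2_targ: Sequence[float],
    alpha: float,
) -> float:
    """y = r + gamma * (1 - done) * sum_a pi(a|s') * (min Q_targ(s', a) - alpha * log pi(a|s'))."""
    pi = softmax(next_logits)
    v_next = sum(
        p * (min(a1, a2) - alpha * math.log(p))
        for p, a1, a2 in zip(pi, next_q1_targ, next_q2_targ)
        if p > 0
    )
    return reward + gamma * (1.0 - float(done)) * v_next
```

In a real implementation these sums would be batched tensor operations over the action dimension; the structure of the losses stays the same, which is why the change relative to continuous SAC is mostly confined to how the policy distribution and expectations are handled.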

But I think it would definitely make it much easier to implement recently popular algorithms, such as all of the algorithms in this paper.

araffin commented 3 years ago

Hello, as mentioned already in https://github.com/DLR-RM/stable-baselines3/issues/157, I would prefer it to be in contrib first. Please open an issue there and make sure to read the contributing guide of SB3 contrib if you want to implement the algorithm and submit a PR ;)

araffin commented 3 years ago

Closing this one as it belongs to SB3 contrib (please open one there).