Stable-Baselines-Team / stable-baselines3-contrib

Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code
https://sb3-contrib.readthedocs.io
MIT License

[Feature Request] Expand RNN Options and Algorithm Flexibility #220

Open mtnusf97 opened 9 months ago

mtnusf97 commented 9 months ago

🚀 Feature

I suggest expanding the system's recurrent components by introducing various recurrent neural networks (RNNs) such as vanilla RNN, GRU, and perhaps some lesser-known networks like LMU and ctRNN. Additionally, I propose compatibility with other RL algorithms beyond PPO, specifically A2C.

Motivation

The motivation is to enhance flexibility, allowing users to choose from a diverse set of recurrent networks and RL algorithms.

Pitch

Introduce different recurrent network options for different RL algorithms such as A2C, providing users with a more comprehensive toolkit for designing and experimenting with recurrent components in RL.

Alternatives

Focus on LstmPPO: While effective, this limits exploration and potentially misses out on the strengths of other RNNs.

Develop custom algorithms: This is resource-intensive and may not be as widely applicable as expanding existing options.

Additional context

I have already implemented most of these features in my personal repository and successfully utilized them in my research.

masterdezign commented 9 months ago

Hi @mtnusf97, I am working on #201 so I may add several types of recurrent networks to SAC.

araffin commented 8 months ago

> I propose compatibility with other RL algorithms beyond PPO, specifically A2C.

A2C is already covered by the recurrent PPO implementation: https://arxiv.org/abs/2205.09123
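The paper linked above shows that A2C is a special case of PPO: with a single epoch per rollout the probability ratio is always 1, so the clipping term never activates and the update reduces to the A2C policy gradient. A hedged sketch of that mapping as PPO hyperparameters (names follow stable-baselines3's `PPO` constructor; exact equivalence additionally requires matching A2C's RMSprop optimizer via `policy_kwargs`, which is omitted here):

```python
# Sketch: hyperparameters under which PPO's update matches A2C's
# (per arXiv:2205.09123). Values assume a single environment; for
# vectorized envs, batch_size should be n_steps * n_envs.
a2c_like_ppo_kwargs = {
    "n_steps": 5,                 # A2C's short rollout length (SB3 default)
    "batch_size": 5,              # one gradient step over the whole rollout
    "n_epochs": 1,                # single pass -> ratio is 1, clipping inactive
    "gae_lambda": 1.0,            # plain discounted returns, as in A2C
    "normalize_advantage": False, # A2C does not normalize advantages
}
```

These kwargs would be passed straight to the `PPO` (or `RecurrentPPO`) constructor, e.g. `PPO("MlpPolicy", env, **a2c_like_ppo_kwargs)`.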

> introducing various recurrent neural networks (RNNs) such as vanilla RNN, GRU, and perhaps some lesser-known networks like LMU and ctRNN.

> I have already implemented most of these features in my personal repository and successfully utilized them in my research.

Do you have a benchmark to share? And are you willing to implement and benchmark those alternatives? (I would start with GRU only at first.) Adding more options will add complexity to an already complex algorithm, so we should only do that if it is really beneficial.
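For reference, the appeal of starting with GRU is its simplicity: it has two gates and a single hidden state, versus the LSTM's three gates plus a separate cell state, so threading it through the rollout buffer is easier. A minimal NumPy sketch of one GRU step (Cho et al., 2014 convention; not SB3 code, just an illustration of the recurrence):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, params):
    """One GRU step: returns the new hidden state h'."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(x @ Wz + h @ Uz + bz)              # update gate
    r = sigmoid(x @ Wr + h @ Ur + br)              # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh + bh)  # candidate state
    return (1.0 - z) * h + z * h_tilde             # gated interpolation

def init_params(n_in, n_hidden, rng):
    """Small random weights for the three gate blocks (z, r, h)."""
    params = []
    for _ in range(3):
        params += [
            rng.standard_normal((n_in, n_hidden)) * 0.1,
            rng.standard_normal((n_hidden, n_hidden)) * 0.1,
            np.zeros(n_hidden),
        ]
    return params

# Roll the cell over a short observation sequence.
rng = np.random.default_rng(0)
params = init_params(4, 8, rng)
h = np.zeros(8)
for _ in range(10):
    h = gru_cell(rng.standard_normal(4), h, params)
```

Since the new state is a convex combination of the previous state and a `tanh` candidate, the hidden activations stay bounded in (-1, 1), which is one reason GRUs train stably without a separate cell state.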