Stable-Baselines-Team / stable-baselines3-contrib

Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code
https://sb3-contrib.readthedocs.io
MIT License

[Question] Recurrent Maskable PPO ?!? Rudder ?!? #241

Closed tty666 closed 5 months ago

tty666 commented 5 months ago

❓ Question

Hello, I am running a lot of tests with FinRL, and using an LSTM/GRU inside an ActorCritic policy was one of my first ideas. I have since seen that you support "maskable" environments, and masking makes a lot of sense for trading: if you work with opening/closing trades or multidiscrete actions (% of investment, possible leverage, % stop loss, ...), you want to mask the "open trade" action when a trade is already open, for example. So I would like to know: do you think it is feasible to mix Recurrent PPO and Maskable PPO? I am not asking for a feature, just your expert opinion on the feasibility of combining those two particular PPO implementations. I also saw an article about "RUDDER" for delayed rewards in RL (also based on an LSTM), and maybe it could be implemented in stable-baselines3 as well? https://ml-jku.github.io/rudder/ Thanks in advance for your answer!

(And yes, I know about the risks of financial algorithms and their gambling aspect, but that doesn't mean it isn't interesting!)
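For what it's worth, the masking idea described above can be sketched without SB3 at all: a mask function maps the environment state to the set of currently valid actions, and invalid-action logits are set to -inf before the softmax so they get exactly zero probability (this is the general invalid-action-masking technique; the action names and mask function here are hypothetical, not part of FinRL or sb3-contrib):

```python
import numpy as np

# Hypothetical discrete actions for the trading example above
HOLD, OPEN_TRADE, CLOSE_TRADE = 0, 1, 2

def action_mask(position_open: bool) -> np.ndarray:
    """Valid-action mask: cannot open a trade while one is open,
    cannot close a trade when none exists."""
    if position_open:
        return np.array([True, False, True])   # HOLD and CLOSE allowed
    return np.array([True, True, False])       # HOLD and OPEN allowed

def masked_softmax(logits: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Set invalid-action logits to -inf so their probability is exactly 0."""
    masked = np.where(mask, logits, -np.inf)
    exp = np.exp(masked - masked[mask].max())  # subtract max for stability
    return exp / exp.sum()

logits = np.array([0.1, 2.0, -0.5])            # raw policy outputs
probs = masked_softmax(logits, action_mask(position_open=True))
# While a trade is open, OPEN_TRADE has probability 0
```

In sb3-contrib's MaskablePPO the same idea is applied inside the policy, with the mask supplied by the environment at every step; the open question in this issue is whether that can coexist with the recurrent (LSTM) state handling of RecurrentPPO.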


araffin commented 5 months ago

duplicate of https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/101