Stable-Baselines-Team / stable-baselines3-contrib

Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code
https://sb3-contrib.readthedocs.io
MIT License

[Question] Recurrent Maskable PPO ?!? Rudder ?!? #241

Closed tty666 closed 5 months ago

tty666 commented 5 months ago

❓ Question

Hello, I am running a lot of tests with FinRL, and using an LSTM/GRU inside an ActorCritic policy was one of my first ideas. I have since seen that you support "maskable" environments, and masking makes a lot of sense for trading: if you work with opening/closing trades or multidiscrete actions (% of investment, possible leverage, % stop loss, ...), you want to mask the "open trade" action when a trade is already open, for example. So I would like to know: do you think it is feasible to mix Recurrent PPO and Maskable PPO? I am not asking for a feature, just your expert opinion on the feasibility of combining those two particular PPO implementations. I also saw an article about "RUDDER" for delayed rewards in RL (also based on an LSTM), and maybe it could be implemented in stable-baselines3 as well? https://ml-jku.github.io/rudder/ Thanks in advance for your answer!

(And yes, I know about the risks of financial algorithms and their gambling aspect, but that doesn't mean it isn't interesting!)
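For what it's worth, the masking idea described above can be sketched without SB3 at all: a mask function maps the environment state to the set of currently valid actions, and invalid-action logits are set to -inf before the softmax so they get exactly zero probability (this is the general invalid-action-masking technique; the action names and mask function here are hypothetical, not part of FinRL or sb3-contrib):

```python
import numpy as np

# Hypothetical discrete actions for the trading example above
HOLD, OPEN_TRADE, CLOSE_TRADE = 0, 1, 2

def action_mask(position_open: bool) -> np.ndarray:
    """Valid-action mask: cannot open a trade while one is open,
    cannot close a trade when none exists."""
    if position_open:
        return np.array([True, False, True])   # HOLD and CLOSE allowed
    return np.array([True, True, False])       # HOLD and OPEN allowed

def masked_softmax(logits: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Set invalid-action logits to -inf so their probability is exactly 0."""
    masked = np.where(mask, logits, -np.inf)
    exp = np.exp(masked - masked[mask].max())  # subtract max for stability
    return exp / exp.sum()

logits = np.array([0.1, 2.0, -0.5])            # raw policy outputs
probs = masked_softmax(logits, action_mask(position_open=True))
# While a trade is open, OPEN_TRADE has probability 0
```

In sb3-contrib's MaskablePPO the same idea is applied inside the policy, with the mask supplied by the environment at every step; the open question in this issue is whether that can coexist with the recurrent (LSTM) state handling of RecurrentPPO.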


araffin commented 5 months ago

duplicate of https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/101