Custom networks like these would be nice to experiment with, but they won't be considered before the first releases. This sounds like a good addition to a contrib repo we have been considering.
I totally agree, the place for such a network is the contrib repo ;)
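For anyone who wants to experiment before anything lands in contrib: below is a minimal, untested sketch of how a transformer-style network could be plugged into SB3 through the custom features extractor mechanism (`BaseFeaturesExtractor` and `policy_kwargs` are real SB3 API; the `TransformerExtractor` class, its hyperparameters, and the choice of env are my own placeholders). Note it only encodes a single observation per step as a length-1 sequence; a real implementation would attend over a history of observations (e.g. via frame stacking), which is the hard part here.

```python
import torch as th
import torch.nn as nn

from stable_baselines3 import PPO
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor


class TransformerExtractor(BaseFeaturesExtractor):
    # Hypothetical extractor: embeds a flat observation and runs it
    # through a small transformer encoder.
    def __init__(self, observation_space, features_dim: int = 64):
        super().__init__(observation_space, features_dim)
        obs_dim = observation_space.shape[0]
        self.embed = nn.Linear(obs_dim, features_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=features_dim, nhead=4, batch_first=True  # batch_first needs torch >= 1.9
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

    def forward(self, observations: th.Tensor) -> th.Tensor:
        # (batch, obs_dim) -> (batch, 1, features_dim): a length-1 "sequence"
        x = self.embed(observations).unsqueeze(1)
        return self.encoder(x).squeeze(1)


model = PPO(
    "MlpPolicy",
    "Pendulum-v1",
    policy_kwargs=dict(
        features_extractor_class=TransformerExtractor,
        features_extractor_kwargs=dict(features_dim=64),
    ),
    verbose=1,
)
model.learn(total_timesteps=10_000)
```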
Would implementations of other not-so-mainstream algorithms (such as MPO or AWAC) also go to the contrib repo?
MPO would be a good fit (and it is quite complex). @Miffyli is currently writing a contribution guide for the contrib repo, so we can keep it clean and functional.
For AWAC, I plan to write a wrapper around the https://github.com/takuseno/d3rlpy repo (which has a nice interface and other offline RL implementations), see https://github.com/takuseno/d3rlpy/issues/5
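If it helps, here is a rough sketch of what training AWAC through d3rlpy could look like (this assumes the d3rlpy v1-era API with `MDPDataset` and `d3rlpy.algos.AWAC`; the random placeholder data is obviously not a real offline dataset, and newer d3rlpy releases have changed this interface):

```python
import numpy as np
from d3rlpy.algos import AWAC
from d3rlpy.dataset import MDPDataset

# Placeholder transitions standing in for a real logged dataset
observations = np.random.random((1000, 3)).astype(np.float32)
actions = np.random.random((1000, 1)).astype(np.float32)
rewards = np.random.random(1000).astype(np.float32)
terminals = np.zeros(1000, dtype=np.float32)
terminals[99::100] = 1.0  # episode boundary every 100 steps

dataset = MDPDataset(observations, actions, rewards, terminals)

# Offline (pre)training with AWAC
awac = AWAC()
awac.fit(dataset, n_epochs=10)

# The trained policy can then be queried like any d3rlpy algo
action = awac.predict(observations[:1])
```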
Closing this issue as the contrib repo is now live: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib (I also created an issue for MPO there)
A little explanation of what a transformer is: https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)
You can find an example in RLlib:
https://docs.ray.io/en/latest/rllib-models.html#attention-networks
https://github.com/ray-project/ray/blob/master/rllib/examples/attention_net.py
The paper about it ("Stabilizing Transformers for Reinforcement Learning", GTrXL): https://arxiv.org/pdf/1910.06764.pdf
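For reference, enabling the attention net in RLlib is mostly a model-config switch (this sketch uses the ray 1.x-era `PPOTrainer` API and the attention keys from the linked example; newer Ray versions have moved to config objects, so treat it as an illustration):

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

config = {
    "env": "CartPole-v1",
    "framework": "torch",
    "model": {
        # Wraps the default model with a GTrXL-style attention net
        "use_attention": True,
        "attention_num_transformer_units": 1,
        "attention_dim": 64,
        "attention_memory_inference": 50,
        "attention_memory_training": 50,
    },
}

trainer = PPOTrainer(config=config)
for _ in range(3):
    print(trainer.train()["episode_reward_mean"])
```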
I would like to help you implement this, but I don't have enough knowledge to help yet.
What I know about it: it is used mostly in NLP, but some researchers have started to apply it to other fields such as RL. Some articles say it performs better than LSTMs.