facebookresearch / Pearl

A Production-ready Reinforcement Learning AI Agent Library brought by the Applied Reinforcement Learning team at Meta.
MIT License

Support for imitation learning #90

Closed · e-zorzi closed this 4 months ago

e-zorzi commented 4 months ago

Hi, thank you for your work; it seems like an interesting RL library.

I might be interested in using it in my next research project, a kind of imitation learning-based RL agent. I have seen that you do not explicitly have a module that can deal with pre-collected offline data, used to augment the agent's learning and acting. Do you plan to add one in the future? For example, I'd like to take the Q-learning agent and an offline dataset collected by another expert agent (outside the loop) and improve its learning using this information. Is it easy to do this in the current framework? For instance, by sub-classing the Q-agent with a custom implementation, passing the data to it in the init function, and then changing its learn/act methods?
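The pattern described here (subclass a Q-learning agent, pass expert transitions at init, pre-train before online interaction) can be sketched generically. Note this is a minimal, hypothetical illustration using a toy tabular agent, not Pearl's actual `PearlAgent` API; the class and method names are assumptions for the sake of the sketch.

```python
class QAgent:
    """Toy tabular Q-learning agent (hypothetical stand-in, not Pearl's API)."""

    def __init__(self, n_states: int, n_actions: int, alpha: float = 0.1, gamma: float = 0.99):
        # Q-table initialized to zero.
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha = alpha
        self.gamma = gamma

    def act(self, state: int) -> int:
        # Greedy action with respect to the current Q-values.
        row = self.q[state]
        return max(range(len(row)), key=row.__getitem__)

    def learn(self, state: int, action: int, reward: float, next_state: int) -> None:
        # Standard one-step Q-learning update.
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


class OfflineAugmentedQAgent(QAgent):
    """Subclass that pre-trains on expert-collected transitions passed at init."""

    def __init__(self, n_states, n_actions, expert_transitions, pretrain_epochs=10, **kwargs):
        super().__init__(n_states, n_actions, **kwargs)
        # Replay the offline dataset before any online interaction.
        for _ in range(pretrain_epochs):
            for s, a, r, s_next in expert_transitions:
                self.learn(s, a, r, s_next)
```

Usage: construct `OfflineAugmentedQAgent` with a list of `(state, action, reward, next_state)` tuples from the expert; the agent then starts online learning from the pre-trained Q-table instead of from scratch. The same shape of subclassing should carry over to a deep Q-agent, with the pre-training loop replaced by batched updates from a replay buffer filled with the offline data.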

Thank you very much! Cheers

rodrigodesalvobraz commented 4 months ago

Hi Edoardo, thanks for getting in touch. We do have offline learning methods. You can find an example in this tutorial: https://github.com/facebookresearch/Pearl/blob/main/tutorials/sequential_decision_making/Implicit_Q_learning.ipynb Does that satisfy your needs?

e-zorzi commented 4 months ago

Hi Rodrigo, thanks for the answer. Yes, it might work. I see that it calls the safety module, which might be useful for my use case.

One question about offline RL: do you plan to support, in the interface, sequence modeling algorithms, such as the Decision/Trajectory Transformer, and (semi-supervised) pre-training + fine-tuning workflows? I'm looking at the code trying to understand how easy it would be to implement (and how modular it could be), but I guess it's quite a lot of work on the offline side. Cheers

rodrigodesalvobraz commented 4 months ago

Hi Edoardo. We are not explicitly planning to support the Decision/Trajectory Transformer, although we may add more methods as the need arises. Our current priority is polishing Pearl's basic features, robustness, and ease of use. Thank you! Rodrigo