Open Div99 opened 1 year ago
Agreed! This is on our to-do list too. If you are interested and have time, you are welcome to contribute.
Hi, probably somewhat related to this: having a `forced_decoder_ids` argument in the `policy.generate()` function might help with the offline RL setting. Is there a specific reason it is not in the current generate function under `hf_generation_utils.py`? Also, is there a plan to add it in the near future?
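For context, here is a minimal, self-contained sketch of what forcing would do at decode time. It mirrors the Hugging Face `generate(forced_decoder_ids=...)` convention of `[generation_step, token_id]` pairs, but the functions, toy vocabulary, and logits below are purely illustrative, not RL4LMs or transformers code:

```python
import math

def force_tokens(step, logits, forced_decoder_ids):
    """At a forced step, mask the logits so only the forced token survives."""
    forced = dict(forced_decoder_ids)  # generation_step -> token_id
    if step in forced:
        tok = forced[step]
        return [0.0 if i == tok else -math.inf for i in range(len(logits))]
    return logits

def greedy_decode(step_logits, forced_decoder_ids):
    """Greedy decoding over precomputed per-step logits, applying forcing."""
    out = []
    for step, logits in enumerate(step_logits, start=1):  # HF steps are 1-indexed
        logits = force_tokens(step, logits, forced_decoder_ids)
        out.append(max(range(len(logits)), key=lambda i: logits[i]))
    return out

# Without forcing, greedy decoding picks token 2 at every step;
# forcing step 1 to token 0 overrides that choice.
step_logits = [[0.1, 0.2, 0.9], [0.1, 0.2, 0.9]]
print(greedy_decode(step_logits, forced_decoder_ids=[[1, 0]]))  # [0, 2]
```

In the offline RL setting, this kind of forcing lets you replay logged action sequences through the model while still collecting per-step logits.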
@ghadiaravi13 I think this is because of the version of the transformers library that `hf_generation_utils.py` was adapted from. Once we upgrade to a recent version, we can support this.
Got it, thanks!
Hi @rajcscw,
Any update on this issue?
I'm wondering whether Q-learning methods can work for LLM training 🤔 I would be extremely grateful if you could share your experience on this.
Hi, first of all, great work. This is a very useful library for research on RL and NLP. It would be very helpful to add off-policy RL methods like Q-learning, SAC, etc., along with benchmarks.
Also, new offline RL methods applied to NLP, like ILQL, could be very interesting for human alignment, and support for such methods would further enhance the value of this codebase.
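To make the request concrete, here is a toy sketch of the kind of off-policy update being asked about: tabular Q-learning where states are partial token sequences, actions are next tokens, and reward arrives only at the end of the sequence. Everything here (vocabulary, target sequence, hyperparameters) is a made-up illustration, not RL4LMs or ILQL code:

```python
import random

VOCAB = [0, 1]    # two-token toy vocabulary
MAX_LEN = 2       # episodes end after 2 tokens
TARGET = (1, 1)   # reward 1.0 only for this exact sequence

def terminal_reward(seq):
    return 1.0 if tuple(seq) == TARGET else 0.0

def q_learning(episodes=2000, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    rng = random.Random(seed)
    Q = {}  # (state, action) -> value; missing entries default to 0.0
    for _ in range(episodes):
        seq = []
        while len(seq) < MAX_LEN:
            s = tuple(seq)
            # epsilon-greedy behaviour policy; the update below is
            # off-policy because it bootstraps from the greedy action
            if rng.random() < eps:
                a = rng.choice(VOCAB)
            else:
                a = max(VOCAB, key=lambda t: Q.get((s, t), 0.0))
            seq.append(a)
            done = len(seq) == MAX_LEN
            if done:
                target = terminal_reward(seq)
            else:
                s2 = tuple(seq)
                target = gamma * max(Q.get((s2, t), 0.0) for t in VOCAB)
            old = Q.get((s, a), 0.0)
            Q[(s, a)] = old + alpha * (target - old)
    return Q

def greedy_rollout(Q):
    """Decode greedily with respect to the learned Q-values."""
    seq = []
    while len(seq) < MAX_LEN:
        s = tuple(seq)
        seq.append(max(VOCAB, key=lambda t: Q.get((s, t), 0.0)))
    return seq

print(greedy_rollout(q_learning()))  # the learned greedy policy recovers [1, 1]
```

Scaling this from a lookup table to a language model is exactly where methods like ILQL come in: the Q-function becomes a head on the LM, and the updates are made conservative so they work from a fixed offline dataset.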