Open alexandremuzio opened 1 year ago
Hey sure this sounds interesting. If you get this working well, I think I would be open to collaborating on a joint blog post. Including a new training paradigm in trlX is substantial effort on our part. If there is enough demand for it (which I think could potentially be rallied by a blog post) we would certainly be open to considering it :)
Can you send me an email? louis@stability.ai
That's great! I've sent you an email and will update this thread with next steps etc.
🚀 The feature, motivation, and pitch
I've been working on RLHF for a while and have been exploring the use of Minimum Risk Training (paper: here, with further investigations here) for improving encoder-decoder translation models via RL finetuning. It's an interesting procedure that is a lot simpler than PPO yet seems to be much more stable for the translation setup I've been working with.
I'm wondering if integrating this new training procedure would be of interest to anyone and if so I could work on adding this.
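To make the proposal concrete, here is a minimal sketch of the MRT objective from the Shen et al. paper: sample a handful of candidate translations per source, score each with a risk (e.g. 1 − sentence BLEU against the reference), renormalize the model's probabilities over the sampled subspace, and minimize the expected risk. Function and argument names below are illustrative, not from any existing trlX API:

```python
import math

def mrt_loss(seq_logprobs, risks, alpha=1.0):
    """Sketch of the Minimum Risk Training objective.

    seq_logprobs: total log-probability of each sampled candidate under the model
    risks: risk of each candidate, e.g. 1 - sentence BLEU vs. the reference
    alpha: sharpness hyperparameter for the renormalized distribution
    Returns the expected risk over the sampled candidates (to be minimized).
    """
    # q(y) proportional to p(y)^alpha, renormalized over the sampled subspace only
    scaled = [alpha * lp for lp in seq_logprobs]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    z = sum(weights)
    q = [w / z for w in weights]
    # Expected risk: sum over candidates of q(y) * risk(y)
    return sum(qi * ri for qi, ri in zip(q, risks))
```

With equal log-probs the loss is just the mean risk; putting more probability mass on low-risk candidates lowers it, which is the gradient signal the model trains on. In practice `seq_logprobs` would be differentiable tensors from the model's forward pass.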
For my experiments, I've been using MarianMT huggingface enc/dec models (which could also be integrated), but the procedure should also work with the currently supported T5 models and possibly with LMs as well.

Alternatives
No response
Additional context
No response