OpenLLMAI / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
https://openrlhf.readthedocs.io/
Apache License 2.0
1.72k stars, 161 forks

Support ORPO #253

Open paulcx opened 3 months ago

paulcx commented 3 months ago

Could ORPO support be considered as a new feature?

ORPO, a technique that replaces SFT+DPO/PPO, was released recently. I saw @_philschmid's post regarding it yesterday. Gave ORPO a shot with phi-2 and @argilla_io dpo-mix-7k. Model: https://huggingface.co/abideen/phi2-pro. LazyORPO (Automated): https://colab.research.google.com/drive/19ci5XIcJDxDVPY2xC1ftZ5z1kc2ah_rx?usp=sharing

Official repository: https://github.com/xfactlab/orpo
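
For reference, here is a minimal sketch of the ORPO objective as described in the ORPO paper: the usual SFT loss on the chosen response plus a weighted odds-ratio penalty against the rejected response. It is not taken from the official repository or from OpenRLHF. The function names `log_odds` and `orpo_loss` are hypothetical, the inputs are assumed to be length-normalized (average per-token) log-probabilities that are strictly negative, and the default weight `lam=0.1` is only an illustrative value.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp).

    Assumes avg_logp < 0 so that 0 < p < 1.
    """
    # log1p(-exp(x)) evaluates log(1 - p)
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """Sketch of the ORPO loss for one (chosen, rejected) pair.

    lam is the weight of the odds-ratio term (illustrative default).
    """
    # SFT term: negative log-likelihood of the chosen response
    sft_term = -chosen_avg_logp

    # Log odds ratio between the chosen and rejected responses
    z = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)

    # -log(sigmoid(z)) written as a numerically stable softplus(-z)
    or_term = max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

    return sft_term + lam * or_term


if __name__ == "__main__":
    # The loss shrinks as the rejected response becomes less likely
    print(orpo_loss(-0.5, -0.5))
    print(orpo_loss(-0.5, -2.0))
```

In an actual trainer the two inputs would come from a single policy model's token log-probabilities, averaged over response tokens. No reference model is needed, which is the main difference from DPO.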

hijkzzz commented 3 months ago

Everyone is welcome to contribute ORPO support~ Our engineers have limited bandwidth at the moment~