Open JhonDan1999 opened 4 months ago
Great repo! Could you add example notebooks on using PPO and DPO for RL fine-tuning of LLMs on top of SFT models?
Thanks
Thanks! Yes, I will work on it soon. I'll ping you when it's done.