Open JhonDan1999 opened 4 months ago
Great repo! Could you add example notebooks on using PPO and DPO for RL fine-tuning of LLMs on top of SFT models?
Thanks
Thanks! Yes, I will work on it soon. I'll ping you when it's done.