InternLM / xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
https://xtuner.readthedocs.io/zh-cn/latest/
Apache License 2.0

DPO dataset format and loss #344

Open samedii opened 7 months ago

samedii commented 7 months ago

Should be quite easy to add for someone who knows the codebase. The biggest problem might be a new dataset format.
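For preference-based methods like DPO, a common dataset layout (this is a hypothetical sketch, not xtuner's actual format; the field names `prompt`/`chosen`/`rejected` follow the convention used by TRL) stores one preference pair per record:

```json
{"prompt": "Explain gradient descent in one sentence.",
 "chosen": "Gradient descent iteratively updates parameters in the direction that most reduces the loss.",
 "rejected": "Gradient descent is a kind of database index."}
```

Each line is one JSON object, so the file can be streamed as JSONL during training.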

I don't expect I need to link this, but TRL has a pretty nice implementation of the loss: https://github.com/huggingface/trl/blob/main/trl/trainer/dpo_trainer.py#L817
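For reference, the sigmoid-variant DPO loss from the paper can be sketched in a few lines of plain Python (this is an illustrative re-derivation under the standard DPO formulation, not the TRL code linked above; the function name and signature are made up for this sketch):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Sigmoid-variant DPO loss for a single preference pair.

    Inputs are summed token log-probabilities of the chosen/rejected
    responses under the policy and the frozen reference model.
    """
    # Log-ratio of chosen vs. rejected under each model.
    pi_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    logits = pi_logratio - ref_logratio
    # -log(sigmoid(beta * logits)), written as log1p(exp(-x)) for stability.
    return math.log1p(math.exp(-beta * logits))
```

When the policy and reference agree exactly (`logits == 0`) the loss is `log(2)`, and it decreases as the policy widens the chosen-over-rejected margin relative to the reference.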

LZHgrla commented 7 months ago

Hi @samedii, thanks very much! Implementing DPO requires the corresponding Model, Data, and Trainer (Loss) components.

This is a complex task, and we have formed a team from our community to implement the RLHF feature for xtuner!

Please stay tuned~