lifan-yuan opened this issue 7 months ago (status: Open)
Describe the solution you'd like
KTO (Kahneman-Tversky Optimization) has been reported to outperform DPO on reasoning tasks and is a strong alternative to DPO (https://arxiv.org/abs/2404.02078). Could you please add support for KTO, given its popularity?
Additional context
Original Non-NeMo implementation: https://github.com/ContextualAI/HALOs/blob/c69d008b0b724d0ef8f46a86bc405db1a2514d8e/trainers.py#L790
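For context, here is a rough sketch of the per-example KTO loss as described in the KTO paper, to show what would need to be supported. It is plain Python and not tied to NeMo or to the HALOs trainer linked above. The names `kto_loss`, `kl_estimate`, `lambda_d`, and `lambda_u` are hypothetical, and the KL reference point is assumed to be computed elsewhere and passed in as a number.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def kto_loss(policy_logps, reference_logps, desirable, kl_estimate,
             beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """Mean KTO loss over a batch of unpaired examples (illustrative sketch).

    policy_logps / reference_logps: summed log-probabilities of each
        completion under the policy and the frozen reference model.
    desirable: one bool per example (True = desirable, False = undesirable).
    kl_estimate: batch-level estimate of KL(policy || reference), assumed
        to be computed elsewhere; it is clamped at zero here.
    """
    if not policy_logps:
        raise ValueError("empty batch")
    z0 = max(kl_estimate, 0.0)  # reference point, never negative
    losses = []
    for pol, ref, good in zip(policy_logps, reference_logps, desirable):
        log_ratio = pol - ref  # implied reward, before scaling by beta
        if good:
            # Desirable: loss shrinks as the log-ratio rises above z0.
            losses.append(lambda_d * (1.0 - sigmoid(beta * (log_ratio - z0))))
        else:
            # Undesirable: loss shrinks as the log-ratio falls below z0.
            losses.append(lambda_u * (1.0 - sigmoid(beta * (z0 - log_ratio))))
    return sum(losses) / len(losses)
```

Unlike DPO, each example carries only a desirable/undesirable label rather than a chosen/rejected pair, so the data loading would likely need to differ from the existing DPO path.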
WIP in #78
cc @ertkonuk
When will this feature be merged into the main branch?