NVIDIA / NeMo-Aligner

Scalable toolkit for efficient model alignment
Apache License 2.0

Can you support KTO? #143

Open lifan-yuan opened 7 months ago

lifan-yuan commented 7 months ago

Describe the solution you'd like

KTO (Kahneman-Tversky Optimization) has been reported to outperform DPO (Direct Preference Optimization) on reasoning tasks (https://arxiv.org/abs/2404.02078), which makes it a strong alternative to DPO. Could you please add a KTO implementation, given its popularity?

Additional context

Original Non-NeMo implementation: https://github.com/ContextualAI/HALOs/blob/c69d008b0b724d0ef8f46a86bc405db1a2514d8e/trainers.py#L790
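
For reference, here is a minimal sketch of the per-example KTO loss as I understand it from the KTO paper. It is not the HALOs code and not a NeMo-Aligner API. The function and parameter names (`kto_loss`, `kl_estimate`, `lambda_d`, `lambda_u`) are made up for illustration, and the sketch assumes sequence-level log-probabilities and a batch-level KL estimate are already available.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def kto_loss(
    policy_logp: float,   # log pi_theta(y | x), summed over tokens (assumed input)
    ref_logp: float,      # log pi_ref(y | x), summed over tokens (assumed input)
    desirable: bool,      # True if y is labeled as a desirable completion
    kl_estimate: float,   # batch-level estimate of KL(pi_theta || pi_ref) (assumed input)
    beta: float = 0.1,    # illustrative default, not a recommended setting
    lambda_d: float = 1.0,  # weight for desirable examples (illustrative)
    lambda_u: float = 1.0,  # weight for undesirable examples (illustrative)
) -> float:
    """Per-example KTO loss: lambda_y - v(x, y)."""
    # Implied reward: log-ratio of policy to reference.
    reward = policy_logp - ref_logp
    # Reference point; the KL estimate is clamped so it is never negative.
    z0 = max(kl_estimate, 0.0)
    if desirable:
        return lambda_d * (1.0 - sigmoid(beta * (reward - z0)))
    return lambda_u * (1.0 - sigmoid(beta * (z0 - reward)))
```

Unlike DPO, this only needs a binary desirable/undesirable label per completion rather than paired preferences. In an actual trainer the KL term would be computed from mismatched prompt/completion pairs in the batch and excluded from backpropagation; that part is omitted here.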

odelalleau commented 7 months ago

WIP in #78

cc @ertkonuk

Cppowboy commented 3 months ago

When will this feature be merged into the main branch?