KTO (unpaired) Support

⚠️ Please check that this feature request hasn't been suggested before.

[X] I searched previous Ideas in Discussions and didn't find any similar feature requests.
[X] I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

The original HALOs repo supports the full version of KTO that allows for imbalanced data and more stable training.

See trainer here: https://github.com/ContextualAI/HALOs/blob/6333a8f03c5c12c0a0b791e083904eda47a5b96c/trainers.py#L758

This library currently only supports KTO training on pairs using the less stable version of the loss, called SimpleKTO in the original repo.

✔️ Solution

Port the KTOTrainer from the original repo with the full loss, as described in the attached image. Note that there is no backpropagation through the KL term; it is used only for saturation.
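For reference, here is a rough sketch of how I understand the unpaired loss, based on the formulation in the KTO paper rather than copied from the HALOs trainer. The names `Example` and `kto_loss`, the default `beta`, and the weights `lambda_d` / `lambda_u` are my own placeholders, not identifiers from either library. It uses plain Python floats, so the "no backprop through KL" part only shows up as treating the KL estimate as a constant; in an autograd framework that value would be detached.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Example:
    """One unpaired sample: a single completion plus a binary label."""
    policy_logp: float  # log pi_theta(y|x), summed over completion tokens
    ref_logp: float     # log pi_ref(y|x), summed over completion tokens
    desirable: bool     # True = desirable, False = undesirable


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def kto_loss(
    batch: Sequence[Example],
    kl_estimate: float,
    beta: float = 0.1,       # assumed default, tune as needed
    lambda_d: float = 1.0,   # weight on desirable examples
    lambda_u: float = 1.0,   # weight on undesirable examples
) -> float:
    """Mean KTO loss over a batch of unpaired examples (hypothetical helper).

    kl_estimate is the batch-level estimate of KL(pi_theta || pi_ref).
    It is used as a constant reference point: no gradient should flow
    through it (detach it when using an autograd framework).
    """
    if not batch:
        raise ValueError("batch must not be empty")

    # Clamp at zero, since a KL estimate from samples can come out negative.
    z0 = max(kl_estimate, 0.0)

    losses = []
    for ex in batch:
        # Implied reward: log-ratio of policy to reference.
        r = ex.policy_logp - ex.ref_logp
        if ex.desirable:
            # Push the log-ratio above the KL reference point.
            losses.append(lambda_d * (1.0 - sigmoid(beta * (r - z0))))
        else:
            # Push the log-ratio below the KL reference point.
            losses.append(lambda_u * (1.0 - sigmoid(beta * (z0 - r))))
    return sum(losses) / len(losses)


# Imbalanced batch: the two label weights can be set independently,
# and no example needs a matching counterpart.
batch = [
    Example(policy_logp=-12.0, ref_logp=-14.0, desirable=True),
    Example(policy_logp=-20.0, ref_logp=-19.0, desirable=True),
    Example(policy_logp=-15.0, ref_logp=-15.5, desirable=False),
]
loss = kto_loss(batch, kl_estimate=0.3, lambda_d=1.0, lambda_u=2.0)
```

Because each example carries its own label, nothing here requires a chosen/rejected pair for the same prompt, and `lambda_d` / `lambda_u` are where the imbalance between desirable and undesirable examples would be compensated.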

❓ Alternatives

No response

📝 Additional Context