Following the HALOs implementation, this branch integrates the KTO algorithm. It supports both balanced batches (the vanilla version) and unbalanced batches (the non-vanilla version) of positive and negative samples. The vanilla version requires each batch to contain the same number of positive and negative samples, while the non-vanilla version does not.
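To make the balanced/unbalanced distinction concrete, here is a minimal sketch in plain Python. It is not the code in this branch: the function name `make_batches`, its parameters, and the choice to drop leftover samples in balanced mode are all assumptions made for illustration.

```python
import random
from typing import List, Sequence


def make_batches(
    labels: Sequence[int], batch_size: int, balanced: bool, seed: int = 0
) -> List[List[int]]:
    """Return batches of sample indices (hypothetical helper, for illustration only).

    labels[i] is 1 for a positive (desirable) sample and 0 for a negative one.
    balanced=True  -> every batch holds batch_size // 2 positives and as many negatives.
    balanced=False -> batches come from a plain shuffle, so the mix may vary per batch.
    """
    rng = random.Random(seed)

    if not balanced:
        # Unbalanced ("non-vanilla") case: no constraint on the per-batch label mix.
        indices = list(range(len(labels)))
        rng.shuffle(indices)
        return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]

    # Balanced ("vanilla") case: pair up equal-sized slices of positives and negatives.
    if batch_size % 2 != 0:
        raise ValueError("balanced batching needs an even batch_size")
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    half = batch_size // 2
    # Assumption: samples that cannot fill a complete balanced batch are dropped.
    n_batches = min(len(pos), len(neg)) // half
    return [
        pos[b * half:(b + 1) * half] + neg[b * half:(b + 1) * half]
        for b in range(n_batches)
    ]
```

For example, `make_batches([1] * 6 + [0] * 10, batch_size=4, balanced=True)` yields three batches with two positives and two negatives each, whereas `balanced=False` yields four batches that cover all 16 samples with no guarantee on the mix.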
The algorithm was validated on a lightweight dataset by comparing DPO, vanilla KTO, non-vanilla KTO, and the baseline. The dataset and the results are as follows:
dataset
performance
* The baseline model is "OpenLLMAI/Llama-2-7b-sft-model-ocra-500k".