After noise matrix reduction, the model is tested against a configurable number of samples from a DPO training dataset.
The generated answers are scored against the chosen/rejected DPO pairs. High cosine similarity with the chosen answer adds a high value to the overall performance score, while higher similarity with the rejected answer subtracts a higher value from it.
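The scoring rule described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes each answer has already been turned into an embedding vector by some model not shown here, and that the contribution of each sample is simply the similarity to the chosen answer minus the similarity to the rejected answer. The names `cosine_similarity` and `performance_score` are hypothetical, and the real weighting may differ.

```python
import math
from typing import Iterable, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector is treated as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


def performance_score(samples: Iterable[Tuple[Vector, Vector, Vector]]) -> float:
    """Accumulate a score over (generated, chosen, rejected) embedding triples.

    Assumed rule: add the similarity to the chosen answer and
    subtract the similarity to the rejected answer.
    """
    score = 0.0
    for generated, chosen, rejected in samples:
        score += cosine_similarity(generated, chosen)
        score -= cosine_similarity(generated, rejected)
    return score


if __name__ == "__main__":
    # Toy 2-D embeddings: the first answer matches "chosen", the second "rejected".
    samples = [
        ([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]),
        ([0.0, 1.0], [1.0, 0.0], [0.0, 1.0]),
    ]
    print(performance_score(samples[:1]))  # 1.0
    print(performance_score(samples[1:]))  # -1.0
```

With this rule, an answer that is equally similar to both members of a pair contributes nothing to the score, so only a preference for one side moves the total.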