While this preference model is also for pairwise comparison, its training and usage are quite different from pairRM. I think we can refer to it as slicpairpm, since it is most similar to the approach in SLiC-HF: Sequence Likelihood Calibration with Human Feedback.
@WeiXiongUST just need to run the following (I think)
make style
make quality
Have tested these two commands locally!
Could you help to add the new pairwise preference model RLHFlow/pair-preference-model-LLaMA3-8B?
The usage of the model is similar to pairRM: we input a prompt and two responses, and the model returns the probability that the first response is preferred. I have implemented a pipeline in rewardbench/models/pairpm.py and attached an example of using the model for your reference. I am wondering how we should merge such a customized model into RewardBench. Many thanks in advance!
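For context, a minimal sketch of how such a pairwise preference model could be queried is shown below. It assumes the model is a causal LM that sees the prompt plus both responses and expresses its preference through the logits of the verdict tokens "A" and "B"; the comparison template, the helper name prob_a_preferred, and the token choices are illustrative assumptions, not the documented interface of the actual pipeline in rewardbench/models/pairpm.py.

```python
# Sketch only: assumes a causal-LM preference model that answers "A" or "B".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "RLHFlow/pair-preference-model-LLaMA3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

def prob_a_preferred(prompt: str, response_a: str, response_b: str) -> float:
    """Return the probability that response_a is preferred over response_b."""
    # Hypothetical comparison template; the real pipeline may format the input differently.
    text = (
        f"[CONTEXT] {prompt} "
        f"[RESPONSE A] {response_a} "
        f"[RESPONSE B] {response_b}\n"
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    id_a = tokenizer.encode("A", add_special_tokens=False)[0]
    id_b = tokenizer.encode("B", add_special_tokens=False)[0]
    # Softmax restricted to the two verdict tokens gives P(A preferred).
    probs = torch.softmax(logits[[id_a, id_b]], dim=-1)
    return probs[0].item()

print(prob_a_preferred("What is 2+2?", "4", "Probably 5."))
```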
The benchmark results are as follows.