Hey! @ZHZisZZ, good point, let's improve the docs. I committed an improvement to the leaderboard just now. TLDR: the leaderboard now shows the `ref_model` used for each DPO model, e.g. for Qwen. Alternatively, all the reference models are listed in the evaluation configs. If you go to a model's folder, you can see that the ref-free results are kept separate.
The reference-free results can also be loaded with this script: https://github.com/allenai/reward-bench/blob/main/analysis/get_dpo_ref_free_results.py
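For anyone landing here later, this is roughly what the two scoring modes mean. Below is a minimal sketch, assuming the Hugging Face `transformers` API; the model names and the `sum_logprob` helper are placeholders for illustration, not the reward-bench implementation:

```python
# Sketch of a DPO implicit reward, scored with and without a reference model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sum_logprob(model, tokenizer, prompt, response):
    """Sum of token log-probs of `response` conditioned on `prompt`.
    Assumes tokenizing prompt + response splits cleanly at the boundary."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Shift: the logits at position t predict token t+1.
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_lp = logprobs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the response tokens.
    resp_start = prompt_ids.shape[1] - 1
    return token_lp[:, resp_start:].sum().item()

policy = AutoModelForCausalLM.from_pretrained("my-dpo-model")  # placeholder
ref = AutoModelForCausalLM.from_pretrained("my-ref-model")     # placeholder
tok = AutoTokenizer.from_pretrained("my-dpo-model")

beta = 0.1  # scaling constant; does not affect pairwise comparisons
prompt, chosen = "Q: What is 2+2?\nA:", " 4"

# With a reference model: r(x, y) = beta * (log pi(y|x) - log pi_ref(y|x))
r_with_ref = beta * (sum_logprob(policy, tok, prompt, chosen)
                     - sum_logprob(ref, tok, prompt, chosen))

# Reference-free: drop the reference term, r(x, y) = beta * log pi(y|x)
r_ref_free = beta * sum_logprob(policy, tok, prompt, chosen)
```

In both modes, a chosen/rejected pair is counted as correct when the chosen response gets the higher reward, so the positive constant `beta` cancels out of the comparison.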
Thank you for the clarifications. I noticed that some of the reference models are pre-trained (base) models, which may not be the actual reference models used during training. This can cause issues because these pre-trained models do not always recognize the prompt template used by the fine-tuned models.
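For concreteness, this is the kind of mismatch I mean (a sketch; `my-dpo-model` is a placeholder):

```python
# The fine-tuned policy expects a chat template that a base (pre-trained)
# reference model was never trained on.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("my-dpo-model")  # placeholder
messages = [{"role": "user", "content": "What is 2+2?"}]
prompt = tok.apply_chat_template(messages, tokenize=False,
                                 add_generation_prompt=True)
# A base reference model will still assign log-probs to this string, but
# special tokens like "<|im_start|>" are out-of-distribution for it, which
# can skew log pi_ref(y|x) and hence the DPO reward.
print(prompt)
```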
And, out of curiosity, I have some extra questions:
1) Based on your experience, do these pre-trained reference models offer advantages over the reference-free version?
2) If so, could we then use an arbitrary pre-trained model as the reference, as long as it shares the same vocabulary as the fine-tuned model (although page 11 of the paper notes that using the wrong reference can lead to poor performance)?
@ZHZisZZ I think it'll be a mixed bag. You'll probably gain 1-5% performance by using the SFT checkpoint vs the base model. Some models aren't documented well and may just not have an SFT checkpoint.
We also (accidentally) tried swapping in the wrong reference model, and performance regressed entirely to random on a few models.
Thank you for providing such a valuable benchmark.
I am seeking clarification on the model/reference specifications for the DPO rewards, which are not readily apparent in either the paper or the leaderboard. For example, it is unclear whether models like Llama-3-8B-Instruct, Qwen, and Zephyr were evaluated with or without references. If references were used, could you please provide guidance on how to access the reference models?
Thank you for your assistance.