Will the MMBench test set score drop after DPO? Does this repo support DPO without loading a separate reward model?

Open luohao123 opened 1 week ago

Thanks for your attention! We have never tested on MMBench. I believe performance there may be related to the vision-LLM and to the preference data used for DPO training. Also, this repo supports DPO training without needing to load a reward model (please take a look at this script).
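For context on why no reward model is needed: in DPO the reward is implicit in the log-probability ratio between the policy being trained and a frozen reference copy. Below is a minimal sketch of the standard DPO objective (Rafailov et al., 2023); the function and argument names are illustrative, not this repo's actual script or API:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # The implicit "reward" is the log-prob ratio between the trained
    # policy and a frozen reference model, so no separately trained
    # reward model ever has to be loaded.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # Bradley-Terry style preference loss on the reward margin.
    return -F.logsigmoid(beta * (chosen_rewards - rejected_rewards)).mean()
```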
I think the score might drop on MMBench, which is a critical leaderboard for real-world applications.
Thanks for your suggestion! We will also test the performance of DPO on this benchmark.