Will the MMBench test set score drop after DPO? Does this repo support DPO without loading a separate reward model?

Open luohao123 opened 1 week ago

Thanks for your attention! We have never tested on MMBench. I believe performance there may be related to the vision-LLM and to the preference data used for DPO training. Also, this repo supports DPO training without needing to load a reward model (please take a look at this script).
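For context on why no reward model is needed: in DPO the reward is implicit in the log-probability ratio between the policy being trained and a frozen reference copy. Below is a minimal sketch of the standard DPO objective (Rafailov et al., 2023); the function and argument names are illustrative, not this repo's actual script or API:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # The implicit "reward" is the log-prob ratio between the trained
    # policy and a frozen reference model, so no separately trained
    # reward model ever has to be loaded.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # Bradley-Terry style preference loss on the reward margin.
    return -F.logsigmoid(beta * (chosen_rewards - rejected_rewards)).mean()
```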
I think the score might drop on MMBench, which is a critical leaderboard for real-world applications.
Thanks for your suggestion! We will also test the performance of DPO on this benchmark.