TideDra / VL-RLHF

An RLHF Infrastructure for Vision-Language Models
Apache License 2.0

Reproduction of InternLM-XComposer2 #9

Open ikodoh opened 2 months ago

ikodoh commented 2 months ago

Hi,

Thank you for sharing this great work. I'm trying to reproduce the performance of InternLM-XComposer2 + DPO + VLFeedback, but I found that the baseline performance you reported for InternLM-Xcomposer2-VL-7b differs slightly from the numbers in the original paper. Could you explain why?

Also, is the dpo_internlmxc2vl7b.sh file in the ./scripts folder the command used to produce your InternLM-Xcomposer2-VL-7b-DPO model? If not, could you share the script or config file needed to reproduce the InternLM-Xcomposer2-VL-7b-DPO model? Thank you again for the nice work.

TobiasLee commented 2 months ago

The evaluation scores were obtained with VLMEval. You can download the checkpoint and evaluate it yourself to see whether the scores match. I am not sure the reported numbers can be reproduced exactly, given numerical inconsistencies across platforms.

Did you run the code for DPO training?