YiyangZhou / POVID

[Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
Apache License 2.0

the first loss is not exactly DPO loss #3

Closed — elmiraloo closed this 7 months ago

elmiraloo commented 7 months ago

Hi, the loss described in the paper is slightly different from the one implemented in the code:

https://github.com/YiyangZhou/POVID/blob/5d55ce605230f5ad3889701a894a98ddca6e1534/tool/dpo_trainer.py#L616

I understand why you do this, but I'm wondering which loss you actually used to train the model, since many of the arguments in run_dpo.sh and run_povid.sh do not match those used to train the published checkpoints. Does the published config/code differ in any major way from the config/code you used to train those checkpoints? We are preparing a survey paper on alignment methods for VLMs and want to make sure our comparison is fair.
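For reference, the baseline I'm comparing the linked code against is the standard DPO loss (Rafailov et al., 2023). A minimal pure-Python sketch, assuming the per-response log-probabilities (summed over tokens) have already been computed; the function name and arguments here are my own, not from this repo:

```python
import math

def standard_dpo_loss(policy_chosen_logp, policy_rejected_logp,
                      ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example standard DPO loss: -log sigmoid(beta * (pi_logratio - ref_logratio))."""
    # Log-ratio of chosen vs. rejected under the policy and the frozen reference
    pi_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    logits = pi_logratio - ref_logratio
    # -log sigmoid(beta * logits), written out explicitly
    return -math.log(1.0 / (1.0 + math.exp(-beta * logits)))
```

When policy and reference agree exactly, the logits are zero and the loss is log 2; the loss shrinks as the policy prefers the chosen response more strongly than the reference does. Any extra terms or re-weighting in the linked trainer on top of this are what I'm asking about.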

elmiraloo commented 7 months ago

This was resolved after discussion with the authors, so I'm closing the issue.