RifleZhang / LLaVA-Hound-DPO


Which SFT setup is recommended now? #14

Open tyleryzhu opened 1 month ago

tyleryzhu commented 1 month ago

It seems like there are three different SFT setups recommended between the code and the paper.

Paper:

Code (your ckpt):

Code (new recipe I assume?):

I assume the new recipe is one you tested and that it gets the same or better numbers than those in the paper? If you could clarify the different settings, that would be much appreciated. Thank you!

RifleZhang commented 6 days ago

Hello, according to the code at https://github.com/RifleZhang/LLaVA-Hound-DPO/blob/main/llava_hound_dpo/sft_scripts/video_sft_qa_240k.sh#L19, the SFT stage uses 100k image + 240k video QA samples. A small set of 15k captions is also mixed in, which was inspired by ShareGPT4V training, but we didn't test what happens if that data is removed.
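
For a rough sense of the proportions in that mix, here is a minimal sketch. It uses only the rounded counts quoted above; the dictionary keys are illustrative labels and are not identifiers taken from the repository's scripts.

```python
# Approximate SFT data mix, using the rounded counts quoted in the reply.
# The keys are illustrative labels (assumptions), not names from the repo.
sft_mix = {
    "image": 100_000,
    "video_qa": 240_000,
    "caption": 15_000,
}

total = sum(sft_mix.values())

# Fraction of the full SFT set contributed by each source.
shares = {name: count / total for name, count in sft_mix.items()}

for name, count in sft_mix.items():
    print(f"{name:>8}: {count:>7,} samples ({shares[name]:.1%})")
print(f"{'total':>8}: {total:>7,} samples")
```

By these figures the caption data is only a small fraction of the total, which is the part that was not ablated.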