RifleZhang / LLaVA-Hound-DPO


Which 240K subset for SFT ? #4

Closed yuehaoa closed 7 months ago

yuehaoa commented 7 months ago

On Hugging Face, you mention: "[900k Video QA]: For the 300k video frames above, we generate 3 qa pairs for each, in total 900k. We only used 240k subset for SFT." Which 240k subset do you use for SFT? Do you randomly sample 240k, or can you also provide the 240k subset?

RifleZhang commented 7 months ago

The 240k subset is released at https://huggingface.co/datasets/ShareGPTVideo/train_video_and_instruction/tree/main/video_instruction/train/qa
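If you only need the QA subset, you can pull just that folder without cloning the whole dataset repo. A minimal sketch using `huggingface_hub` (the `fetch_qa_subset` helper name and the `local_dir` default are illustrative, not part of the repo):

```python
# Sketch: download only the 240k QA instruction files, not the full dataset repo.
# QA_PATTERN matches the folder path visible in the dataset tree on Hugging Face.
QA_PATTERN = "video_instruction/train/qa/*"

def fetch_qa_subset(local_dir: str = "data/sharegpt_video") -> str:
    """Fetch just the QA subset of the ShareGPTVideo dataset; returns the local path."""
    # Imported lazily so the module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    return snapshot_download(
        repo_id="ShareGPTVideo/train_video_and_instruction",
        repo_type="dataset",
        allow_patterns=[QA_PATTERN],  # restrict the download to the qa folder
        local_dir=local_dir,
    )
```

`allow_patterns` keeps the download limited to the `qa` folder, which saves a large amount of bandwidth compared to fetching the full video data.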

We also updated the SFT training scripts; please follow the steps in the training page: https://github.com/RifleZhang/LLaVA-Hound-DPO/blob/main/llava_hound_dpo/sft_scripts/README.md