Closed yuehaoa closed 7 months ago
240k is released at https://huggingface.co/datasets/ShareGPTVideo/train_video_and_instruction/tree/main/video_instruction/train/qa
We also updated the SFT training scripts; please follow the steps on the training page: https://github.com/RifleZhang/LLaVA-Hound-DPO/blob/main/llava_hound_dpo/sft_scripts/README.md
On Hugging Face, you mention: "[900k Video QA]: For the 300k video frames above, we generate 3 qa pairs for each, in total 900k. We only used 240k subset for SFT." Which 240k subset do you use for SFT? Do you randomly sample 240k, or can you also provide the 240k subset?