hubenjm closed this issue 15 hours ago
datasets_mixture.py references a .json file whose origin is not clear from its name: https://github.com/Efficient-Large-Model/VILA/blob/main/llava/data/datasets_mixture.py#L62
Is this file the same as https://huggingface.co/datasets/mit-han-lab/ShareGPT4V/blob/main/filter-share-captioner_coco_lcs_sam_1246k_1107.json?
If not, can you provide this file or describe how it was generated?
That's correct. You can use https://huggingface.co/datasets/mit-han-lab/ShareGPT4V/blob/main/filter.py to process the ShareGPT4V data.
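If you produce the file yourself and want to confirm it matches the hosted copy, one option is to compare the two files by content rather than by name. This is a minimal sketch, assuming both files are JSON lists of records. The function names and the paths in the usage note are hypothetical and not part of the VILA or ShareGPT4V code.

```python
import hashlib
import json
from pathlib import Path


def canonical_digest(path):
    """Return (entry_count, sha256_hex) for a JSON file holding a list of records.

    Assumption: the file is a single JSON array. Records are re-serialized
    with sorted keys so key order and whitespace do not affect the hash.
    """
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    payload = json.dumps(records, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return len(records), hashlib.sha256(payload).hexdigest()


def same_annotations(path_a, path_b):
    """True when both files hold the same records in the same order."""
    return canonical_digest(path_a) == canonical_digest(path_b)
```

For example, `same_annotations("my_filtered.json", "filter-share-captioner_coco_lcs_sam_1246k_1107.json")` would compare a locally generated file (placeholder name) against the downloaded one. Record order is significant here, so two files with the same entries in a different order compare as different.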