OpenGVLab / Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
https://vchat.opengvlab.com/
MIT License
2.85k stars 230 forks

Discrepancy in Image ID Alignment Between M3IT and VideoChat2IT #199

Closed patrick-tssn closed 2 days ago

patrick-tssn commented 3 days ago

Could you please provide a script or JSON file that maps IDs from M3IT to VideoChat2IT? Matching files across the two is quite challenging. For example, VideoChat2IT/caption contains coco, llava, minigpt4, paragraph_captioning, and textcaps, whereas M3IT/captioning contains coco, coco-cn, flickr8k-cn, image_paragraph_captioning, msrvtt, and textcap. In addition, the image IDs do not completely match; for instance, COCO image paths in VideoChat2IT have an additional directory compared to those in M3IT. I believe it would be beneficial to fully open-source this.

Andy1621 commented 2 days ago

Hi! You can convert these datasets from M3IT yourself, since we use the original annotations and only change the file_name for our data.
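
For readers attempting this conversion, here is a minimal sketch of one way to recover the mapping, relying only on the statement above that the annotations are unchanged. It pairs records by their annotation text. The field names `id`, `text`, and `file_name`, and the helper names `normalize` and `build_id_map`, are hypothetical and are not taken from either dataset's actual schema; adapt them to whatever the real files contain.

```python
from collections import defaultdict


def normalize(text):
    """Collapse whitespace and case so trivially different copies still match."""
    return " ".join(text.split()).lower()


def build_id_map(m3it_records, vc2_records):
    """Map each M3IT id to the VideoChat2IT file_name(s) with the same annotation text.

    Both inputs are lists of dicts. The keys "id", "text", and "file_name"
    are assumed names for illustration, not the datasets' real fields.
    """
    # Index the VideoChat2IT side by normalized annotation text.
    by_text = defaultdict(list)
    for rec in vc2_records:
        by_text[normalize(rec["text"])].append(rec["file_name"])

    id_map, unmatched = {}, []
    for rec in m3it_records:
        hits = by_text.get(normalize(rec["text"]))
        if hits:
            # Several images can share one caption, so keep every candidate.
            id_map[rec["id"]] = sorted(set(hits))
        else:
            unmatched.append(rec["id"])
    return id_map, unmatched
```

Entries that map to more than one file name, and anything left in `unmatched`, would still need a manual check, since identical captions can belong to different images.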

patrick-tssn commented 2 days ago

You mean manually checking the files for each split? That's fine, but changing only the file names is confusing and adds unnecessary workload without any benefit.