New datasets used for videochat2_mistral

OpenGVLab / Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

https://vchat.opengvlab.com/

MIT License


New datasets used for videochat2_mistral #181

Closed by LiJiaqi96 1 month ago

LiJiaqi96 commented 1 month ago

Very happy to hear that you have updated the model with the Mistral LLM. Is there anywhere to find the newly added datasets? Thanks!

Andy1621 commented 1 month ago

Good question! In the current version, only smit has been added, and we have updated the instruction data here.

LiJiaqi96 commented 1 month ago

Thanks for your reply! I noticed that some datasets, such as the COCO image-caption data, have been reduced to a smaller size (100k). I'm curious about the reason: is it because the captions are relatively short? Thanks!

Andy1621 commented 1 month ago

Yes. The captions are relatively short and similar to the Stage2 data. Besides, in our experiments, removing most of this data did not affect the results.
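
For readers who want to try a similar reduction on their own annotation files, here is a minimal sketch of subsampling a caption list to a fixed size. The thread does not say how the 100k subset was chosen, so the uniform random sampling, the helper name `subsample_annotations`, and the record fields below are all assumptions for illustration, not the repository's method.

```python
import random


def subsample_annotations(records, target_size, seed=0):
    """Return a reproducible random subset of at most target_size records.

    Hypothetical helper: uniform sampling is an assumption, not the
    selection rule used for the released instruction data.
    """
    if target_size < 0:
        raise ValueError("target_size must be non-negative")
    # Nothing to drop if the dataset is already small enough.
    if len(records) <= target_size:
        return list(records)
    # A dedicated RNG keeps the subset stable across runs for a given seed.
    rng = random.Random(seed)
    return rng.sample(records, target_size)


# Toy stand-in for an image-caption annotation list (fields are made up).
captions = [
    {"image": f"img_{i}.jpg", "caption": f"caption {i}"} for i in range(1000)
]
subset = subsample_annotations(captions, 100)
print(len(subset))
```

Fixing the seed makes the reduced set reproducible, which helps when comparing training runs with and without the removed data.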

LiJiaqi96 commented 1 month ago

OK, it's great to train faster with the same performance :)