X-PLUG / mPLUG-2

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Apache License 2.0

Could you provide the JSON file in the video captioning task? #3

Closed by myccver 1 year ago

myccver commented 1 year ago

Thank you for your excellent work! However, I noticed that the JSON files your code requires for the MSRVTT and MSVD datasets are missing. Could you provide them?

ChdDongyang commented 11 months ago

May I ask whether you have obtained these files and, if so, where you got them?

myccver commented 11 months ago

Please refer to the following format:

'''
Video Captioning Dataset

jsonl format
[
    {"video_id": str, "caption": str},
    or
    {"video_id": str, "golden_caption": List[str]}
    xxxx
]
'''
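For anyone building these files by hand, here is a minimal sketch of writing records in the two shapes quoted above. It is not the authors' preprocessing script. It assumes "jsonl" means one JSON object per line, that the `caption` shape is used for training and the `golden_caption` shape for evaluation, and that you already have `(video_id, caption)` pairs extracted from the dataset annotations. The function names `write_caption_jsonl` and `read_jsonl` and the output paths are hypothetical.

```python
import json
from collections import defaultdict


def write_caption_jsonl(pairs, train_path, eval_path):
    """Write (video_id, caption) pairs in the two record shapes from the thread.

    Assumption: one JSON object per line; which file the repo expects for
    which split is not confirmed by the source.
    """
    pairs = list(pairs)  # allow any iterable, since we pass over it twice

    # Shape 1: {"video_id": str, "caption": str}, one caption per line.
    with open(train_path, "w", encoding="utf-8") as f:
        for video_id, caption in pairs:
            record = {"video_id": video_id, "caption": caption}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

    # Shape 2: {"video_id": str, "golden_caption": List[str]},
    # all reference captions of a video grouped on one line.
    grouped = defaultdict(list)
    for video_id, caption in pairs:
        grouped[video_id].append(caption)
    with open(eval_path, "w", encoding="utf-8") as f:
        for video_id, captions in grouped.items():
            record = {"video_id": video_id, "golden_caption": captions}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON-lines file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Usage would look like `write_caption_jsonl([("video0", "a man is singing")], "train.jsonl", "test.jsonl")`, where the video id and caption are made-up sample values. If the loader instead expects a single JSON array, as the square brackets in the quoted format might suggest, the same records could be dumped with one `json.dump` call on a list.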

ChdDongyang commented 11 months ago

OK, thank you very much!

rose-jinyang commented 10 months ago

Hello @Roleone123 and @MAGAer13, could you provide the URLs for the MSVD and MSR_VTT datasets for video captioning?