X-PLUG / mPLUG-2

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Apache License 2.0

Could you provide the JSON file in the video captioning task? #3

Closed: myccver closed this issue 1 year ago

myccver commented 1 year ago

Thank you for your excellent work! However, the JSON files that your code requires for the MSRVTT and MSVD datasets are missing. Could you provide them?

ChdDongyang commented 1 year ago

May I ask if you have obtained these files and where did you get them?

myccver commented 1 year ago


Please refer to the following format:

```
Video Captioning Dataset

jsonl format
[
    {"video_id": str, "caption": str}, or
    {"video_id": str, "golden_caption": List[str]}
    xxxx
]
```
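Until the official files are published, the format above can be checked mechanically. The following is a minimal Python sketch that relies on several assumptions. First, it assumes one JSON object per line; the reply wraps the records in `[ ... ]`, but JSONL files normally have no enclosing list. Second, it assumes that `caption` records serve as training pairs, while `golden_caption` records collect all reference captions for one video for evaluation. The helper names (`validate_record`, `write_jsonl`, `read_jsonl`, `group_captions`) and the sample video IDs are hypothetical and are not taken from the mPLUG-2 code.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def validate_record(record: Dict) -> None:
    """Check one record against the format quoted in this thread:
    {"video_id": str, "caption": str} or {"video_id": str, "golden_caption": List[str]}."""
    if not isinstance(record.get("video_id"), str):
        raise ValueError(f"video_id must be a string: {record!r}")
    if "caption" in record:
        if not isinstance(record["caption"], str):
            raise ValueError(f"caption must be a string: {record!r}")
    elif "golden_caption" in record:
        caps = record["golden_caption"]
        if not (isinstance(caps, list) and all(isinstance(c, str) for c in caps)):
            raise ValueError(f"golden_caption must be a list of strings: {record!r}")
    else:
        raise ValueError(f"record needs 'caption' or 'golden_caption': {record!r}")


def write_jsonl(path: str, records: Iterable[Dict]) -> None:
    """Write one JSON object per line (assumed JSONL layout, no enclosing list)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            validate_record(record)
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: str) -> List[Dict]:
    """Read a JSONL file back, skipping blank lines and validating each record."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                record = json.loads(line)
                validate_record(record)
                records.append(record)
    return records


def group_captions(pairs: Iterable[Dict]) -> List[Dict]:
    """Collapse per-caption records into one golden_caption record per video.
    Video order follows first appearance (dicts keep insertion order)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for p in pairs:
        grouped[p["video_id"]].append(p["caption"])
    return [{"video_id": vid, "golden_caption": caps} for vid, caps in grouped.items()]


if __name__ == "__main__":
    # Hypothetical sample data, not real MSRVTT/MSVD annotations.
    pairs = [
        {"video_id": "video0", "caption": "a man is cooking"},
        {"video_id": "video0", "caption": "someone prepares food"},
        {"video_id": "video1", "caption": "a dog runs"},
    ]
    write_jsonl("train.jsonl", pairs)
    write_jsonl("test.jsonl", group_captions(pairs))
    print(read_jsonl("test.jsonl")[0])
    # → {'video_id': 'video0', 'golden_caption': ['a man is cooking', 'someone prepares food']}
```

Grouping per-caption pairs into `golden_caption` lists reflects a common practice: captioning metrics usually score a generated caption against every reference caption for a video. Whether mPLUG-2's evaluation expects exactly this grouping should be confirmed against the repository.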

ChdDongyang commented 1 year ago


OK, thank you very much!

rose-jinyang commented 1 year ago

Hello @Roleone123 and @MAGAer13, could you provide the URLs for the MSVD and MSR_VTT datasets used for video captioning?

idj3tboy commented 1 month ago

Could the authors provide the JSONL files for the train and test splits? Thank you.