YehLi / xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

How to preprocess the annotations of given raw video #10

Closed ltp1995 closed 3 years ago

ltp1995 commented 3 years ago

Dear author, I want to train your released model on other captioning datasets, but I only have the captions and video names for the raw videos. How can I generate the processed JSON and pickle files (e.g., captions_val.json, msrvtt_caption_anno_train.pkl)? Could you provide the official preprocessing code?

YehLi commented 3 years ago

You can refer to https://github.com/YehLi/xmodaler/blob/master/tools/msrvtt_preprocess.py for generating the json and pickle files.
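
For a custom dataset where only video names and captions are available, the general shape of such a conversion might look like the sketch below. It is not the code from msrvtt_preprocess.py: the COCO-style layout of the evaluation JSON, the record fields (`video_id`, `caption`, `token_ids`), the regex tokenizer, and all function names are assumptions made for illustration, so check the linked script for the schema that xmodaler actually expects.

```python
import json
import pickle
import re
from collections import Counter


def tokenize(caption):
    # Hypothetical tokenizer: lowercase, keep runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", caption.lower())


def build_vocab(captions_by_video, min_count=1):
    # Count every token and keep those seen at least min_count times.
    counts = Counter(
        tok
        for caps in captions_by_video.values()
        for cap in caps
        for tok in tokenize(cap)
    )
    return sorted(w for w, n in counts.items() if n >= min_count)


def build_eval_json(captions_by_video):
    # Assumed COCO-caption-style layout: one "image" entry per video,
    # one "annotation" entry per caption.
    images, annotations = [], []
    for video_name, caps in captions_by_video.items():
        images.append({"id": video_name})
        for cap in caps:
            annotations.append(
                {"image_id": video_name, "id": len(annotations), "caption": cap}
            )
    return {"images": images, "annotations": annotations}


def build_train_records(captions_by_video, vocab):
    # Index 0 is left free (e.g. for padding); the last index stands for unknown words.
    word_to_id = {w: i + 1 for i, w in enumerate(vocab)}
    unk_id = len(vocab) + 1
    records = []
    for video_name, caps in captions_by_video.items():
        for cap in caps:
            ids = [word_to_id.get(tok, unk_id) for tok in tokenize(cap)]
            records.append({"video_id": video_name, "caption": cap, "token_ids": ids})
    return records


def save_outputs(eval_json, train_records, json_path, pkl_path):
    # Write the evaluation annotations as JSON and the training records as a pickle.
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(eval_json, f)
    with open(pkl_path, "wb") as f:
        pickle.dump(train_records, f)


# Toy input: video name -> list of raw captions.
sample = {
    "video0": ["A man is cooking.", "a man cooks"],
    "video1": ["A dog runs"],
}
vocab = build_vocab(sample)
eval_json = build_eval_json(sample)
train_records = build_train_records(sample, vocab)
```

Calling `save_outputs(eval_json, train_records, "captions_val.json", "my_caption_anno_train.pkl")` would then write both files; the second filename is only a placeholder.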