X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
How to preprocess the annotations of given raw videos #10
Dear author, I want to train your released model on other captioning datasets, but I only have the captions and video names of the raw videos. How can I generate the processed JSON and pickle files (e.g., captions_val.json and msrvtt_caption_anno_train.pkl)? Could you provide the official preprocessing code?
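Until official scripts are available, here is a minimal sketch of one way to turn (video_name, caption) pairs into files of this kind. It assumes a COCO-caption-style layout (`images` / `annotations`) for the evaluation JSON and a simple padded token-id layout for the pickle. All function names, dictionary keys (`caption_ids`, `vocab`), special tokens (`<pad>`, `<unk>`), and the tokenizer are hypothetical and are not X-modaler's official format; compare against the released captions_val.json and msrvtt_caption_anno_train.pkl and rename keys to whatever the dataset loader actually expects.

```python
import json
import pickle
import re
from collections import Counter

# HYPOTHETICAL preprocessing sketch: output layouts are assumptions,
# not X-modaler's official format. Check against the released files.

def build_coco_style_captions(pairs):
    """Group (video_name, caption) pairs into a COCO-caption-style dict."""
    video_ids = sorted({name for name, _ in pairs})
    return {
        "images": [{"id": vid} for vid in video_ids],
        "annotations": [
            {"image_id": name, "id": i, "caption": cap}
            for i, (name, cap) in enumerate(pairs)
        ],
    }

def tokenize(caption):
    # Lowercase, turn anything that is not a letter, digit or space into a space.
    return re.sub(r"[^a-z0-9 ]", " ", caption.lower()).split()

def build_vocab(pairs, min_count=1):
    """Index 0 = padding, 1 = unknown; other words by frequency, then alphabetically."""
    counts = Counter(tok for _, cap in pairs for tok in tokenize(cap))
    words = sorted((w for w, c in counts.items() if c >= min_count),
                   key=lambda w: (-counts[w], w))
    vocab = ["<pad>", "<unk>"] + words
    return vocab, {w: i for i, w in enumerate(vocab)}

def encode_captions(pairs, word_to_idx, max_len=20):
    """Map each caption to a fixed-length list of token ids (truncate, then zero-pad)."""
    unk = word_to_idx["<unk>"]
    encoded = []
    for name, cap in pairs:
        ids = [word_to_idx.get(tok, unk) for tok in tokenize(cap)][:max_len]
        ids += [word_to_idx["<pad>"]] * (max_len - len(ids))
        encoded.append({"video_id": name, "caption_ids": ids})
    return encoded

def save_outputs(pairs, json_path, pkl_path, max_len=20):
    """Write the COCO-style JSON and a pickle holding the vocab and encoded captions."""
    vocab, word_to_idx = build_vocab(pairs)
    with open(json_path, "w") as f:
        json.dump(build_coco_style_captions(pairs), f)
    with open(pkl_path, "wb") as f:
        pickle.dump({"vocab": vocab,
                     "annotations": encode_captions(pairs, word_to_idx, max_len)}, f)
```

For example, `save_outputs([("video0", "A man is singing."), ("video1", "A dog runs!")], "captions_val.json", "my_caption_anno_train.pkl")` writes both files. In practice the vocabulary would be built from the training split only and reused for validation and test; whether the caption evaluator accepts string video names as `image_id` is also worth checking.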