AcodeC closed this issue 5 years ago.
I pre-processed the annotations and word embeddings from the command line. Each corpus file consists of six parts: 1. the annotations of the training split, represented by word indices; 2. the annotations of the validation split, represented by word indices; 3. the annotations of the test split, represented by word indices; 4. the mapping from a word to its index; 5. the mapping from an index to its word; 6. the pretrained word embeddings from GloVe-840B-300d. I think you can recreate such a pickle with the help of this description and the provided word mapping.
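The exact container and tokenization used for the original corpus pickle are not given here, so the following is only a minimal sketch under stated assumptions. It assumes that each split is a dict from video id to a list of caption strings, that captions are lower-cased and split on whitespace, that the vocabulary is built from the training split only, and that the six parts are stored as a Python list in the order listed above. The special tokens (`<pad>`, `<bos>`, `<eos>`, `<unk>`) and all function names are hypothetical; the index-to-word mapping is kept as a plain list.

```python
import pickle

import numpy as np

# Hypothetical special tokens; the repository's real vocabulary may use others.
SPECIAL_TOKENS = ["<pad>", "<bos>", "<eos>", "<unk>"]


def tokenize(caption):
    # Assumption: lower-cased whitespace tokenization.
    return caption.lower().split()


def build_vocab(train_split):
    # Assumption: the vocabulary comes from the training split only.
    words = sorted({w for caps in train_split.values() for c in caps for w in tokenize(c)})
    idx2word = SPECIAL_TOKENS + words          # list index -> word
    word2idx = {w: i for i, w in enumerate(idx2word)}
    return word2idx, idx2word


def encode_split(split, word2idx):
    # Each caption becomes <bos> word ids <eos>; unseen words map to <unk>.
    unk, bos, eos = word2idx["<unk>"], word2idx["<bos>"], word2idx["<eos>"]
    return {
        vid: [[bos] + [word2idx.get(w, unk) for w in tokenize(c)] + [eos] for c in caps]
        for vid, caps in split.items()
    }


def load_glove(path, dim=300):
    # Each line is "<token> v1 ... v_dim". Some 840B tokens contain spaces,
    # so the last `dim` fields are the vector and the rest is the token.
    glove = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            glove[" ".join(parts[:-dim])] = np.asarray(parts[-dim:], dtype=np.float32)
    return glove


def build_embedding(idx2word, glove, dim=300, seed=0):
    # Words missing from GloVe keep small random vectors; <pad> (index 0) is zero.
    rng = np.random.default_rng(seed)
    emb = rng.normal(scale=0.1, size=(len(idx2word), dim)).astype(np.float32)
    emb[0] = 0.0
    for i, w in enumerate(idx2word):
        if w in glove:
            emb[i] = glove[w]
    return emb


def build_corpus(train, val, test, glove, dim=300):
    # The six parts in the order described above (list container is an assumption).
    word2idx, idx2word = build_vocab(train)
    return [
        encode_split(train, word2idx),
        encode_split(val, word2idx),
        encode_split(test, word2idx),
        word2idx,
        idx2word,
        build_embedding(idx2word, glove, dim),
    ]


def save_corpus(corpus, path):
    with open(path, "wb") as f:
        pickle.dump(corpus, f)
```

In practice you would call `load_glove` on the downloaded GloVe-840B-300d text file, pass the result to `build_corpus`, and write the list out with `save_corpus`. Building the vocabulary from the training split only keeps validation and test words out of the embedding table; if the provided word mapping differs, load it instead of calling `build_vocab`.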
Each reference pickle file consists of three parts: 1. the human-readable ground truth of the training split; 2. the human-readable ground truth of the validation split; 3. the human-readable ground truth of the test split.
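A reference pickle can be sketched the same way. This is a hedged illustration only: it assumes each part is a dict from video id to a list of raw caption strings, which is the shape that common caption-metric scripts (for example the COCO caption evaluation code) take as ground truth, and that the three parts are stored as a list in train/validation/test order. The function names are hypothetical.

```python
import pickle


def build_references(train, val, test):
    # Assumed layout: one {video_id: [caption, ...]} dict per split.
    def clean(split):
        # Keep captions human-readable; only trim whitespace and lower-case.
        return {vid: [c.strip().lower() for c in caps] for vid, caps in split.items()}

    return [clean(train), clean(val), clean(test)]


def save_references(refs, path):
    with open(path, "wb") as f:
        pickle.dump(refs, f)
```

Unlike the corpus file, nothing here is mapped to word indices, so the same dicts can be passed directly to an evaluation script after decoding the model's output back to text.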
The preprocessing code for the npy files is provided in the repository.
Thanks a lot
Thank you for kindly sharing your code on GitHub. Could you tell me how to generate the pkl files listed in the Data section, that is, how to preprocess the two datasets, MSVD and MSR-VTT? Thanks for your attention!