WingsBrokenAngel / Semantics-AssistedVideoCaptioning

Source code for Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling Strategy
MIT License

How to generate the pkl files in the Data section? #2

Closed AcodeC closed 5 years ago

AcodeC commented 5 years ago

Thank you for kindly sharing your code on GitHub. Could you tell me how to generate the pkl files in the Data section, that is, how to preprocess the two datasets, MSVD and MSR-VTT? Thanks for your attention!

WingsBrokenAngel commented 5 years ago

I pre-processed the annotations and word embeddings from the command line. Each corpus file consists of six parts: 1. the annotations of the training split, represented as word indices; 2. the annotations of the validation split, represented as word indices; 3. the annotations of the test split, represented as word indices; 4. the mapping from a real word to a word index; 5. the mapping from a word index to a real word; 6. the pretrained word embeddings from GloVe-840B-300d. I think you can recreate such a pickle file with the help of this description and the provided word mapping.
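For readers who want a starting point, the six-part layout described above could be assembled roughly as follows. This is only a sketch, not the author's actual preprocessing: the function names, the whitespace tokenization, the special tokens, the ordering inside the list, and the `glove` argument (a plain dict from word to vector, loaded elsewhere) are all assumptions made for illustration. If you reuse the word mapping provided in the repository, you would load it instead of calling `build_vocab`.

```python
import pickle
import numpy as np

def build_vocab(captions, specials=("<pad>", "<bos>", "<eos>", "<unk>")):
    """Assign an integer index to every word; the special tokens are an assumption."""
    word2idx = {tok: i for i, tok in enumerate(specials)}
    for sent in captions:
        for word in sent.lower().split():
            word2idx.setdefault(word, len(word2idx))
    idx2word = {i: w for w, i in word2idx.items()}
    return word2idx, idx2word

def encode(captions, word2idx):
    """Turn each caption into a list of word indices; unknown words map to <unk>."""
    unk = word2idx["<unk>"]
    return [[word2idx.get(w, unk) for w in s.lower().split()] for s in captions]

def embedding_matrix(word2idx, glove, dim=300, seed=0):
    """Copy GloVe vectors where available; other rows keep a small random init."""
    rng = np.random.default_rng(seed)
    emb = rng.normal(0.0, 0.1, size=(len(word2idx), dim)).astype(np.float32)
    for word, idx in word2idx.items():
        if word in glove:
            emb[idx] = glove[word]
    return emb

def build_corpus(train, val, test, glove, dim=300):
    """Return the six parts in the order listed in the description (assumed order)."""
    word2idx, idx2word = build_vocab(train)
    return [
        encode(train, word2idx),
        encode(val, word2idx),
        encode(test, word2idx),
        word2idx,
        idx2word,
        embedding_matrix(word2idx, glove, dim),
    ]

# Toy usage with a 3-dimensional stand-in for the GloVe table.
toy_glove = {"man": np.ones(3, dtype=np.float32)}
toy_corpus = build_corpus(["a man sings"], ["a dog"], ["a cat sings"], toy_glove, dim=3)
# Round-trip through pickle in memory; use pickle.dump(obj, open(path, "wb")) for a file.
restored = pickle.loads(pickle.dumps(toy_corpus))
```

Building the vocabulary from the training split only is a design choice of this sketch; the actual files may have been built differently.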

The reference pickle files consist of three parts: 1. the human-readable ground truth of the training split; 2. the human-readable ground truth of the validation split; 3. the human-readable ground truth of the test split.
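A reference file with this three-part layout might be put together along the following lines. The thread does not say how each split is stored, so the structure used here (a dict from video id to a list of caption strings) is an assumption, as are the function names and the `(video_id, caption)` input format.

```python
import pickle
from collections import defaultdict

def group_captions(pairs):
    """Group (video_id, caption) pairs into {video_id: [caption, ...]} (assumed layout)."""
    refs = defaultdict(list)
    for video_id, caption in pairs:
        refs[video_id].append(caption.strip())
    return dict(refs)

def build_references(train_pairs, val_pairs, test_pairs):
    """Return [train, val, test] ground truth, kept as human-readable strings."""
    return [group_captions(p) for p in (train_pairs, val_pairs, test_pairs)]

def save_pickle(obj, path):
    """Write any Python object to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)
```

Calling `save_pickle(build_references(train, val, test), "ref.pkl")` would then write the file; the file name is a placeholder.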

The preprocessing code for the npy files is provided in the repository.

AcodeC commented 5 years ago

Thanks a lot