Could you release the code for preprocessing the visual / sentence features?
I'd like to know how the visual features were extracted (ResNet or something else?) and what exactly the sentence feature files contain.
We extracted the same features as "Temporal Activity Localization via Language Query" (TALL); please refer to their paper and GitHub code~
visual features: C3D~
sentence features: skip-thought~
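For context, C3D operates on fixed-length clips (16 frames in the original model), so videos are typically cut into overlapping windows before feature extraction. A minimal sketch of enumerating such clip boundaries — the window length matches the standard C3D input, but the stride here is an illustrative assumption, not necessarily what TALL or this repo used:

```python
def clip_windows(num_frames, clip_len=16, stride=8):
    """Enumerate (start, end) frame indices for overlapping clips.

    clip_len=16 matches the original C3D input size; stride=8 is an
    illustrative choice, not necessarily this repo's exact setting.
    """
    windows = []
    start = 0
    while start + clip_len <= num_frames:
        windows.append((start, start + clip_len))
        start += stride
    return windows

# e.g. a 48-frame video with 16-frame clips and stride 8
print(clip_windows(48))  # [(0, 16), (8, 24), (16, 32), (24, 40), (32, 48)]
```

Each window would then be passed through C3D and its fc6 (or similar) activation stored as that clip's visual feature; see the TALL paper and code for the exact settings they used.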