ecoxial2007 / LGVA_VideoQA

Language-Guided Visual Aggregation for Video Question Answering

feature extraction files #2

Open bxwldljh opened 9 months ago

bxwldljh commented 9 months ago

Can you release the code for video and text feature extraction? Many thanks!

ecoxial2007 commented 9 months ago

Thank you for your interest in our work. We use the ViT image encoder and the Transformer text encoder from OpenAI's CLIP as feature extractors. To get the features we need, we had to modify the CLIP source code at https://github.com/openai/CLIP/blob/main/clip/model.py (line 235 for images, line 354 for text). For model loading and the forward pass, you may refer to my src/tools/extract_embedding.py.
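For readers who want a starting point before opening src/tools/extract_embedding.py, here is a minimal sketch of loading CLIP and running both encoders through the stock `clip` package. It does not reproduce the modifications to model.py mentioned above; it only uses the unmodified public API (`clip.load`, `clip.tokenize`, `encode_image`, `encode_text`). The helper name `extract_clip_features`, the `ViT-B/32` variant, the file name `frame.jpg`, and the example question are all assumptions made for illustration.

```python
def extract_clip_features(model, image_batch, token_batch):
    """Return (image_features, text_features) exactly as the encoders emit them.

    Hypothetical helper, not taken from the repository. No normalization is
    applied to the outputs.
    """
    image_features = model.encode_image(image_batch)
    text_features = model.encode_text(token_batch)
    return image_features, text_features


def main(image_path="frame.jpg", question="what is the man doing?"):
    # Heavy imports live inside main() so the helper above can be imported
    # on a machine without torch or clip installed.
    import torch
    import clip  # pip install git+https://github.com/openai/CLIP.git
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # The model variant is an assumption; use whichever one the repo expects.
    model, preprocess = clip.load("ViT-B/32", device=device)
    model.eval()

    # One video frame and one question, each as a batch of size 1.
    image_batch = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    token_batch = clip.tokenize([question]).to(device)

    with torch.no_grad():
        image_features, text_features = extract_clip_features(
            model, image_batch, token_batch
        )
    print(image_features.shape, text_features.shape)
```

Calling `main()` with a path to an extracted frame prints the shapes of the two feature tensors. For a whole video, the same call would presumably be repeated over sampled frames and the results stacked.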

ecoxial2007 commented 9 months ago

I forgot to mention one point: please do not apply normalization to the features, as it would result in loss of information.
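A small illustration of that point, assuming "normalization" here means the L2 normalization commonly applied to CLIP embeddings before computing cosine similarity (the comment does not say which kind). The toy vectors and helper names are made up for the example. Two feature vectors that point in the same direction but differ in magnitude become identical once normalized, so whatever the magnitude encoded is gone:

```python
import math


def l2_norm(vec):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def l2_normalize(vec):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    n = l2_norm(vec)
    return [x / n for x in vec]


# Toy stand-ins for two feature vectors: same direction, different magnitude.
a = [3.0, 4.0]
b = [6.0, 8.0]

a_unit = l2_normalize(a)
b_unit = l2_normalize(b)

# Before normalization the two vectors are distinguishable by their norms.
norms_differ = not math.isclose(l2_norm(a), l2_norm(b))
# After normalization they can no longer be told apart.
units_match = all(math.isclose(x, y) for x, y in zip(a_unit, b_unit))

print(norms_differ, units_match)
```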