-
Thank you for your interesting work and for sharing your code!
I'm confused about whether the zero-shot performance on MSRVTT reported [here](https://github.com/OpenGVLab/InternVideo/tree/main/D…
-
These are the results I got on MSRVTT, which are far worse than the results in the paper:
There must be something wrong in my test process. Here is how I got them:
1. I've tried to run the text-…
-
According to the [README](https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo1/Downstream/Video-Text-Retrieval), the zero-shot retrieval results will be obtained after running the co…
-
How do I run Video-Text Retrieval with Youku-mPLUG? I could not find a demo.
-
Dear author,
Thanks for your great work!
I'd like to ask about the zero-shot video-text retrieval result: was it obtained by fine-tuning a 14M BLIP or a 129M BLIP on COCO?
-
Hello! Thank you for sharing this excellent project; I'm very interested in your work. I want to ask when the code for Video-Text Retrieval will be released. If there is still a long time, I'd appr…
-
Azure OpenAI is a service on the Azure platform for generative AI.
Here we can perform search.
It has APIs using REST, we can
Dense Captions: For every item detected in the image, it can genera…
-
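I ran demo/demo.ipynb directly with the model https://huggingface.co/OpenGVLab/InternVideo2-Stage2_1B-224p-f4/blob/main/InternVideo2-stage2_1b-224p-f4.pt and found that the results were not very good.
First, two changes are needed to load the model correctly:
1. In demo/demo.ipynb, in setup_internvid…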
-
Thank you for your great open-source code. I am excited about the outstanding zero-shot performance on video-text retrieval. Can you share the inference code for video-text retrieval on MSRVTT, thank…
-
ICCV 21
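In one sentence: for the video-text matching task, the method considers both global and local features, and uses an efficient way to handle the alignment of local features.
Previous methods mainly pull the video representation and the text representation closer together. The authors argue that this loses a lot of fine-grained information, so they also take local information into account. They split the video into several segments, use each segment's representation as a local representation of the video, and fuse all local representations with max pooling to obtain the video's glob…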
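
The segment-based pooling described above can be sketched roughly as follows. This is only an illustration, not the paper's implementation: the function names `mean_pool` and `local_and_global` are hypothetical, and representing each segment by the mean of its frame features is an assumption, since the note does not say how a segment representation is computed. Only the max pooling over local representations comes from the description.

```python
def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors element-wise."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def local_and_global(frame_features, num_segments):
    """Return (local, global) representations of one video.

    local: one vector per segment. Each is the mean of the segment's
           frame features (an assumption made for this sketch).
    global: element-wise max over the local vectors (max pooling).
    """
    n = len(frame_features)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and the number of frames")
    # Integer boundaries that spread frames as evenly as possible;
    # every segment gets at least one frame because num_segments <= n.
    bounds = [k * n // num_segments for k in range(num_segments + 1)]
    local = [
        mean_pool(frame_features[bounds[k]:bounds[k + 1]])
        for k in range(num_segments)
    ]
    dim = len(local[0])
    global_repr = [max(seg[i] for seg in local) for i in range(dim)]
    return local, global_repr
```

The local vectors would then be aligned with the text at a fine-grained level, while the max-pooled vector is used for global matching. How that alignment is done is not covered by this sketch.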