-
First of all, congrats on the paper, and thanks for providing the code!
In the paper, under 'Zero-shot language-based multi-modal joint retrieval', you mention that integrating/combining multiple embeddin…
-
Hi,
We found that the video-text joint loss in pretraining is calculated from the masked video and text. Why not use the original video and text, as in retrieval fine-tuning?
https://github.com/microsoft/UniVL/blo…
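
To make the question concrete, here is a minimal sketch of a generic symmetric contrastive loss over a batch of paired video and text embeddings. It is an illustration under assumptions, not the UniVL implementation: the names `contrastive_loss` and `log_softmax` and the toy embeddings are hypothetical. The same function could be fed embeddings computed from either masked or original inputs, which is the choice being asked about.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of a list of floats."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


def contrastive_loss(video_embs, text_embs):
    """Symmetric cross-entropy over the video-text similarity matrix.

    Assumption: video_embs[i] and text_embs[i] form the positive pair,
    and every other combination in the batch is a negative.
    """
    n = len(video_embs)
    # Dot-product similarity between every video and every text.
    sim = [[sum(a * b for a, b in zip(v, t)) for t in text_embs]
           for v in video_embs]
    # Video-to-text direction: each row should peak on its diagonal entry.
    v2t = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    # Text-to-video direction: same computation on the transposed matrix.
    sim_t = [[sim[i][j] for i in range(n)] for j in range(n)]
    t2v = -sum(log_softmax(sim_t[j])[j] for j in range(n)) / n
    return 0.5 * (v2t + t2v)


# Toy embeddings (hypothetical): two perfectly aligned pairs.
video = [[1.0, 0.0], [0.0, 1.0]]
text = [[1.0, 0.0], [0.0, 1.0]]
loss = contrastive_loss(video, text)
```

Whether the embeddings passed in come from masked or unmasked inputs changes only what the encoder sees, not the form of the loss.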
-
### Version
1
### DataCap Applicant
Black He
### Project ID
7
### Data Owner Name
jcphysics
### Data Owner Country/Region
Singapore
### Data Owner Industry
Education & Training
### Website…
-
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮…
-
Hello,
Thanks for your great work!
We'd like to run zero-shot evaluation on the MSRVTT QA task. However, after following the README below (setting zero-shot evaluation and preparing the dataset), we still encounter th…
-
Hello, wonderful project! I wonder how to fine-tune the pre-trained models on downstream video-text retrieval datasets like MSR-VTT, LSMDC, and MSVD? I notice that the script for zero-shot retrie…
-
Hey, I am unable to reproduce the reported zero-shot results. So far I have tried MSRVTT and MSVD, and I would appreciate it if you could kindly take a look.
Here is what I got after running these 2 script…
-
Thanks for your extraordinary work on video-text retrieval with T2VLAD.
I have a small request about this work: could you share the other dataloaders and configs for the MSR-VTT 1k-A split, MSVD an…
-
### Question Validation
- [X] I have searched both the documentation and discord for an answer.
### Question
```
# This class will transform video to text and images
class VideoProcessor:
de…
-
When I wrap the text-video retrieval function in a Python script and call it repeatedly, GPU memory usage grows with the number of calls (searches), and it is released only when the script is killed…