-
Thank you for your wonderful project!
Could you provide the train/test split JSON files for the MSR-VTT caption dataset? I am unable to access the following files:
• datasets/annotations_all/ms…
-
Dear authors,
First of all, thank you very much for your outstanding work and for sharing the code. Using the same device (an A6000 GPU) and the environment settings you shared, we obtained the results 'R@…
-
Thank you for presenting such exciting work. Congratulations!
I have a question regarding Table A3. Could you please provide more details on how the FVD is calculated? As this metric can be very…
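For context on why FVD numbers can be hard to reproduce: FVD is conventionally the Fréchet distance between Gaussians fitted to video features (typically from an I3D network) of the real and generated sets, and the feature extractor, frame count, and sample selection all affect the result. A minimal sketch of the distance itself, assuming features are already extracted (the function name and shapes are illustrative, not the authors' code):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between two feature sets of shape [N, D].

    FVD applies this to video embeddings:
        ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * (S1 @ S2)^(1/2))
    """
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):  # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1 + s2 - 2.0 * covmean))
```

Identical feature sets give a distance of (numerically) zero, which is a quick sanity check for any FVD implementation.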
-
Hello. Could you share the evaluation results (especially the zero-shot retrieval performance on the MSR-VTT dataset) for the VideoChat2 stage-1 model? Should it perform better than the _**UMT model**_ or not? T…
-
Hi, following on the above discussion, can you tell us how you selected the 2048 samples for both datasets? Because when calculating FVD on the entire MSR-VTT dataset, i.e. on all 2990 videos, I got a s…
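One common convention (though not necessarily what the authors did, which is exactly what the question above asks) is to draw a fixed-seed random subset so the evaluation is reproducible; FVD on a 2048-video subset will generally differ from FVD on the full 2990 videos. A hypothetical sketch:

```python
import random

def pick_eval_subset(video_ids, n=2048, seed=42):
    """Fixed-seed random subset of video IDs for reproducible FVD runs.

    Illustrative only: the seed, subset size, and even whether a random
    subset was used at all are assumptions, not the paper's protocol.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(video_ids, n))
```

Running this twice with the same seed returns the same subset, which is the property you need before comparing FVD numbers across machines.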
-
We tested the performance of VideoCLIP on the video-text retrieval task on the COIN dataset, but the performance is much lower than the reported VideoQA performance (26%
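For anyone debugging a gap like this, it can help to sanity-check the retrieval metric itself. A generic sketch of text-to-video Recall@K from a similarity matrix (the COIN/VideoCLIP protocol may differ; this assumes text i's ground-truth video is index i, and the names are illustrative):

```python
import numpy as np

def recall_at_k(sim, ks=(1, 5, 10)):
    """Text-to-video Recall@K (in %) from a [num_texts, num_videos]
    similarity matrix, assuming text i's ground truth is video i."""
    ranks = (-sim).argsort(axis=1)          # videos sorted best-first per text
    gt = np.arange(sim.shape[0])[:, None]
    pos = (ranks == gt).argmax(axis=1)      # rank of the ground-truth video
    return {f"R@{k}": float((pos < k).mean() * 100) for k in ks}
```

With a diagonal similarity matrix every query ranks its ground truth first, so R@1 should come out as 100 — a quick unit test before trusting lower numbers.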
-
Hello Antoine,
Thanks for sharing your work. I am trying to evaluate your code on the MSR-VTT test dataset. I followed all the instructions you provided in the readme.md file. Furthermore, I think the CSV f…
-
Which directory is this msr-vtt_model.pth file in?
-
Since the paper includes an experiment pre-training on HowTo100M and fine-tuning on MSR-VTT, I'd like to know whether I can use this code to train the model on MSR-VTT. Thank you.
-
I read in the readme file that PaliGemma can caption a short video; can anyone guide me on how to do that?
Does it extract every frame of the video? Or does the PaliGemma tokenizer directly support video…
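For context, PaliGemma is an image-text model, so video-captioning pipelines built on it typically sample a small, evenly spaced set of frames rather than feeding every frame. A minimal sketch of uniform frame-index sampling (the function name is illustrative, not part of the PaliGemma API):

```python
def uniform_frame_indices(num_frames_total, num_samples):
    """Indices of `num_samples` frames spread evenly across a video.

    A common way to feed a short clip to an image model: decode the
    video, pick these frame indices, and caption each sampled frame
    (or a grid of them) instead of processing every frame.
    """
    if num_samples >= num_frames_total:
        return list(range(num_frames_total))
    step = num_frames_total / num_samples
    # Take the midpoint of each of `num_samples` equal segments.
    return [int(step * (i + 0.5)) for i in range(num_samples)]
```

For a 100-frame clip sampled at 4 frames this yields `[12, 37, 62, 87]`; how the sampled frames are then combined into one caption is a design choice of the surrounding pipeline, not of this helper.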