-
Dear authors,
First of all, thank you very much for your outstanding work and for sharing the code. Using the same device (A6000 GPU) and environment settings you shared, we obtained the results 'R@…
-
Thank you for sharing your work. I would sincerely like a clarification about the mask ratio. Following the closed issue "About 'compute_trick_metric'", I adjusted the seed to 42 and the mask ratio to 0.5 for MSR-VTT, but R1…
-
Dear Authors,
I am trying to reproduce the zero-shot performance with the checkpoint [ViCLIP-L-14 InternVid-10M-FLT](https://huggingface.co/OpenGVLab/ViCLIP).
However, the performance is different from …
-
Hello, wonderful project! I am wondering how to fine-tune the pre-trained models on downstream video-text retrieval datasets such as MSR-VTT, LSMDC, and MSVD. I notice that the script for zero-shot retrie…
-
I now need to validate the performance on the MSRVTT dataset. How can this be implemented? Could you provide a corresponding tutorial?
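For reference, retrieval performance on MSRVTT is usually reported as Recall@K computed from a text-video similarity matrix. This is a minimal, repo-independent sketch of that metric (the `recall_at_k` helper and the toy matrix are illustrative, not part of the project's evaluation scripts):

```python
import numpy as np

def recall_at_k(sim, ks=(1, 5, 10)):
    """Compute text-to-video Recall@K from a similarity matrix.

    sim[i, j] is the similarity between text query i and video j;
    the ground-truth match for text i is assumed to be video i.
    """
    # Rank of the ground-truth video for each text query: count how
    # many videos score strictly higher than the true match.
    gt_scores = np.diag(sim)[:, None]
    ranks = (sim > gt_scores).sum(axis=1)  # 0 means ranked first
    return {f"R@{k}": 100.0 * (ranks < k).mean() for k in ks}

# Toy example: a diagonal-dominant similarity matrix gives perfect recall.
sim = np.eye(4) + 0.01 * np.random.rand(4, 4)
print(recall_at_k(sim))
```

The actual numbers reported in the paper would come from the model's similarity scores over the full MSRVTT test split, but the ranking logic is the same.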
-
Hi,
Congratulations on the great work!
Would you mind providing a pointer to where you found the dataset splits for the captioning datasets, as they seem not always consistent with the…
-
These are the results I got on MSRVTT, which are far worse than the paper's results:
There must be something wrong in my test process; here is how I got them:
1. I tried to run the text-…
-
Could you please provide a script or JSON file of the ID map from M3IT to VideoChat2IT? Matching different files can be quite challenging. For example, `coco llava minigpt4 paragraph_captioning te…
-
Hello, nice job!
I cannot reproduce the MSRVTT fine-tuned model, even though I set every argument according to the [log](https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/retrieval/msrvtt/kc4_finetune_1e-32…
-
Besides, how can we prepare the data files such as *.label.tsv / *.caption.tsv / *.caption.linelist.tsv to train SwinBert on our own dataset? Thank you very much!
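A rough sketch of generating such TSV files for a custom dataset. The column layout below (video id plus a JSON list of captions, with a linelist that flattens multi-caption rows into (row, caption) pairs) is an assumption for illustration; the authoritative schema is whatever SwinBert's own preprocessing scripts emit:

```python
import csv
import json

# Hypothetical per-video captions for a custom dataset.
videos = {
    "video0": ["a dog runs in the park", "a dog is running"],
    "video1": ["a person cooks dinner"],
}

# caption.tsv: one row per video -> <video_id> \t <JSON caption list>
with open("train.caption.tsv", "w", newline="") as f:
    w = csv.writer(f, delimiter="\t")
    for vid, caps in videos.items():
        w.writerow([vid, json.dumps([{"caption": c} for c in caps])])

# caption.linelist.tsv: flatten multi-caption rows into
# <row_index> \t <caption_index> pairs, one training sample each.
with open("train.caption.linelist.tsv", "w", newline="") as f:
    w = csv.writer(f, delimiter="\t")
    for row, (vid, caps) in enumerate(videos.items()):
        for cap_idx in range(len(caps)):
            w.writerow([row, cap_idx])
```

The *.label.tsv file would follow the same video-id-keyed pattern; check the repo's data-preparation code for the exact fields it expects before training.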