-
I can't reproduce the CLIP-ViP results on MSRVTT. I used the default config file with epoch=100 and bs=16, or epochs=5 and bs=128 as in the paper. The best t2vR1 and v2tR1 are both 49…
-
First of all, thank you for sharing your good research.
**scripts** only contains training parameters for the MSRVTT and VATEX datasets; those for ActivityNet, DiDeMo, and LSMDC are not available. Can you pro…
-
Could I ask what the training time of STAN-self-B/16 was in your paper?
I am also really astonished at the frame number of 12 and batch size of 128, which means one forward pass needs to process 1536 images, and images also wi…
-
Hi. Thanks for providing code! I'm having the same issue as #3 on the VQA demo. I have the Microsoft deberta-v2-xlarge ( https://huggingface.co/microsoft/deberta-v2-xlarge ) downloaded from huggingfac…
-
Dear Sir,
During fine-tuning on MSRVTT, we found no validation set in the code. How should we choose a checkpoint?
Looking forward to your reply.
Thanks!
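
To make the question concrete, here is a minimal sketch of one common workaround, not taken from the repository: hold out a small slice of the training annotations as a validation split and keep the checkpoint with the best text-to-video R@1 on it. The function names, the hold-out size, and the toy similarity values are all hypothetical.

```python
import random

def split_train_val(items, val_size, seed=0):
    """Shuffle a copy of the training items and hold out `val_size` of them."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled[val_size:], shuffled[:val_size]

def recall_at_1(sim):
    """Text-to-video R@1 (in percent) for a square similarity matrix.

    Row i is text i, column j is video j, and the ground-truth pair
    is assumed to sit on the diagonal.
    """
    hits = 0
    for i, row in enumerate(sim):
        best_video = max(range(len(row)), key=row.__getitem__)
        if best_video == i:
            hits += 1
    return 100.0 * hits / len(sim)

def pick_checkpoint(val_scores):
    """Return the checkpoint name with the highest validation score."""
    return max(val_scores, key=val_scores.get)

# Hypothetical usage: 100 training items, 10 held out for validation.
train, val = split_train_val(range(100), val_size=10)

# Toy similarity matrix standing in for a real validation run.
toy_sim = [[0.9, 0.1, 0.2],
           [0.3, 0.2, 0.8],
           [0.1, 0.2, 0.7]]
scores = {"epoch_1": 33.3, "epoch_2": recall_at_1(toy_sim)}
best = pick_checkpoint(scores)
```

Selecting on a held-out slice rather than on the test set keeps the reported test numbers free of selection bias, at the cost of slightly less training data.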
-
https://drive.google.com/drive/folders/1XTAAvx-d3BOyxkEzzN61tnbUkRH5AFi5
Annotation>msvd
Is 'msrvtt_mc_text.jsonl' a typo? Or am I missing something?
As the folder name is 'msvd', it seems like …
-
Hi! The model I trained using your code reaches 40.95, but the model you provided reaches 41.15. Why is that? In addition, your model seems to be obtained in the seco…
-
Hi.
configs/ret_msrvtt_mc.yaml
data_root: ${oc.env:SL_DATA_DIR}/videos_images
anno_root_downstream: ${oc.env:SL_DATA_DIR}/anno_downstream
train_file: ['${anno_root_downstream}/msrvtt_ret_t…
-
Hi, I have found some spelling errors in the test set of MSRVTT, for example "badmitten", "peson", and "tenni". How did you handle such ground-truth errors during testing?
-
Hey @SanniM3,
When I run inference using the pyscene fine-tuned model, it crashes after about 5 videos with the following output.
Any idea what might be causing this?
Evan
```
evan@mlpcw3-3…