-
The article mentions "where they randomly chose 5 ground-truth sentences per video. We use the same setting when we compare with that approach". Do the training set, validation set, and test set …
-
Hi, thanks a lot for sharing your solid work; I have learned a lot from your paper and code. I still have a question about the temporal modeling part.
I saw that you have compared the performa…
-
Hey, I have downloaded the YouCook2 features and am training a common-space learning network to associate positive sentence-clip pairs for video retrieval.
I have loaded the video features and the tex…
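For context, once sentence and clip embeddings live in the common space, retrieval reduces to cosine-similarity ranking. A minimal sketch with synthetic embeddings (the shapes, the noise level, and the one-to-one sentence-clip pairing are all made up for illustration; real features would come from the trained encoders):

```python
import numpy as np

# Hypothetical setup: 100 clips and their paired sentences, 512-d embeddings.
rng = np.random.default_rng(0)
video_emb = rng.normal(size=(100, 512))
text_emb = video_emb + 0.1 * rng.normal(size=(100, 512))  # positives lie near their clips

# L2-normalize so the dot product equals cosine similarity.
video_emb /= np.linalg.norm(video_emb, axis=1, keepdims=True)
text_emb /= np.linalg.norm(text_emb, axis=1, keepdims=True)

# Text-to-video retrieval: rank every clip for each sentence.
sim = text_emb @ video_emb.T            # (num_sentences, num_clips)
ranks = (-sim).argsort(axis=1)          # best-matching clip first
correct = np.arange(len(text_emb))      # sentence i is paired with clip i
r1 = (ranks[:, 0] == correct).mean()    # Recall@1
print(f"R@1: {r1:.3f}")
```

The same similarity matrix transposed gives video-to-text retrieval.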
-
Hi, in table 6 of your paper, why does having an MLP head instead of a linear head improve NN video retrieval accuracy?
When doing NN video retrieval, do you train an additional linear head/MLP?
T…
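For what it's worth, the structural difference between the two heads being asked about is just an extra nonlinear layer on top of the backbone features before they enter the joint space; in contrastive setups the head is trained jointly with the rest of the network rather than fitted afterwards. A minimal sketch of the two shapes (all dimensions and weights are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_head(x, W):
    """Single linear projection into the joint embedding space."""
    return x @ W

def mlp_head(x, W1, W2):
    """Two-layer MLP head with ReLU, as in SimCLR-style projection heads."""
    return np.maximum(x @ W1, 0.0) @ W2

# Hypothetical dimensions: 768-d backbone features projected to a 256-d joint space.
feat = rng.normal(size=(4, 768))
W = rng.normal(size=(768, 256)) * 0.02
W1 = rng.normal(size=(768, 1024)) * 0.02
W2 = rng.normal(size=(1024, 256)) * 0.02

z_lin = linear_head(feat, W)
z_mlp = mlp_head(feat, W1, W2)
print(z_lin.shape, z_mlp.shape)  # both (4, 256)
```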
-
I am trying to verify/reproduce your paper's validation results **without training** the model myself, expecting 42.6% R@1 accuracy on MSR-VTT.
But when I follow the instructions from [TRAIN_AND_VALID…
-
Basically, I would like to run video retrieval using this distilled model: https://huggingface.co/OpenGVLab/InternVideo2_distillation_models/blob/main/stage1/L14/L14_dist_1B_stage2/pytorch_model.bin
…
-
My goal is to build a unique multimodal WooCommerce search experience with Vespa multivectors and a hybrid ranking over text BM25, text vectors, and image vectors.
For instance, E-commerce can use:
…
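One simple way to think about the hybrid ranking is as a weighted fusion of a normalized BM25 score with the two cosine similarities. A minimal Python sketch (the weights, normalization, and product data are all made up; in Vespa this logic would live in a rank-profile expression rather than application code):

```python
# Hypothetical hybrid score: weighted sum of a squashed BM25 score and two
# cosine similarities (text-vector and image-vector).
def hybrid_score(bm25, text_cos, image_cos, w=(0.4, 0.4, 0.2)):
    # Squash BM25 into [0, 1) so it is comparable with cosine similarities.
    bm25_norm = bm25 / (bm25 + 1.0)
    return w[0] * bm25_norm + w[1] * text_cos + w[2] * image_cos

# Made-up candidates: (name, bm25, text cosine, image cosine).
products = [
    ("red running shoes", 7.2, 0.81, 0.64),
    ("blue trail shoes", 5.5, 0.74, 0.70),
    ("red rain jacket", 6.1, 0.52, 0.48),
]
ranked = sorted(products, key=lambda p: hybrid_score(*p[1:]), reverse=True)
for name, *scores in ranked:
    print(name, round(hybrid_score(*scores), 3))
```

Tuning the three weights (or learning them) is where most of the ranking quality comes from.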
-
These are the results I got on MSR-VTT, which are far worse than the paper's results:
There must be something wrong in my test process; here is how I got them:
1. I've tried to run the text-…
-
I cannot access https://people.eecs.berkeley.edu/~lisa_anne/didemo/. It asks for a username and password to log in. Are there other ways to download the models and the 13 videos missing from AWS? When…