-
Dear author, I want to train your released model on other captioning datasets, but I only have the captions and video_names of the raw videos. How can I generate the following processed json and…
-
Can you provide the checkpoint that was only trained on WebVid (without CC3M)?
-
Hi,
Congrats on the amazing work!! I want to fine-tune this model on a custom video dataset. Its inputs are a video and text, but no image is provided. How can I fine-tune without…
-
When I use the following configuration to train the model on `MSRVTT Training-9K`, the best result I got is
`07/27/2021 13:11:01 - INFO - sim matrix size: 1000, 1000
07/27/2021 13:11:01 - INFO - …
-
Hi, I am confused about the description of frame sampling while testing: 'The values for i are determine using a stride S, resulting in an array of video embeddings v = [v_0 , v_S , v_2S , v_M ].'
Co…
-
@ArrowLuo Hi, I trained CLIP4Clip (meanP) directly on ActivityNet and got R@1 = 37.9, which is much worse than the 40.5 reported in Table 4.
I extracted images from the original videos with FPS=1, and tr…
-
You have done a great job! Have you extracted the local features of MSVD and MSRVTT? If you don't mind, could they be released?
-
Hi, great work and thanks for sharing the code.
I'm trying to reproduce the results on MSRVTT for comparison, but training is taking longer than expected (~6 hours/epoch).
The bottleneck is pre…
-
Hello,
Thanks for the great work on ClipBERT.
I can see the pretrained weights for the pretraining task are available.
Is it possible to make available a checkpoint from the MSRVTT fine tuning e…
-
Dear authors,
Can you share your extracted features for MSVD and MSR-VTT, along with the extraction settings?
Btw, I also want to extract these features for videos by myself. Therefore, it would be be…