-
Dear Author
Thanks for your great work! I have a question about this file
https://github.com/YehLi/xmodaler/blob/master/tools/msrvtt_preprocess.py.
I followed line 4 of the above file to buil…
-
Dear author, I want to train your released model on other captioning datasets, but I only have the captions and video_names of the raw videos. How can I generate the following processed json and…
-
Can you provide the checkpoint that was only trained on WebVid (without CC3M)?
-
Hi,
Congrats on the amazing work!! I want to fine-tune this model on a custom video dataset. It has a video and text as the inputs but no image is provided in the input. How can I fine-tune without…
-
@ArrowLuo Hi, I trained CLIP4Clip (meanP) directly on ActivityNet and got R@1=37.9, which is much worse than the 40.5 reported in Table 4.
I extracted images from the original videos with FPS=1, and tr…
-
When I use the following configuration to train the model on `MSRVTT Training-9K`, the best result I got is
`07/27/2021 13:11:01 - INFO - sim matrix size: 1000, 1000
07/27/2021 13:11:01 - INFO - …
-
Hi, I am confused about the description of frame sampling while testing: 'The values for i are determine using a stride S, resulting in an array of video embeddings v = [v_0 , v_S , v_2S , v_M ].'
Co…
-
You have done a great job! Have you extracted the local features of MSVD and MSRVTT? If you don't mind, could they be released?
-
Hi, great work and thanks for sharing the code.
I'm trying to reproduce the results on MSRVTT for comparison, but the training is taking longer than expected (~6 hours/epoch).
The bottleneck is pre…
-
Hello,
Thanks for the great work on ClipBERT.
I can see the pretrained weights for the pretraining task are available.
Is it possible to make available a checkpoint from the MSRVTT fine tuning e…