-
![image](https://user-images.githubusercontent.com/55907441/149606001-dd677291-cb45-4edf-ba3b-cc5546e12988.png)
It seems that captions and videos must be paired one-to-one along the diagonal.
I am tryin…
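
For readers unfamiliar with diagonal pairing, here is a minimal sketch of a common contrastive setup. It assumes a square similarity matrix in which row `i` is caption `i`, column `j` is video `j`, and the matched pairs sit on the diagonal. The function name `symmetric_contrastive_loss` and the toy matrices are hypothetical and are not taken from the repository in question.

```python
import math


def symmetric_contrastive_loss(sim):
    """Average of caption-to-video and video-to-caption cross-entropy.

    sim[i][j] is the similarity of caption i and video j (assumed layout).
    The positive for row i is assumed to be the diagonal entry sim[i][i].
    """
    n = len(sim)

    def diagonal_cross_entropy(matrix):
        total = 0.0
        for i, row in enumerate(matrix):
            m = max(row)  # subtract the max for numerical stability
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum_exp - row[i]  # -log softmax(row)[i]
        return total / n

    transposed = [list(col) for col in zip(*sim)]
    return 0.5 * (diagonal_cross_entropy(sim) + diagonal_cross_entropy(transposed))


# Toy example: aligned pairs on the diagonal versus swapped pairs.
aligned = [[10.0, 0.0], [0.0, 10.0]]
swapped = [[0.0, 10.0], [10.0, 0.0]]
loss_aligned = symmetric_contrastive_loss(aligned)
loss_swapped = symmetric_contrastive_loss(swapped)
```

Under this assumption the loss is small only when the batch is ordered so that caption `i` and video `i` belong together, which is why a batch with unpaired or repeated items would break the diagonal targets.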
-
Thanks for your excellent work! I tried your chat demo, and it worked wonderfully.
Next, I want to use this model to summarize my videos. Following the README, I downloaded the transnetv2-pytorch…
-
I could not find the file scripts/evaluation/stage2/zero_shot/1B/eval_msrvtt_no_deepspeed.sh in the repo. Has it been removed? If so, please update the README accordingly.
-
Thank you for sharing your code!
The following error occurs when I train the DGL model with the provided code. Can you help me?
![image](https://github.com/knightyxp/DGL/assets/88419969/3c2fd24a-4fb3-4f8e…
-
I trained the model on a single NVIDIA RTX 4090 using the default config settings. However, the results on the test dataset are significantly worse than those reported in the paper, e.g. CIDEr on the MSVD dataset from 113.…
-
Thank you very much for your nice work! However, I encountered the following error when executing `utils/extract_frame_and_wav_multiprocess.py` for processing MSRVTT. Additionally, the progress bar is…
-
Would you please release the inference results on MVBench and the other three zero-shot question-answering benchmarks (MSVD-QA, MSRVTT-QA, ActivityNet-QA)?
-
Hello, I'm new to this field and a bit confused about how to calculate the metric on the MSRVTT set, where each video has 20 corresponding descriptive captions. So how do we calculate to get …
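
The question is cut off, so the metric being asked about is unknown. As one illustration only, assuming it is text-to-video Recall@K, a common convention is to treat every caption as its own query whose ground truth is the video it describes. The sketch below follows that assumption. `recall_at_k`, `caption_to_video`, and the toy scores are hypothetical names and data, not the evaluation code of any particular repository.

```python
def recall_at_k(sim, caption_to_video, k):
    """Text-to-video Recall@K with several captions per video.

    sim[c][v] is the similarity of caption c to video v (assumed layout).
    caption_to_video[c] is the index of the video that caption c describes.
    """
    hits = 0
    for c, scores in enumerate(sim):
        # Rank all videos for this caption, highest similarity first.
        ranked = sorted(range(len(scores)), key=lambda v: scores[v], reverse=True)
        if caption_to_video[c] in ranked[:k]:
            hits += 1
    return hits / len(sim)


# Toy example: 2 videos with 2 captions each (MSRVTT would have 20 per video).
sim = [
    [0.9, 0.1],  # caption 0 -> video 0, ranked correctly
    [0.2, 0.8],  # caption 1 -> video 0, ranked incorrectly
    [0.3, 0.7],  # caption 2 -> video 1, ranked correctly
    [0.4, 0.6],  # caption 3 -> video 1, ranked correctly
]
caption_to_video = [0, 0, 1, 1]
r1 = recall_at_k(sim, caption_to_video, k=1)
r2 = recall_at_k(sim, caption_to_video, k=2)
```

With this convention the denominator is the number of captions, not the number of videos. Some evaluation splits instead keep a single caption per video, so the protocol used by a given paper should be checked before comparing numbers.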
-
Nice work! Can you provide the link to the pre-computed features for MSVD and MSRVTT? The link you provided for MSVD has expired.
-
Appreciate your efforts in maintaining this project!
When I ran the zero-shot VQA inference (generating results) on the MSRVTT dataset, it took 28 hours (using 4 A5000 GPUs) to finish. I recognize that…