-
b'\x80\x04\x95\x0e\xf4\x00\x00\x00\x00\x00\x00}\x94(\x8c\x08video_id\x94\x8c\tvideo9770\x94\x8c\x07image_h\x94K\xf0\x8c\x07image_w\x94M@\x01\x8c\tnum_boxes\x94K\x02\x8c\x05boxes\x94\x8c\x15numpy.core.…
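For readers trying to make sense of the blob above: it looks like a pickled dict of region features (`video_id`, `image_h`, `image_w`, `num_boxes`, `boxes`, ...). A minimal sketch for inspecting such an entry is below; the file path is hypothetical and the key set is only inferred from the bytes shown.
```
import pickle
import numpy as np

# Hypothetical path; point this at the actual feature file being inspected.
FEATURE_FILE = "video9770_features.pkl"

with open(FEATURE_FILE, "rb") as f:
    entry = pickle.load(f)  # appears to be a dict: video_id, image_h, image_w, num_boxes, boxes, ...

# Print scalar fields directly and summarise any numpy arrays by shape/dtype.
for key, value in entry.items():
    if isinstance(value, np.ndarray):
        print(f"{key}: ndarray shape={value.shape} dtype={value.dtype}")
    else:
        print(f"{key}: {value!r}")
```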
-
Hey @SanniM3,
When I run inference with the pyscene fine-tuned model, it crashes after about 5 videos with the following output.
Any idea what might be causing this?
Evan
```
evan@mlpcw3-3…
-
Hi,
How much time does it take to fine-tune on MSRVTT with 8 V100 GPUs?
Thank you.
-
Hi! I have noticed that the name for your DiDeMo file is `didemo_2fps_360_trimed30`, while the name for MSRVTT is `msrvtt_2fps_224`. It seems a little different from [DATA](https://github.com/jayleicn/…
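Side note: the suffixes seem to encode the preprocessing, i.e. 2 fps, frames resized to 360 or 224, and the DiDeMo videos additionally trimmed to 30 s. A minimal sketch of that conversion with ffmpeg called from Python is below; the paths and the reading of the suffix are assumptions, not the repo's official preprocessing script.
```
import subprocess

# Hypothetical paths; adjust to wherever the raw and processed videos live.
SRC = "raw_videos/example.mp4"
DST = "didemo_2fps_360_trimed30/example.mp4"

subprocess.run(
    [
        "ffmpeg", "-y",
        "-i", SRC,
        "-vf", "fps=2,scale=-2:360",  # 2 frames per second, height 360, width keeps the aspect ratio
        "-t", "30",                   # keep only the first 30 seconds (the "trimed30" part)
        "-an",                        # drop the audio track
        DST,
    ],
    check=True,
)
```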
-
```
from lavis.datasets.builders import load_dataset
msrvtt_dataset = load_dataset("msrvtt_caption")
```
as shown in the picture below:
![image](https://user-images.githubusercontent.com/11017886/199537063-10263d81-…
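For anyone hitting the same point, here is a minimal sketch of inspecting what `load_dataset` returns, following the generic dataset-zoo pattern in the LAVIS README; it assumes the MSRVTT videos and annotations have already been downloaded to the default cache location.
```
from lavis.datasets.builders import load_dataset

msrvtt_dataset = load_dataset("msrvtt_caption")

# The builder returns a dict of splits, e.g. dict_keys(['train', 'val', 'test']).
print(msrvtt_dataset.keys())

# Each split is indexable like a torch Dataset; one element is an annotated sample.
train_split = msrvtt_dataset["train"]
print(len(train_split))
print(train_split[0])
```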
-
Hi! I have read the mPLUG-2 paper, and it is a really great vision-language foundation model with a fantastic design.
**However, I have some doubts about the fairness of the SOTA comparison:**
Ac…
-
Thanks for your contributions.
When I train the model following the README, I run into the following issue:
#########################################
SwinBERT/src/modeling/load_bert.py", line 12, in …
-
Hi, I ran the code on the MSRVTT dataset with 2 A100s, and the loss becomes NaN after some iterations, like [this issue](https://github.com/ArrowLuo/CLIP4Clip/issues/54#issue-1088556468). However, I foun…
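A small diagnostic sketch that often helps narrow down NaN losses under mixed precision; the training-step structure and names below are placeholders, not the repo's actual code. The idea is to check that the loss is finite at every step and to clip gradients before the optimizer update.
```
import torch

def guarded_step(model, batch, optimizer, scaler, max_grad_norm=1.0):
    """One fp16 training step that aborts as soon as the loss stops being finite."""
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = model(**batch)  # placeholder for however the repo computes its loss

    if not torch.isfinite(loss):
        raise RuntimeError(f"Non-finite loss {loss.item()} on batch {batch.get('video_id')}")

    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)  # so clipping sees the true gradient magnitudes
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```
If the NaN only shows up with autocast enabled, lowering the learning rate or computing the contrastive logits in fp32 are common mitigations for CLIP-style losses.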
-
Dear authors,
I am trying to reproduce the MSRVTT-QA results using the multimodal encoder as a decoder. After running scripts/eval_vqa.sh on the MSRVTT-QA test set, on "ft_msrvtt_qa_singularity_temporal_…