-
Hi, I'm trying to reproduce the CLIP-ViP results. The README mentions that the data preprocessing step follows HD-VILA. However, in the [configuration files](https://github.com/microsof…
-
Could I ask about the STAN-self-B/16 training time reported in your paper?
I am really astonished by the frame number of 12 and the batch size of 128, which means one forward pass needs to process 1536 images, and the images also wi…
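For concreteness, here is a rough sketch of what those numbers imply for a single forward pass, assuming a CLIP ViT-B/16 backbone at 224×224 input resolution (the resolution is my assumption; only the frame count and batch size come from the setup above):

```python
# Back-of-the-envelope math for one forward pass. The 224x224 resolution is
# an assumption for a ViT-B/16 backbone; 12 frames and batch size 128 are
# the numbers quoted above.
batch_size = 128
num_frames = 12
images_per_forward = batch_size * num_frames  # 128 * 12 = 1536 images

# Flattening frames into the batch dimension, the vision encoder sees a
# tensor of shape (B * T, C, H, W):
shape = (images_per_forward, 3, 224, 224)
numel = 1
for d in shape:
    numel *= d
print(f"{images_per_forward} images per forward pass")
print(f"~{numel * 4 / 1e9:.1f} GB of float32 just for the input tensor")
```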
-
Hi,
Great work. Thank you very much for sharing the code.
A small issue I encountered when trying to train C4C on the DiDeMo dataset.
As recommended, I ran your ffmpeg script and tried to train the mod…
-
Hi! I have noticed that the name of your DiDeMo file is `didemo_2fps_360_trimed30`, while the name for MSRVTT is `msrvtt_2fps_224`. It seems a little different from [DATA](https://github.com/jayleicn/…
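Reading the suffixes as preprocessing parameters (my inference from the names, not something either repo confirms), `didemo_2fps_360_trimed30` would mean 2 fps, 360-pixel frames, trimmed to the first 30 seconds, while `msrvtt_2fps_224` would mean 2 fps at 224 pixels with no trimming. A minimal sketch of such a step, where the function name and parameter values are illustrative guesses:

```python
# Hedged sketch: re-encode videos the way the directory names *suggest* they
# were preprocessed. The fps/size/trim values are inferred from the names,
# not taken from the repo's actual script.
import subprocess
from pathlib import Path

def preprocess(src: Path, dst: Path, fps: int = 2, size: int = 360,
               trim_seconds: int = 30) -> None:
    """Resample `src` to `fps` frames per second, scale the frame height to
    `size` pixels (width follows automatically, kept even via -2), and keep
    only the first `trim_seconds` seconds; pass trim_seconds=0 to skip trimming."""
    cmd = ["ffmpeg", "-y", "-i", str(src)]
    if trim_seconds:
        cmd += ["-t", str(trim_seconds)]   # keep only the first N seconds
    cmd += ["-r", str(fps),                # target output frame rate
            "-vf", f"scale=-2:{size}",     # height = size, width auto
            str(dst)]
    subprocess.run(cmd, check=True)

# e.g. preprocess(Path("raw/video.mp4"), Path("didemo_2fps_360_trimed30/video.mp4"))
```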
-
Hi! I have read the mPLUG-2 paper; it's really a great vision-language foundation model with a fantastic design.
**However, I have some doubts about the fairness of the SOTA comparison:**
Ac…
-
Hello, could you release the DiDeMo captions? Thanks
-
It seems that the link provided in download_tvr.sh and download_didemo.sh is not working.
I got the following error when executing those scripts:
```
Resolving convaisharables.blob.core.windows.net (convaisha…
```
-
Hi, dear authors, how do you pre-process DiDeMo dataset?
-
Dear authors,
I am trying to reproduce the MSRVTT-QA results using the multimodal encoder as a decoder. After running scripts/eval_vqa.sh on the MSRVTT-QA test set, on "ft_msrvtt_qa_singularity_temporal_…
-
See example: http://plnkr.co/edit/9GdLbXM0mh5zHPKhcwCL?p=info
Both of these should parse correctly:
```
```
Given:
```
@Directive({
  selector: '[bsPane]'
})
class Foo {
  @Input() bsPaneTitle: …