-
Hello, I've recently been working on image-captioning tasks and noticed that you report CIDEr scores on COCO in your paper. However, I couldn't find any gold annotations for the COCO test set, so I won…
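Since the official COCO test-set annotations are not released, CIDEr is typically reported either through the online evaluation server or on a split with public references such as the Karpathy test split. Below is a minimal sketch of local scoring with pycocoevalcap under that assumption; the image id and captions are made up for illustration, and real captions should first be lowercased and tokenized (e.g., with the PTB tokenizer).

```python
# Minimal CIDEr sketch with pycocoevalcap, assuming references are available
# (e.g., Karpathy test split). The id and captions below are hypothetical.
from pycocoevalcap.cider.cider import Cider

# References: image id -> list of ground-truth captions (already tokenized/lowercased).
gts = {
    391895: ["a man riding a motorcycle on a dirt road",
             "a person rides a motorbike through the countryside"],
}
# Candidates: image id -> list with exactly one generated caption.
res = {
    391895: ["a man rides a motorcycle down a dirt road"],
}

score, per_image_scores = Cider().compute_score(gts, res)
print(f"CIDEr: {score:.3f}")
```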
-
#### :bug: I need to change the MODE every time I watch a new movie on Amazon.
Program version 3.5.1
Closed captioning/subtitles are turned ON in Amazon Prime Video.
I will switch to Amazon Prime vide…
-
Hi,
Is it possible to use image captions for training?
Here you only mention the filename that will be used as the trigger word, but all the options for caption training have disappeared.
Could you implement it …
-
Dear @agermanidis,
I am Roberto Minelli, and I am part of the team that organizes [TEDxLakeComo](http://www.tedxlakecomo.com).
For the captioning process of [TED Talks](https://www.ted.com/talks)…
-
If you watch Tzviya's CEPC talk after clicking on "Sync video and hide transcript", you'll see no links at all - including no link to the document she's talking about. The links are in the longer tra…
-
Hi guys, I am trying to generate my own features.tsv and labels.tsv for my dataset, but I am stuck at the following:
1. I am slightly confused about what exactly these features are. Upon r…
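For what it's worth, here is a rough sketch of one common convention for features.tsv: each row holds an image id, the number of regions, and the detector's region features serialized as a base64-encoded float32 blob. The exact column layout (and whether boxes, class labels, etc. are stored alongside) varies between repositories, so treat this as an illustration rather than the format this repo necessarily expects.

```python
# Sketch of writing/reading region features in a base64 TSV convention.
# Column layout and feature dimension (2048) are assumptions for illustration.
import base64
import csv
import numpy as np

def encode_features(feats: np.ndarray) -> str:
    """Serialize a (num_boxes, feat_dim) float32 array to a base64 string."""
    return base64.b64encode(feats.astype(np.float32).tobytes()).decode("utf-8")

def decode_features(s: str, feat_dim: int) -> np.ndarray:
    """Inverse of encode_features; feat_dim must match what was written."""
    buf = base64.b64decode(s)
    return np.frombuffer(buf, dtype=np.float32).reshape(-1, feat_dim)

# Hypothetical detector output for one image: 10 region features of size 2048.
image_id = "000000391895"
features = np.random.rand(10, 2048).astype(np.float32)

with open("features.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow([image_id, features.shape[0], encode_features(features)])
```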
-
Hi, thanks again for contributing such good work. I'm just wondering, have you released the prompts (i.e., instructions) for the various multi-modality tasks used in OFA-CN, especially the visual grounding task? Th…
-
**Is your feature request related to a problem? Please describe.**
I have been actively using this repository for multimodal training involving images and text. It has been incredibly helpful for my …
-
[Generating Visual Explanations](https://link.springer.com/chapter/10.1007/978-3-319-46493-0_1)
Clearly explaining a rationale for a classification decision to an end user can be as important as the …
-
Hi, I have checked the CLIP-Vision embedding (last hidden state) of BLIP-2 & InstructBLIP on Hugging Face (instructblip-vicuna-7b); the dimension is 257x1408. However, the multi-modal matching space of Vi…
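For reference, a minimal sketch (assuming the Hugging Face transformers implementation of InstructBLIP) of where the 257x1408 tensor comes from: the ViT vision tower emits 256 patch tokens plus one CLS token, each with hidden size 1408. The prompt text and image path below are placeholders.

```python
# Inspect the vision encoder's last hidden state for instructblip-vicuna-7b.
# Note: this downloads a large checkpoint; the image path is a placeholder.
import torch
from PIL import Image
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-7b")
model = InstructBlipForConditionalGeneration.from_pretrained("Salesforce/instructblip-vicuna-7b")
model.eval()

image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, text="Describe the image.", return_tensors="pt")

with torch.no_grad():
    vision_out = model.vision_model(pixel_values=inputs.pixel_values)

# Expected: torch.Size([1, 257, 1408]) -> 1 CLS token + 256 patch tokens, width 1408.
print(vision_out.last_hidden_state.shape)
```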