-
Hello! Thank you so much for contributing this repo.
I'm very interested in this work, and I'm surveying papers with keywords like "captioning anything" or "instance-level captioning" or "per pi…
-
Hi Oscar Team,
Thanks for the interesting paper and for open-sourcing your model.
On your [download](https://github.com/microsoft/Oscar/blob/master/DOWNLOAD.md) page, you mention that images are fe…
-
## 0. Paper
[Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning](https://arxiv.org/abs/1712.02051v2)
Hongge Chen, Huan Zhang, Pin-Yu Chen, Jinfeng Yi…
-
When running the command `python visual_chatgpt.py --load ImageCaptioning_cpu,Text2Image_cpu`,
the following happens:
![image](https://github.com/microsoft/TaskMatrix/assets/135227066/c8875c6b-c295-40c9-8ee5-c425f009f1c9)
and it finally fails with the error: SSLErr…
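Since the traceback ends in an SSLError, the failure most likely happens while the script downloads model weights from huggingface.co rather than in the captioning code itself. A minimal sketch, assuming that cause, for checking TLS connectivity from the same environment (the URL is only a connectivity probe, not part of the script):
```python
import requests

try:
    requests.get("https://huggingface.co", timeout=10)
    print("TLS handshake with huggingface.co succeeded")
except requests.exceptions.SSLError as err:
    # Typical culprits: a corporate proxy intercepting TLS,
    # or missing/outdated CA certificates in the environment.
    print("SSL problem:", err)
```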
-
Find a suitable encoder-decoder model and start training it on suitable datasets.
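As one concrete starting point (an illustrative assumption, not a prescribed choice), Hugging Face's `VisionEncoderDecoderModel` can stitch a pretrained vision encoder to a pretrained text decoder:
```python
from transformers import AutoTokenizer, VisionEncoderDecoderModel

# Stitch a pretrained ViT encoder to a pretrained GPT-2 decoder
# (both checkpoint choices here are assumptions for illustration).
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k",  # vision encoder
    "gpt2",                               # text decoder
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Bookkeeping the wrapper needs before training/generation.
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Training then passes pixel_values and tokenized caption labels to
# model(...), which returns a cross-entropy loss to optimize.
```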
-
- [ ] [Neural Baby Talk](http://openaccess.thecvf.com/content_cvpr_2018/papers/Lu_Neural_Baby_Talk_CVPR_2018_paper.pdf)
Keywords:
Image captioning
predict template-like sentences
Reference: [Hy…
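A toy illustration of the template idea above (not the authors' code): the language model emits a caption template with visual-word slots, and each slot is filled with the class label of a grounded detection region; all names and values below are made up:
```python
# Toy slot filling in the spirit of Neural Baby Talk.
template = ["A", "<region-1>", "sitting", "at", "a", "<region-2>",
            "with", "a", "<region-3>"]
# Pretend an object detector grounded these regions to these categories.
detected = {"<region-1>": "puppy", "<region-2>": "table", "<region-3>": "cake"}

caption = " ".join(detected.get(token, token) for token in template)
print(caption)  # -> A puppy sitting at a table with a cake
```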
-
I would like to request support for converting the BLIP-2 model to ONNX.
I have tried to convert the model using the torch.onnx.export method, but there are issues, as the input to the forward me…
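For reference, a minimal sketch of one common workaround: export only the vision tower behind a single-tensor wrapper, so `torch.onnx.export` does not have to trace the full model's multi-input, kwargs-based forward. The checkpoint name, input size, and opset here are assumptions:
```python
import torch
from transformers import Blip2ForConditionalGeneration

model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float32
)
model.eval()

class VisionWrapper(torch.nn.Module):
    """Expose the vision tower with a single-tensor forward()."""
    def __init__(self, vision_model):
        super().__init__()
        self.vision_model = vision_model

    def forward(self, pixel_values):
        # Return a plain tensor; dict-style outputs break ONNX tracing.
        return self.vision_model(pixel_values).last_hidden_state

dummy = torch.randn(1, 3, 224, 224)  # BLIP-2's ViT expects 224x224 images
torch.onnx.export(
    VisionWrapper(model.vision_model), dummy, "blip2_vision.onnx",
    input_names=["pixel_values"], output_names=["image_embeds"],
    dynamic_axes={"pixel_values": {0: "batch"}},
    opset_version=17,
)
```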
-
The PaliGemma paper indicates that it supports tasks such as Image Captioning, Visual Question Answering, Detection, and Referring Expression Segmentation.
Can Llama-Factory suppor…
-
Dear coauthors,
- In the pretraining/finetuning stage, for vision-language tasks (especially visual_grounding and caption), can I set the length of the generated tokens? I want a longer generated …
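For context, in Hugging Face-style `generate()` APIs the output length is capped by `max_new_tokens` (or `max_length`), so raising it, optionally with a `length_penalty` above 1 for beam search, yields longer captions. A sketch using a BLIP captioner as a stand-in; the checkpoint, image URL, and values are illustrative:
```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
inputs = processor(images=image, return_tensors="pt")

out = model.generate(
    **inputs,
    max_new_tokens=128,   # raise the token budget for longer captions
    num_beams=5,
    length_penalty=1.5,   # >1 biases beam search toward longer sequences
)
print(processor.decode(out[0], skip_special_tokens=True))
```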
-
[paper](https://arxiv.org/pdf/2205.01917.pdf)
## TL;DR
**problem:** building a good vision backbone. Candidates include image pretraining on classification labels; a dual-encoder model that takes image-text pairs and is trained with a contrastive loss; and one with an image encoder and …
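A minimal sketch of the CLIP/CoCa-style contrastive objective mentioned above: for a batch of B image-text pairs, the matched pairs form the diagonal positives of a B×B similarity matrix and every other entry is a negative; the embedding dimension and temperature below are illustrative assumptions:
```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    # L2-normalize so the dot product is a cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (B, B) similarities
    targets = torch.arange(logits.size(0))           # i-th image <-> i-th text
    # Symmetric cross-entropy: image-to-text and text-to-image directions.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```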