I tried running the XVLM model (via `get_model(model_name="xvlm-coco", device="cuda", root_dir="./tmp/")` in the notebook). I get the error
Hi! I have read the mPLUG-2 paper; it is a great vision-language foundation model with a fantastic design.
**However, I have some doubts about the fairness of the SOTA comparison:**
I am wondering if you have processed image features for this task before. Do you know how the model performs with image features?
Thank you very much!
Is it possible to fine-tune semimtr without using a trained language model?
If so, please let me know how.
Thank you
# Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting
## Information
- Authors: Chuhui Xue+
- Organization: Nanyang Technological Uni…
Thank you for your amazing work. I have a question about how to control the number of data samples in a batch. Your paper says that
We mix all the pretraining data within…
I think there is a bug in `data/dataset_3d.py` when tokenizing the prompts in `data/templates.json`.
The specific location is shown below, quoted from the author's implementation.
### Duplicates
- [X] I have searched the existing issues
### Summary 💡
1. Attach to a VirtualBox instance and give the AI a default OS such as Ubuntu
2. If the AI decides to use the computer: enter "screenshot-mouse…
Sik-Ho Tsang. [Review: Vision Transformer (ViT)](https://sh-tsang.medium.com/review-vision-transformer-vit-406568603de0).
Dosovitskiy A, Beyer L, Kolesnikov A, et al. [An image is worth 16x16 words: …
Hi, I pretrained OFA-tiny on a private TSV file containing only VQA data (or a TSV file containing only captions).
For example,
`1 000002b66c9c498e what is the danger for an object in the given ima…