-
Hi,
I tried running the XVLM model (via `get_model(model_name="xvlm-coco", device="cuda", root_dir="./tmp/")` in the notebook) and got the following error:
```
CUDA error: CUBLAS_STATUS_EXECUTION_FAILED whe…
```
-
Hi! I have read the mPLUG-2 paper. It's a really great vision-language foundation model with a fantastic design.
**However, I have some doubts about the fairness of the SOTA comparison:**
Ac…
-
Hello!
I am wondering whether you have processed image features for this task before. Also, do you know how the model performs with image features?
Thank you very much!
-
Is it possible to fine-tune SemiMTR without using the trained language model?
If so, please let me know how.
Thank you
-
# Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting
## Information
- Authors: Chuhui Xue+
- Organization: Nanyang Technological Uni…
-
Hi!
Thank you for your amazing work. I have a question about how to control the number of data samples in a batch. Your paper states:
We mix all the pretraining data within…
-
I think there is a bug in `data/dataset_3d.py` when tokenizing the prompts from `data/templates.json`.
The relevant code is shown below, quoted from the author's implementation:
htt…
-
### Duplicates
- [X] I have searched the existing issues
### Summary 💡
1. Attach to a VirtualBox instance and give the AI a default OS such as Ubuntu.
2. If the AI decides to use the computer: enter "screenshot-mouse…
-
Sik-Ho Tsang. [Review: Vision Transformer (ViT)](https://sh-tsang.medium.com/review-vision-transformer-vit-406568603de0).
Dosovitskiy A, Beyer L, Kolesnikov A, et al. [An image is worth 16x16 words: …
-
Hi, I pretrained OFA-tiny on my own private TSV file containing only VQA data (or on a TSV file containing only captions).
For example,
`1 000002b66c9c498e what is the danger for an object in the given ima…