-
Hello,
After going through your data, I noticed that you only labeled objects that have boxes; background regions such as sky or water are not labeled.
Therefore, I am curious how your data can be used for semantic s…
-
### News
- Conferences
- [CVPR 2023](https://cvpr2023.thecvf.com/)
- Date/Venue: June 18-22, Vancouver Convention Center
- Main conference and Expo: June 20-22; Workshops and Tutorials: June 18-19
- Korean booths: L…
-
When I run
```
torchrun --nproc-per-node=8 run.py --data DocVQA_TEST --model Qwen2-VL-2B-Instruct --verbose
```
the following error occurs:
```
[{'role': 'user', 'content': [{'type': 'image', 'image': '/vlmeval/images/Doc…
-
Hi @anas-awadalla
As described in #124, "Our training took place on 32 80GB A100s. We trained on 5M samples from MMC4 and 10M from LAION 2B."
I am interested in the details of loss during trai…
-
Hello Meta GenAI team (cc @ruanslv),
With regards to the 70B model, I'm currently looking into the implementation of the GQA architecture -- specifically after noticing the 8192 x 1024 layer shapes…
-
# 💻 cs
## 📚 mask (total: 9)
### 📃 Deep Pneumonia: Attention-Based Contrastive Learning for Class-Imbalanced Pneumonia Lesion Recognition in Chest X-rays
- **Authors:** Xinxu Wei, Haohan Bai, Xianshi …
-
Post your response to our challenge questions.
First, write down two intuitions you have about broad content patterns you will discover about your data as encoded within a pre-trained or fine-tuned…
-
SHOW-O unifies multimodal understanding and generation in a single Transformer architecture: it introduces a **discrete denoising process** for image generation, applies causal attention to LLM (text) tasks and full attention to image-generation tokens, and thus needs no separate specialized models (a rough sketch of such a mixed attention mask follows this list).
- On text-to-image generation it can match SD1.5 in quality, though there is still room for improvement;
- SHOW-O supports many task types, such as visual question answering, image inpainting, image extrapolation, and mixed-modality generation, with no need for…
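A minimal sketch of what such a mixed causal/full attention mask could look like, assuming a text prefix followed by image tokens; `mixed_attention_mask` is an illustrative helper, not SHOW-O's actual code:

```
import torch

def mixed_attention_mask(num_text: int, num_image: int) -> torch.Tensor:
    """Illustrative mask: text tokens attend causally; image tokens attend
    to the full text prefix and to every other image token (full attention).
    True means "allowed to attend"."""
    n = num_text + num_image
    mask = torch.zeros(n, n, dtype=torch.bool)
    # Causal attention within the text prefix.
    mask[:num_text, :num_text] = torch.ones(num_text, num_text).tril().bool()
    # Image tokens see all text tokens and all image tokens.
    mask[num_text:, :] = True
    return mask

print(mixed_attention_mask(num_text=4, num_image=3).int())
```

The printed mask shows the lower-triangular block for text and the fully dense rows for image tokens described above.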
-
Hello, I'm trying to understand how SAM works. I am interested in extracting the **image embeddings** created by **ImageEncoderViT**. Also, I'm interested in the output after combining _image embeddin…
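In case it helps frame the question, here is a minimal sketch of pulling the image embedding out via the `SamPredictor` wrapper from the official `segment_anything` package (checkpoint path and input image are placeholders): `set_image` runs `ImageEncoderViT` once, `get_image_embedding` returns the cached embedding, and `predict` then combines it with prompt embeddings in the mask decoder.

```
import cv2
import numpy as np
import torch
from segment_anything import sam_model_registry, SamPredictor

# Checkpoint and image paths are placeholders.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda" if torch.cuda.is_available() else "cpu")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)                  # runs ImageEncoderViT once

embedding = predictor.get_image_embedding() # image embedding, (1, 256, 64, 64)
print(embedding.shape)

# The mask decoder combines this embedding with prompt embeddings, e.g. a point prompt:
masks, scores, logits = predictor.predict(
    point_coords=np.array([[256, 256]]),    # example point, placeholder coordinates
    point_labels=np.array([1]),             # 1 = foreground point
    multimask_output=True,
)
```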
-
Post your questions here about: [“Language Learning with Large Language Models”](https://docs.google.com/document/d/1vCRoU_g9yYwG31uZMdAVK8iNL5Jj8BB4iwcvarTq06E/edit?usp=sharing) and “Digital Doubles …