-
- https://arxiv.org/abs/2107.02192
- 2021
Transformers have been successful in both the language and vision domains.
However, scaling them to long sequences such as long documents or high-resolution images is prohibitively expensive, because the self-attention mechanism has quadratic time and memory complexity in the input sequence length.
In this paper, for both language and vision tasks, long se…
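The quadratic cost the abstract refers to is easy to see in a toy example: the attention score matrix alone is n × n per head. A minimal sketch (sizes are arbitrary, chosen only for illustration):
```
import torch

# Why vanilla self-attention is quadratic in sequence length n:
# the score matrix Q @ K^T has shape (n, n) before the softmax, so
# both time and memory grow as n^2.
n, d = 4096, 64
q = torch.randn(n, d)
k = torch.randn(n, d)
scores = q @ k.T / d ** 0.5   # shape (4096, 4096)
print(scores.shape)
```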
-
Hi, how do I get llama-3.2 to work with ipex_llm?
Here's my code.
```
import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor
imp…
```
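For reference, the usual ipex_llm path is to load the Hugging Face model first and then pass it through `ipex_llm.optimize_model`. A minimal sketch, assuming an Intel GPU (`xpu`) target and the `meta-llama/Llama-3.2-11B-Vision-Instruct` checkpoint; how completely `optimize_model` covers Mllama's vision tower may depend on your ipex_llm version:
```
import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor
from ipex_llm import optimize_model

# Assumed checkpoint; substitute whichever Llama 3.2 vision model you use.
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

# Load on CPU first, then let ipex_llm rewrite the module for low-bit inference.
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
)
model = optimize_model(model, low_bit="sym_int4")
model = model.to("xpu")  # move to the Intel GPU after optimization

processor = AutoProcessor.from_pretrained(model_id)
```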
-
I am preparing to reproduce the ChineseClip paper, initializing the image encoder from CLIP-VIT-B/16, downloaded from https://huggingface.co/openai/clip-vit-base-patch16/tree/main. But when loading the model weights, I found that the image encoder parameters fail to load. Printing the checkpoint, I see the corresponding parameter names start with vision_model.encoder.layers.…
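If the failure is only a key-name mismatch, stripping the `vision_model.` prefix from the OpenAI checkpoint's keys before loading usually resolves it. A minimal sketch, assuming the prefix is the sole difference (here the inner module of `CLIPModel` stands in for the ChineseCLIP image tower):
```
from transformers import CLIPModel

# Load the OpenAI checkpoint and keep only the image-tower weights,
# remapping "vision_model.encoder.layers..." -> "encoder.layers...".
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
prefix = "vision_model."
vision_state = {
    k[len(prefix):]: v
    for k, v in clip.state_dict().items()
    if k.startswith(prefix)
}

# image_encoder stands in for the ChineseCLIP image tower being initialized;
# strict=False tolerates heads that exist on only one side.
image_encoder = clip.vision_model
missing, unexpected = image_encoder.load_state_dict(vision_state, strict=False)
print("missing:", missing, "unexpected:", unexpected)
```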
-
### System Info
transformers version: 4.45.2
python version: 3.9.20
torch version: 2.4.1+cu124
![image](https://github.com/user-attachments/assets/cb141f17-3482-462b-8184-7210f0a6c75e)
### W…
-
```
Traceback (most recent call last):
  File "I:/Code/CC-DETR-main/Networks/ALTGVT1.py", line 596, in <module>
    model = alt_gvt_large(pretrained=True)
  File "I:/Code/CC-DETR-main/Networks/ALTGVT1.py", lin…
```
-
### Feature request
`AutoModel.from_config` does not work with Mllama (`MllamaConfig`, `MllamaVisionConfig`). I would like to request the ability to use Mllama through `AutoModel`.
### Motivation
T…
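Until this is supported upstream, one possible workaround (a sketch, assuming the auto-mapping is simply missing and that `MllamaForConditionalGeneration` is an acceptable target class) is to register the pairing yourself:
```
from transformers import AutoModel, MllamaConfig, MllamaForConditionalGeneration

# Bind the config class to a model class so AutoModel.from_config can
# resolve Mllama like any other registered architecture.
AutoModel.register(MllamaConfig, MllamaForConditionalGeneration)

config = MllamaConfig()  # default-initialized config, just for the demo
model = AutoModel.from_config(config)
```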
-
### Feature request
Add support for LlamaGen, an autoregressive image generation model, to the Transformers library. LlamaGen applies the next-token prediction paradigm of large language models to vi…
-
How can I add or extend an MLP head in the same model for detection? Say the head detects objects A, B, C in an image, and we want to train by adding to or extending the MLP/classification head so it also detects objects D, E…
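One common pattern (a sketch, not tied to any particular detector) is to grow the final linear layer from C to C + new classes and copy the learned rows, so the original classes keep their weights while the new rows train from scratch; `extend_head` below is a hypothetical helper name:
```
import torch
import torch.nn as nn

def extend_head(old_head: nn.Linear, num_new: int) -> nn.Linear:
    """Grow a classification head by num_new classes, keeping learned weights."""
    new_head = nn.Linear(old_head.in_features, old_head.out_features + num_new)
    with torch.no_grad():
        # Copy rows for the original classes (A, B, C); the extra rows for the
        # new classes (D, E, ...) keep their fresh random initialization.
        new_head.weight[: old_head.out_features] = old_head.weight
        new_head.bias[: old_head.out_features] = old_head.bias
    return new_head

# Usage: replace the model's existing 3-class head with a 5-class one.
old_head = nn.Linear(768, 3)
new_head = extend_head(old_head, num_new=2)
```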
-
Hi, I am working with vision transformers, not only the vanilla ViT but also different models, on the UMDAA2 data set. This data set has an image resolution of 128*128; would it be better to transform the im…
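One alternative to resizing the images up to 224*224 is to keep them at 128*128 and interpolate the pretrained position embeddings; the `transformers` ViT forward pass exposes this via `interpolate_pos_encoding=True`. A minimal sketch (the checkpoint name is just an example):
```
import torch
from transformers import ViTModel

# Feed 128x128 inputs to a ViT pretrained at 224x224 by interpolating the
# position embeddings instead of upsampling the images.
model = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

pixel_values = torch.randn(1, 3, 128, 128)  # stand-in for a preprocessed batch
outputs = model(pixel_values=pixel_values, interpolate_pos_encoding=True)
print(outputs.last_hidden_state.shape)  # (1, 1 + (128/16)**2, 768) = (1, 65, 768)
```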
-
Hi @rosinality, hope you are doing well!
I really like your repo, especially the dataloader and augmentation parts for image classification.
I do not work mainly in the vision field, but I still have …