-
### System Info
python 3.10.15
torch 2.5.1
transformers 4.46.2
tokenizers 0.20.3
### Information
- [ ] The official example scripts
- [x] My own modified scripts
### 🐛 Describe the bug…
-
**code:**
```python
query = 'What does the picture show?'
image_paths = ['/home/downloads/test.jpg']
huatuogpt_vision_model_path = "/home/llm_models/HuatuoGPT-Vision-7B"
from cli import HuatuoChatbot
b…
```
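
For reference, a minimal sketch of how I would expect the call to continue, assuming the `HuatuoChatbot` constructor takes the local model path and the chat entry point is `inference(query, image_paths)` (an assumption based on the repository's example usage; the truncated line above may differ):

```python
from cli import HuatuoChatbot

# Assumed usage pattern: the constructor takes the local model path,
# inference() takes a text query and a list of image paths.
bot = HuatuoChatbot(huatuogpt_vision_model_path)
output = bot.inference(query, image_paths)
print(output)
```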
-
## 🚀 Feature
Please consider adding an RoI head for the Vision Transformer, which would enable using it for action detection.
## Motivation
Performance of MViT on the AVA dataset is …
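
For context, a minimal sketch of what such a head could look like, using torchvision's `RoIAlign` over a ViT/MViT feature map; the class name, embedding size, class count, and spatial scale are assumptions for illustration only:

```python
import torch
from torch import nn
from torchvision.ops import RoIAlign

class ViTRoIHead(nn.Module):
    """Illustrative RoI head: pools per-box features from a ViT feature map
    and classifies each box (e.g. person boxes for AVA action detection)."""
    def __init__(self, embed_dim=768, num_classes=80, spatial_scale=1 / 16):
        super().__init__()
        # 7x7 RoIAlign over the patch-token feature map
        self.roi_align = RoIAlign(output_size=7, spatial_scale=spatial_scale,
                                  sampling_ratio=2, aligned=True)
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(embed_dim, num_classes),
        )

    def forward(self, feat, boxes):
        # feat: (B, C, H, W) patch tokens reshaped into a 2D feature map
        # boxes: list of (N_i, 4) tensors in image coordinates
        pooled = self.roi_align(feat, boxes)   # (sum N_i, C, 7, 7)
        return self.classifier(pooled)         # (sum N_i, num_classes)
```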
-
I'm trying to quantize llava-1.5 according to the `readme.md` with the following scripts, and it fails with: `AttributeError: 'LlavaConfig' object has no attribute 'mm_vision_tower'`.
It seems like the llava…
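
One quick way to see what the loaded config actually exposes is to inspect it before quantizing; the path is hypothetical and the fallback to `vision_tower` is an assumption about how some checkpoints name the field:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("/path/to/llava-1.5")  # hypothetical path
# Older LLaVA-style configs store the CLIP tower under `mm_vision_tower`;
# some checkpoints use `vision_tower` instead (assumption), so check both.
tower = getattr(config, "mm_vision_tower", None) or getattr(config, "vision_tower", None)
print(type(config).__name__, tower)
```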
-
I skimmed through the paper and it mentions that the Vision Transformer (the very first block) takes in images of size 256x256. But the transforms in the README resize it to 224x224. What am I missing here?
…
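
For what it's worth, a common preprocessing pattern (an assumption about this README, not a confirmed answer) is to resize to 256 and then center-crop to 224, which also changes the patch count; with an assumed 16-pixel patch the arithmetic is:

```python
from torchvision import transforms

# Typical ImageNet-style preprocessing: resize the short side to 256,
# then center-crop to the 224x224 input the model expects.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

patch = 16  # assumed patch size
print((224 // patch) ** 2)  # 196 patch tokens at 224x224
print((256 // patch) ** 2)  # 256 patch tokens at 256x256
```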
-
### Your GTNH Discord Username
Dart_Voider
### Your Pack Version
2.6.1
### Your Proposal
Add an instant structure check for the Active Transformer when the controller is placed
### Your Goal
It will make tra…
-
The 'Fine-tune the Vision Transformer on CIFAR-10 with PyTorch Lightning' tutorial notebook uses a learning rate of lr=5e-5, but this should be changed to at most 2e-5, and probably closer to 1e-5. The …
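
For reference, a minimal sketch of where the lower learning rate would go in a LightningModule's `configure_optimizers`; the module and optimizer choice are illustrative, not the notebook's exact code:

```python
import torch
import pytorch_lightning as pl

class ViTFineTuner(pl.LightningModule):
    # Hypothetical module mirroring the tutorial's setup; only the
    # learning rate choice is the point here.
    def __init__(self, model, lr=1e-5):  # 1e-5 instead of the notebook's 5e-5
        super().__init__()
        self.model = model
        self.lr = lr

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.lr)
```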
-
Has anyone tried quantizing vision-transformer-style models? For models such as DeiT-S and Swin, quantization causes an accuracy drop.
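
As a concrete starting point, here is a minimal post-training dynamic-quantization sketch on a timm DeiT-S; the model name and the restriction to `nn.Linear` layers are assumptions for illustration, and this kind of int8 quantization alone does typically cost some accuracy on ViT-style models:

```python
import torch
import timm

# Load a pretrained DeiT-S (downloads weights on first run).
model = timm.create_model("deit_small_patch16_224", pretrained=True).eval()

# Post-training dynamic quantization of the linear layers to int8 (CPU).
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 3, 224, 224)
print(quantized(x).shape)  # (1, 1000)
```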
-
When I run
`bash scripts/video/demo/video_demo.sh ${the path of LLaVA-NeXT-Video-7B-DPO} vicuna_v1 32 2 True ${the path of video}`
I get the error
```
Can't set vocab_size with value 32000 for …
```
-
I was trying to find the implementation of where the patches are created. My understanding from the paper is that when there are multiple images, complete images should be used instead of c…
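
For orientation, patch creation in most ViT implementations is just a strided convolution (or an equivalent reshape); a minimal sketch of the usual patch-embedding layer, with the patch size and embedding size as assumptions:

```python
import torch
from torch import nn

class PatchEmbed(nn.Module):
    """Typical ViT patch embedding: a Conv2d with kernel == stride == patch size
    splits the image into non-overlapping patches and projects each to a token."""
    def __init__(self, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (B, 3, H, W)
        x = self.proj(x)                     # (B, embed_dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)

print(PatchEmbed()(torch.randn(2, 3, 224, 224)).shape)  # (2, 196, 768)
```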