-
```
CUDA_VISIBLE_DEVICES=0 python3 inference.py --model-path ./PCIResearch/TransCore-M --vision-path ./openai/clip-vit-large-patch14-336
You are using a model of type transcorem to instantiate a model of…
```
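The "model of type transcorem" message is emitted when the checkpoint config's `model_type` does not match the model class being instantiated. A minimal sketch of that comparison (assumption: a simplified re-implementation for illustration, not transformers' actual source):

```python
# Simplified sketch of the model_type mismatch check behind the warning
# (illustrative only; transformers' real implementation differs).
def model_type_warning(config_type: str, expected_type: str):
    if config_type != expected_type:
        return (f"You are using a model of type {config_type} to "
                f"instantiate a model of type {expected_type}. "
                "This is not supported for all configurations and can yield errors.")
    return None  # no warning when the types agree

print(model_type_warning("transcorem", "transcorem"))  # → None
```

For custom architectures like this one, the usual remedies are loading with `trust_remote_code=True` or registering the custom config/model classes with the Auto classes so the types agree.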
-
### System Info
- `transformers` version: 4.46.0
- Platform: Linux-5.15.0-97-generic-x86_64-with-glibc2.35
- Python version: 3.12.3
- Huggingface_hub version: 0.26.1
- Safetensors version: 0.4.…
-
## installable
- [ ] https://github.com/salesforce/LAVIS
- https://github.com/salesforce/BLIP
- https://github.com/salesforce/ALBEF
- [ ] https://github.com/facebookresearch/multimodal
- …
-
### Model description
LaVIN is a vision-language instruction-tuned model that is affordable to train (a few hours on 8 A100 GPUs) and performs well on ScienceQA.
I'd like to add …
-
I wrestled with this once before; downloading the model alone took half a day. I spent a whole day on it without success, hitting all kinds of errors.
Today it came to mind again and I spent another afternoon on it.
I resolved all the errors, but loading the model took half an hour, and before the webui even came up the VRAM was exhausted, peaking at 30 GB.
Just as the light was in sight, it errored out: out of VRAM...
Is this project simply not suited to Windows?
Windows users thinking of trying it should think twice; beyond the wasted time, you may end up with nothing to show for it.
If the author later optimizes for Windows…
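For reference, a ~30 GB peak is roughly what full-precision weights of a 7B-parameter model occupy on their own, which suggests the model is being loaded in fp32. A back-of-the-envelope sketch (assumption: the 7B parameter count is illustrative; the actual model size is not stated above):

```python
# Rough VRAM needed just to hold model weights, ignoring activations,
# KV cache, and framework overhead (illustrative arithmetic only).
def weight_vram_gb(n_params_billion: float, bytes_per_param: int) -> float:
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

print(round(weight_vram_gb(7, 4), 1))  # 7B params in fp32 (4 bytes each)
print(round(weight_vram_gb(7, 2), 1))  # 7B params in fp16 (2 bytes each)
```

If the loader defaults to fp32, forcing fp16/bf16 (or 8-bit loading) roughly halves or quarters this footprint, which is often the difference between fitting and OOM on a 24 GB card.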
-
When I run the pretraining scripts, I get this:

```
File "/data/lc/Multi-image/multi_token/multi_token/language_models/mistral.py", line 85, in forward
  ) = self.prepare_inputs_labels_for_multimodal(
F…
```
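For context on where this fails: in LLaVA-style code, `prepare_inputs_labels_for_multimodal` splices projected image features into the text embedding sequence at the image placeholder token. A toy sketch of that splicing step (assumption: toy token lists and string stand-ins for feature vectors, not the multi_token repo's actual implementation):

```python
# Conceptual sketch: replace each image placeholder token with N slots of
# image features before the sequence reaches the language model.
IMAGE_TOKEN = -200  # placeholder id, as commonly used in LLaVA-style code

def splice_image_features(input_ids, n_image_features):
    out = []
    for tok in input_ids:
        if tok == IMAGE_TOKEN:
            # stand-ins for the projected vision-encoder features
            out.extend(["<img_feat>"] * n_image_features)
        else:
            out.append(tok)
    return out

print(splice_image_features([1, 5, IMAGE_TOKEN, 9], 3))
# → [1, 5, '<img_feat>', '<img_feat>', '<img_feat>', 9]
```

Errors in this function usually come from a mismatch between the number of placeholder tokens in the prompt and the number of image feature batches passed in.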
-
- [ ] [Title: "Yi Model Family: Powerful Multi-Dimensional Language and Multimodal Models"](https://arxiv.org/html/2403.04652v1)
-
## Project Request
The goal of this project is to develop an AI-powered system that leverages natural language processing (NLP) and computer vision techniques to generate and manipulate text and im…
-
Is there a way to train novel concepts into your BLIP model, the way textual inversion works for Stable Diffusion image generation? If so, is there a training script provided, or would one nee…
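Conceptually, textual inversion freezes the entire model and optimizes only a new token's embedding against the usual loss. A toy stdlib sketch of that "train one embedding, freeze everything else" loop (assumption: a 4-dim embedding and a quadratic stand-in loss; this is not a BLIP training recipe):

```python
# Toy illustration of textual-inversion-style optimization: the new token
# embedding is the only trainable parameter; a quadratic loss stands in
# for the frozen model's real objective.
def learn_token_embedding(target, steps=500, lr=0.1):
    emb = [0.0] * len(target)  # the single trainable embedding vector
    for _ in range(steps):
        # gradient of sum((emb - target)**2) w.r.t. emb
        grad = [2 * (e - t) for e, t in zip(emb, target)]
        emb = [e - lr * g for e, g in zip(emb, grad)]
    return emb

print([round(x, 3) for x in learn_token_embedding([0.5, -1.0, 2.0, 0.0])])
```

Doing this for real with BLIP would mean adding a token to the tokenizer, resizing the embedding matrix, and passing only that embedding row to the optimizer; I am not aware of an official script for it in the BLIP repo.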
-
In AutoAWQ, do we only quantize the LLM part of LLaVA, or do we also quantize the ViT? Can we add support for quantizing vision models like ViT or SigLIP?
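To my understanding, AWQ tooling typically quantizes only the language model's linear layers, and one common way to express that is filtering candidate modules by name and skipping the vision tower and projector. A hedged sketch of such a filter (assumption: the module names and skip prefixes are hypothetical, not AutoAWQ's actual API):

```python
# Sketch: choose which submodules to quantize by name, skipping the
# vision encoder and multimodal projector (hypothetical names).
def modules_to_quantize(module_names,
                        skip_prefixes=("vision_tower", "mm_projector")):
    return [n for n in module_names if not n.startswith(skip_prefixes)]

names = [
    "vision_tower.encoder.layers.0.self_attn.q_proj",
    "mm_projector.linear_1",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.gate_proj",
]
print(modules_to_quantize(names))
# → ['model.layers.0.self_attn.q_proj', 'model.layers.0.mlp.gate_proj']
```

Extending quantization to ViT/SigLIP would mean calibrating activation scales on image inputs as well, which is a separate piece of work from the text-only calibration path.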