-
### Your question
I am getting:
**clip missing: ['text_projection.weight']**
In a workflow that has Flux and SDXL at the same time:
![image](https://github.com/user-attachments/assets/eadfcba4-f…
-
Hi, authors. Thanks for your great work; the idea is interesting.
I have some questions about the results in Table 5.
The table shows that SigLIP is inferior to CLIP in all respects.
Differently…
-
Hi,
I was wondering if you are planning to release the weights for the VGAMT models you trained?
Thank you!
-
```bash
# example invocation: run the sampling script on an input image with a text prompt
python3 sample.py --image img.png --prompt "hi"
```
-
# ComfyUI Error Report
## Error Details
- **Node Type:** Joy_caption_two_load
- **Exception Type:** AssertionError
- **Exception Message:**
## Stack Trace
```
File "H:\ComfyUI-qiuye\ComfyU…
-
Hi, thank you for the wonderful paper and codebase! I had one clarification question: it looks like there is an extra set of forward passes for the SigLIP ViT blocks - is this intentional for the sigl…
-
### Feature request
It would be nice if, when setting `torch_dtype` to `auto` in a `from_pretrained` call, it properly respected _nested_ `torch_dtype`s specified in the model's config. Right now i…
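
To make the request concrete, here is a minimal sketch of the intended behavior, assuming a hypothetical checkpoint whose `config.json` nests per-component dtypes in `text_config` and `vision_config` sub-configs (the repo id and attribute names below are illustrative, not an actual model):

```python
from transformers import AutoModel

# Hypothetical config.json for "org/some-multimodal-model":
# {
#   "torch_dtype": "float32",
#   "text_config":   { "torch_dtype": "bfloat16", ... },
#   "vision_config": { "torch_dtype": "float16",  ... }
# }

# Today, torch_dtype="auto" resolves a single dtype for the whole model from
# the top-level config (or from the checkpoint weights). The request is that
# the nested torch_dtype entries above are also honored, so each sub-module
# ends up in the precision its own sub-config declares.
model = AutoModel.from_pretrained("org/some-multimodal-model", torch_dtype="auto")

# Expected under the proposal (attribute names are illustrative):
# model.language_model.dtype -> torch.bfloat16
# model.vision_tower.dtype   -> torch.float16
```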
-
### Feature request
We want to standardize the logic flow through Processor classes. Since processors can have different kwargs depending on the model and modality, we are adding a `TypedDict` fo…
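
As a rough sketch of the idea (all class and field names below are illustrative, not the final API), a nested `TypedDict` per modality would let each model's processor declare exactly which kwargs it accepts:

```python
from typing import TypedDict


class TextKwargs(TypedDict, total=False):
    # kwargs routed to the tokenizer part of the processor
    padding: bool
    truncation: bool
    max_length: int


class ImagesKwargs(TypedDict, total=False):
    # kwargs routed to the image processor part
    do_resize: bool
    do_normalize: bool


class ProcessingKwargs(TypedDict, total=False):
    # one nested TypedDict per modality; a model-specific processor can
    # subclass these to add or narrow keys, and type checkers can then
    # validate call sites against the declared structure
    text_kwargs: TextKwargs
    images_kwargs: ImagesKwargs
```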
-
Dear authors!
In the paper, you mentioned that the final checkpoint is a fine-tuned version of the pre-trained Prismatic VLM. But there is also a config for Vicuña v1.5 7B + SigLIP ViT-based training in the code…
-
### The model to consider.
https://huggingface.co/lmms-lab/llava-onevision-qwen2-7b-ov
There are a bunch of others using the same architecture.
### The closest model vllm already supports.
qwen2…