-
Hi, I have recently been trying to use the llava_onevision model. I followed the onevision tutorial, which seems pretty easy, and ran the program exactly as the tutorial describes, using the 0.5b_si model. However, a …
-
First of all, in case anybody sees this thread and thinks "Oh, I want to use Long-CLIP with Flux!": I made a ComfyUI custom node for it, and you can find it here: [https://github.com/zer0int/ComfyUI-L…
-
Hello everyone,
Thanks for the source code!
Is it possible to adapt V-JEPA to 3D imaging (like medical imaging)?
I don't know which approach would work best between converting imaging to pseudo-s…
-
I am facing this error while trying to fine-tune the phi3.5 vision model with LoRA.
I created the virtual env based on the environment.yaml file,
and all the library versions are as mentioned.
```
[r…
```
-
Hello,
I am creating a ViT base model with patch size 16, but I get an error when loading the checkpoint:
Here is the vision transformer class I am using: https://github.com/facebookresearch/din…
-
# New Operator
Self Attention
### Describe the operator
Multi-headed attention is seeing prolific use in all transformers (mostly described in [pytorch](https://pytorch.org/docs/stable/generated/t…
-
## Description
I attempted to compile a Hugging Face model (the Hugging Face model link is: https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5, which includes both the model architecture code …
-
I encountered an issue when trying to use Vision Transformer based models like _vit_base_, _vit_swin_large_, etc. in the PatchCore implementation. I tried to execute this on the Kaggle Notebook enviro…
-
I am trying to apply SmoothQuant during W8A8 quantization of `meta-llama/Llama-3.2-11B-Vision-Instruct`, ignoring all of the modules except for language_model. However, I find that it crashes when…
-
### Links
- Paper : https://arxiv.org/abs/2107.04589
- Github : -
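### One-line summary
- Applies ViT to GANs and achieves performance comparable to CNN-based GANs.
### Reason for selection
- I read it because it was interesting as the first paper to apply ViT, rather than a CNN, to GANs.
- Also…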