-
Llama 3.2 Vision is great work!
I am doing some interesting work based on Llama 3.2 Vision. I have read the paper about Llama 3.2 Vision, but I have a very important question to ask.
Below is an image…
-
Hi,
Thanks for the great work. Is there any example of how this could be used with standard (PyTorch) Vision Transformers?
Many thanks,
Sid
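The repository this question is about isn't identified in the excerpt, so only as a point of reference, here is a minimal sketch of what a "standard (PyTorch) Vision Transformer" backbone looks like using torchvision's ViT-B/16; nothing in it comes from the project being asked about.

```python
# Minimal sketch (not from the project in question): instantiate a standard
# torchvision ViT and run a forward pass, i.e. the kind of off-the-shelf
# Vision Transformer the question refers to.
import torch
from torchvision.models import vit_b_16, ViT_B_16_Weights

model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
model.eval()

dummy = torch.randn(1, 3, 224, 224)  # ViT-B/16 expects 224x224 RGB input
with torch.no_grad():
    logits = model(dummy)            # (1, 1000) ImageNet class logits
print(logits.shape)
```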
-
I’m trying to fine-tune Phi 3.5 Vision using transformers. However, I’m running into an issue trying to save the model during or after training. See below for a minimal reproducible example.
My examp…
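The reporter's reproducible example is cut off above, so purely as a hedged sketch of the step being discussed: loading Phi 3.5 Vision with transformers and calling `save_pretrained`, which is the save path the issue reports as failing. The checkpoint name `microsoft/Phi-3.5-vision-instruct` and output directory are assumptions, not taken from the original example.

```python
# Hedged sketch, not the reporter's actual reproduction: load Phi 3.5 Vision
# with transformers and attempt the save step that the issue says fails.
# The checkpoint id below is an assumption.
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # Phi vision checkpoints ship custom modeling code
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# The problematic step: persisting the (fine-tuned) weights to disk.
model.save_pretrained("phi35-vision-finetuned")
processor.save_pretrained("phi35-vision-finetuned")
```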
-
Hello,
Could you provide the vision transformer backbone used for the model?
I am using DINO's vision_transformer.py code for a ViT-Giant (https://github.com/facebookresearch/dino/blob/main/visi…
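As a hedged sketch of the approach the questioner describes: DINO's vision_transformer.py only ships `vit_tiny`/`vit_small`/`vit_base` constructors, so a larger model has to be built from its `VisionTransformer` class directly. The ViT-Giant hyperparameters below (patch 14, width 1536, depth 40, 24 heads) are an assumption based on common ViT-g/14 configurations, not something confirmed by the model authors.

```python
# Hedged sketch: constructing a larger ViT from DINO's vision_transformer.py.
# The ViT-Giant hyperparameters are assumed (patch 14, dim 1536, depth 40,
# 24 heads); dino/vision_transformer.py must be importable on your path.
from functools import partial

import torch
import torch.nn as nn
import vision_transformer as vits  # dino/vision_transformer.py

vit_giant = vits.VisionTransformer(
    patch_size=14,
    embed_dim=1536,
    depth=40,
    num_heads=24,
    mlp_ratio=4,
    qkv_bias=True,
    norm_layer=partial(nn.LayerNorm, eps=1e-6),
)

feats = vit_giant(torch.randn(1, 3, 224, 224))  # DINO's forward returns the [CLS] embedding
print(feats.shape)  # (1, 1536)
```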
-
### Feature request
This request proposes one of three changes (see **Motivation** for background, and **Your contribution** for more thoughts on possible solutions) in order to allow saving of a certa…
-
### 🚀 The feature
Implement the CrossViT model for fine-grained classification
### Motivation, pitch
CrossViT integrates multi-scale feature representations, enabling it to efficiently process images o…
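As a hedged sketch of what the requested workflow could look like from the user side, here is CrossViT instantiated through timm with a fine-grained label set; the `crossvit_small_240` model name and the 200-class setting are assumptions for illustration, not part of this feature request.

```python
# Hedged sketch: using an existing CrossViT implementation from timm with a
# fine-grained classification head. Model name and class count are assumed.
import timm
import torch

num_fine_grained_classes = 200  # e.g. a CUB-200-style fine-grained label set
model = timm.create_model(
    "crossvit_small_240",
    pretrained=False,            # set True to start from ImageNet weights
    num_classes=num_fine_grained_classes,
)
model.eval()

x = torch.randn(1, 3, 240, 240)  # the *_240 CrossViT variants expect 240x240 input
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # (1, 200)
```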
-
### Issue Type
Documentation Bug
### Source
source
### Keras Version
2.14
### Custom Code
Yes
### OS Platform and Distribution
Ubuntu 22.04
### Python version
3.10
…
-
Hello, I am very interested in your work, but why can't I find your paper: Pre-training Vision Transformers for Visual Times Series Forecasting?
-
I run the program in PyCharm, and the error listed below occurs. How can I solve it?
ValueError: Unrecognized model in weights/icon_caption_florence. Should have a `model_type` key in its config.json, or co…
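The error message itself points at the missing `model_type` entry, so as a hedged sketch of the usual workaround: make sure the checkpoint's config.json declares a `model_type` and load it with `trust_remote_code=True`. The `"florence2"` value is an assumption about this particular checkpoint, not something stated in the traceback.

```python
# Hedged sketch of a typical workaround for this ValueError: ensure the local
# checkpoint's config.json carries a model_type key, then reload it with
# trust_remote_code=True. The "florence2" value is an assumption.
import json
from pathlib import Path

from transformers import AutoModelForCausalLM

ckpt = Path("weights/icon_caption_florence")
cfg_path = ckpt / "config.json"
cfg = json.loads(cfg_path.read_text())
if "model_type" not in cfg:
    cfg["model_type"] = "florence2"  # assumed; must match the real architecture
    cfg_path.write_text(json.dumps(cfg, indent=2))

model = AutoModelForCausalLM.from_pretrained(ckpt, trust_remote_code=True)
```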
-
Image with abnormal inference results:
![ddc921571d121084892ded80d3c6b573](https://github.com/user-attachments/assets/a8cbf24e-bbfb-494e-8ecc-d31136e4e4f0)
cmake -DCMAKE_SYSTEM_NAME=Linux \
-DMNN_BUILD_DIFFUSION=ON -DM…