-
Apologies for the questions about another of your significant works.
I really appreciate your work on AffordanceLLM: Grounding Affordance from Vision Language Models, and this 3DOI about the breaking contrib…
-
### Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
None
### OS Platform and Distribution
Windows 10
### MediaPipe Tasks SDK version
_No response_
###…
-
## ⚙️ Request New Models
- Link to an existing implementation (e.g. Hugging Face/Github): https://huggingface.co/microsoft/Phi-3-vision-128k-instruct
- Is this model architecture supported by MLC…
-
**Details of model being requested**
- Model name: Florence-2
- Source repo link: https://huggingface.co/collections/microsoft/florence-6669f44df0d87d9c3bfb76de
- Research paper link: https://arxiv…
-
An Introduction to Vision-Language Modeling
https://arxiv.org/abs/2405.17247
-
-
The performance of your work is very impressive!
In your paper, you state that UVLTrack was trained on GOT-10k, COCO2017, TrackingNet, and other datasets.
But other vision-language trackers like JointNLT …
-
LoRA + base is working well.
![image](https://github.com/mbzuai-oryx/LLaVA-pp/assets/15274284/ccec0900-7db0-4729-9ab4-3c5f68e0f304)
![image](https://github.com/mbzuai-oryx/LLaVA-pp/assets/15274284/7d12…
-
### Describe the documentation issue
All of the samples appear to be in Python; nothing in C or C#.
Python is great for academia, but most Windows desktop app developers are developing desktop…
-
Hi! Thank you again for this repo. Fine-tuning with Llama 3 works. However, when I try to merge with the obtained LoRA weights, using the `merge_lora_weights.py` script, and I compare the weights b…
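For context on what such a weight comparison should show: in standard LoRA, merging folds the low-rank update into the base weight as W' = W + (alpha / r) * B @ A, so the merged weights should differ from the base weights by exactly that scaled product (up to dtype rounding). A minimal pure-Python sketch of the arithmetic, with toy dimensions and values that are illustrative only, not taken from the repo or any real checkpoint:

```python
# Toy LoRA merge: W' = W + (alpha / r) * B @ A
# All matrices are nested lists with illustrative values.

def matmul(B, A):
    """Multiply a (d x r) matrix by an (r x k) matrix."""
    r, k = len(A), len(A[0])
    return [[sum(B[i][t] * A[t][j] for t in range(r)) for j in range(k)]
            for i in range(len(B))]

def merge_lora(W, A, B, alpha, r):
    """Fold the scaled LoRA update into the base weight matrix."""
    scaling = alpha / r
    delta = matmul(B, A)  # d x k low-rank update
    return [[W[i][j] + scaling * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# 2x2 base weight with a rank-1 LoRA factorization.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]         # r x k = 1 x 2
B = [[0.5], [0.25]]      # d x r = 2 x 1
merged = merge_lora(W, A, B, alpha=2.0, r=1)
print(merged)  # [[2.0, 2.0], [0.5, 2.0]]
```

If the merged checkpoint's weights do not equal base + scaled delta (beyond small dtype-conversion differences), that would point at the merge step rather than the fine-tuning itself.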