-
Hi, I was reproducing the performance of Video-LLaVA-7B (https://huggingface.co/LanguageBind/Video-LLaVA-7B) on the Video-MME benchmark. I found that the performance is very poor when I use the video …
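For context, this is roughly the inference path I'm using (a minimal sketch, not the exact Video-MME harness). It assumes the transformers-converted checkpoint LanguageBind/Video-LLaVA-7B-hf (the weights in the repo linked above are not in transformers format), 8 uniformly sampled frames, and the USER/ASSISTANT prompt template; the video path is hypothetical. Getting the prompt template or frame sampling wrong is a common cause of unexpectedly low scores.

```python
import av
import numpy as np
import torch
from transformers import VideoLlavaProcessor, VideoLlavaForConditionalGeneration

# Assumption: the -hf conversion of the checkpoint, which transformers can load.
model_id = "LanguageBind/Video-LLaVA-7B-hf"
processor = VideoLlavaProcessor.from_pretrained(model_id)
model = VideoLlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Uniformly sample 8 frames, the count the model was trained with.
container = av.open("example_clip.mp4")  # hypothetical local file
total = container.streams.video[0].frames  # may be 0 for some containers
indices = set(np.linspace(0, total - 1, num=8, dtype=int).tolist())
video = np.stack([
    frame.to_ndarray(format="rgb24")
    for i, frame in enumerate(container.decode(video=0))
    if i in indices
])

# The USER/ASSISTANT template with a <video> placeholder is required.
prompt = "USER: <video>\nDescribe what happens in this video. ASSISTANT:"
inputs = processor(text=prompt, videos=video, return_tensors="pt").to(
    model.device, torch.float16
)
out = model.generate(**inputs, max_new_tokens=100)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```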
-
Just tried this on Windows 10 Pro x64.
Adding an image and asking about it doesn't work. The model replies:
"I can't see the image you're referring to. I'm a large language model, I don't have the capability to vi…
-
Dear Yuhui Zhang,
Thank you for your great effort! I found your paper very interesting and informative.
Could you provide us with the weights of the fine-tuned VLMs? It would be a tremen…
-
### Motivation
LLaVA-NeXT (the stronger-LLMs release) outperforms existing open-source models like InternVL 1.5:
https://llava-vl.github.io/blog/2024-05-10-llava-next-stronger-llms/ (see the Benchmark Results section)
-
**Continued from: https://github.com/Blaizzy/fastmlx/issues/6**
---
When I try this at the command line: `python -m mlx_vlm.chat_ui --model mlx-community/llava-1.5-7b-4bit`, I get the same cha…
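For reference, here is the equivalent Python-API call I would expect to work, as a sketch based on the mlx_vlm README of that time; the load/generate signatures have changed across mlx_vlm versions, and the image path is hypothetical.

```python
from mlx_vlm import load, generate

model, processor = load("mlx-community/llava-1.5-7b-4bit")

# LLaVA-1.5 chat template with an <image> placeholder.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

# Argument order follows the early mlx_vlm README; newer versions differ.
output = generate(model, processor, "example.png", prompt, verbose=False)
print(output)
```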
-
Thank you for your work. I am very interested in the self-attention maps shown in the paper, but I don't know how to generate them. Could you provide the code to generate the attention maps and indica…
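In case it helps, here is what I have tried so far: a minimal sketch of the usual recipe with HF transformers, which is to run a forward pass with output_attentions=True and reshape the CLS token's attention over the image patches. It uses a generic ViT backbone (google/vit-base-patch16-224) as a stand-in, since I don't know the paper's exact model or layer, and the image path is hypothetical.

```python
import torch
import matplotlib.pyplot as plt
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

# Stand-in backbone; the paper's actual vision encoder may differ.
model_name = "google/vit-base-patch16-224"
processor = ViTImageProcessor.from_pretrained(model_name)
model = ViTModel.from_pretrained(model_name)

image = Image.open("example.jpg").convert("RGB")  # hypothetical local file
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions: one (batch, heads, tokens, tokens) tensor per layer.
attn = outputs.attentions[-1].mean(dim=1)[0]  # last layer, heads averaged
cls_to_patches = attn[0, 1:]                  # CLS row, patch columns
side = int(cls_to_patches.numel() ** 0.5)     # 14x14 grid for 224/16 patches
plt.imshow(cls_to_patches.reshape(side, side).numpy(), cmap="viridis")
plt.axis("off")
plt.savefig("attention_map.png", bbox_inches="tight")
```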
-
### Question
Hi,
Thanks for the great work! I am doing research in related areas and have some questions about LLaVA 1.6's training details, in particular:
(1) the blog says "It supports th…
-
What are the principles behind the selection of baseline models? The ShareCaptioner-Video model is trained from IXC2-4KHD, while the ShareGPT4Video-8B model is trained on the LLaVA-Next-8…
-
### System Info
2.0.4 Docker image
### Information
- [x] Docker
- [ ] The CLI directly
### Tasks
- [x] An officially supported command
- [ ] My own modifications
### Reproduction
https://github…
-