-
There's an error when uploading an image to chat while running `python3 -m mlx_vlm.chat_ui --model mlx-community/SmolVLM-Instruct-4bit`
Error:
chat_ui.py", line 32, in chat
if len(message.file…
-
I want to compare the performance of VLM-vec, MM-Embed, and UniIR on retrieval tasks.
I just found that the data for the retrieval task is the same in both MM-Embed and M-BEIR.
-
After installing it, I tried running this demo command, but I get errors:
```
# Demo
from vlmeval.config import supported_VLM
model = supported_VLM['idefics_9b_instruct']()
# Forward Single Imag…
```
-
# [24’ ICML] Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models - Blog by rubatoyeong
Find Directions
[https://rubato-yeong.github.io/multimodal/prism/](https://…
-
- [x] MiniCPM-Llama3-V-2_5
- [x] Florence 2
- [x] Phi-3-vision
- [x] Bunny
- [x] Dolphin-vision-72b
- [x] Llava Next
- [x] Qwen2-VL
- [x] Pixtral
- [x] Llama-3.2
- [x] Llava Interleave
- [x] …
-
This is a nice framework to use for image analysis / captioning, etc.
Is there a doc somewhere that sets out which models, specifically, can be driven through this app/library? When you say "Pixtra…
-
Hi, I'm having trouble running the slam/r3d_stream_rerun_realtime_mapping.py file. I'm using the code from the ali-dev branch, and I've modified the DemoApp structure so the input depth and RGB images are ob…
-
Update the VisualQnA example that uses Falcon VLM.
This would require including Falcon as part of the validation at https://github.com/opea-project/GenAIComps/tree/main/comps/llms. And then create an …
-
Could you add our CVPR 2024 paper about vision-language pretraining, "Iterated Learning Improves Compositionality in Large Vision-Language Models", to this repo?
Paper link: https://arxiv.org/abs/…
-
### Feature request
Extend the `sft_vlm.py` script to support the new Molmo models from AllenAI: https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19
Paper: https://arxiv.org/…