-
Hi, it looks like you updated the API a little bit in this commit:
https://github.com/guinmoon/llmfarm_core.swift/commit/e4e8aa7617e2e86af434677cc4196462a0005ea9
Would you mind giving an updated w…
-
### Brief Description
Obviously the end-game here is multimodal LLMs rather than a cascaded approach, but we are not quite there yet.
There are, however, interesting options that are multimoda…
-
### Question Validation
- [X] I have searched both the documentation and discord for an answer.
### Question
I am referring to this example: https://www.llamaindex.ai/blog/multimodal-rag-for-advanc…
-
Do you have any plans to support multimodal LLMs, such as MiniGPT-4/MiniGPT v2 (https://github.com/Vision-CAIR/MiniGPT-4/) and LLaVA (https://github.com/haotian-liu/LLaVA/)? That would be a significan…
-
### Feature request
Is it possible to run multimodal LLMs like Qwen VL or LLaVA 1.5 using openllm?
### Motivation
_No response_
### Other
_No response_
-
In articles with images, I do not want the LLM to recognize the images within; alternatively, I want an LLM without multimodal capabilities to read just the text of the notes directly, without triggering the vision mod…
-
For multimodal models, we usually need to combine the visual features with the text `input_embeds` to form the final `input_embeds`, which are then sent to the model for inference.
Currently, this combination method may be different …
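As a concrete illustration, here is a minimal PyTorch sketch of one common combination scheme (the LLaVA-style approach of scattering projected vision features into `<image>` placeholder positions); the function and tensor names here are hypothetical, and real implementations differ per model:

```python
import torch

def merge_vision_and_text(
    input_embeds: torch.Tensor,      # (seq_len, hidden) embedded text tokens
    image_features: torch.Tensor,    # (num_patches, hidden) projected vision features
    image_token_mask: torch.Tensor,  # (seq_len,) bool, True at <image> placeholder slots
) -> torch.Tensor:
    """Return the final input_embeds: text embeddings with the
    placeholder positions overwritten by the vision features."""
    assert image_token_mask.sum().item() == image_features.shape[0], \
        "number of <image> placeholders must match number of vision features"
    merged = input_embeds.clone()
    merged[image_token_mask] = image_features.to(merged.dtype)
    return merged
```

Models differ in where the placeholders sit and in how many feature vectors each image expands to, which is exactly why this combination step is hard to standardize across architectures.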
-
### Feature request
Adding the ability to pass many images per prompt to PaliGemma. This would mean, among other changes, changing the argument type of `images` on PaliGemmaProcessor to allow array[…
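For reference, a sketch of what this might look like from the caller's side; the checkpoint name and image paths are placeholders, and the multi-image call is purely hypothetical (the requested behavior, not the current API):

```python
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("google/paligemma-3b-mix-224")
image_a = Image.open("photo_a.jpg")
image_b = Image.open("photo_b.jpg")

# Current behavior: one image per prompt.
inputs = processor(text="caption en", images=image_a, return_tensors="pt")

# Requested behavior (hypothetical, not supported at the time of the request):
# `images` would also accept a list of images belonging to a single prompt.
# inputs = processor(
#     text="compare en",
#     images=[image_a, image_b],
#     return_tensors="pt",
# )
```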
-
The following article might also be a great read on whether LLMs understand tabular data: ["Tables as Images? Exploring the Strengths and Limitations of LLMs on Multimodal Represe…
-
I am using the llava-onevision model (https://llava-vl.github.io/blog/2024-08-05-llava-onevision/), which can accept two images as input so that you can then ask questions about both of them. Does the current …
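If the question concerns the Hugging Face transformers implementation (an assumption; the original question may target a different serving stack), a two-image prompt roughly looks like the sketch below, with the checkpoint name and image paths as placeholders:

```python
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "llava-hf/llava-onevision-qwen2-7b-ov-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(model_id, device_map="auto")

# One user turn containing two image slots followed by the question.
conversation = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "image"},
        {"type": "text", "text": "What is different between these two images?"},
    ],
}]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

image_a = Image.open("left.jpg")
image_b = Image.open("right.jpg")
inputs = processor(images=[image_a, image_b], text=prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```

Whether a given inference framework wires multiple images through this path is framework-dependent, which is presumably what the question is asking.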