-
https://medium.com/ubiai-nlp/how-to-fine-tune-llava-on-your-custom-dataset-aca118a90bc3
LLaVA exemplifies the synergy achieved through the convergence of language and vision. At its essence, LLaVA em…
-
*Sent by Google Scholar Alerts (scholaralerts-noreply@google.com).*
---
### [PDF] [Attention Prompting on Image for Large Vision-Language…
-
### Motivation
Recently, there have been many good papers that try to alleviate hallucinations in large vision-language models **during the decoding process**, such as:
OPERA: Alleviating Hallucination in Mu…
zhly0 updated
2 weeks ago
-
[Qwen2Audio huggingface docs](https://huggingface.co/docs/transformers/main/en/model_doc/qwen2_audio)
I see there have been a couple of requests for vision-language model support like LLaVA:
https:…
-
## Value Statement
As someone who wants a boring way to use AI
I would like to expose an image/PDF/document to the LLM
So that I can make requests and extract information, all within Ramalama
…
-
Hi friends!
I'd like to share our recent project embodied-agents: https://github.com/mbodiai/embodied-agents, which makes it easy to integrate large multi-modal models into existing robot stacks wi…
-
### Feature request
This feature request proposes adding support for Meta's newly released Llama 3.2 models to lmdeploy. Llama 3.2 introduces exciting capabilities, including vision LLMs (11…
-
### Objective:
Develop a robotic control system using embodied chain-of-thought reasoning (ECoT) to enable robots to think, perceive, and act more effectively. By integrating Vision-Language-Action (V…
-
### System Info
### What I want
I want a solution that can quickly generate AI output by efficiently reusing the precomputed KV caches of the text and images from all previous prompts.
### by using…
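The prefix-reuse idea above can be sketched as a minimal prefix KV cache: requests sharing the same prompt prefix (text plus image tokens) look up the previously computed KV state instead of re-running prefill. The `PrefixKVCache` class and the `compute_kv` stand-in are hypothetical illustrations, not an API from any serving framework.

```python
import hashlib

def compute_kv(tokens):
    # Stand-in for a real model prefill pass; a real system would
    # return per-layer key/value tensors for these tokens.
    return list(tokens)

class PrefixKVCache:
    """Hypothetical cache mapping a prompt prefix to its KV state."""

    def __init__(self):
        self._store = {}

    def _key(self, tokens):
        # Hash the token sequence so equal prefixes collide on purpose.
        return hashlib.sha256(" ".join(map(str, tokens)).encode()).hexdigest()

    def get_or_compute(self, tokens):
        key = self._key(tokens)
        if key in self._store:
            return self._store[key], True   # cache hit: prefill skipped
        kv = compute_kv(tokens)
        self._store[key] = kv
        return kv, False                    # cache miss: prefill computed

cache = PrefixKVCache()
prompt = ["<image>", "describe", "the", "scene"]
_, hit_first = cache.get_or_compute(prompt)   # first request: miss
_, hit_second = cache.get_or_compute(prompt)  # same prefix: hit
```

In a real deployment the cached value would be the model's `past_key_values` for the shared prefix, so only the new suffix tokens need a forward pass.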
-
**Details of model being requested**
- Model name: Florence-2
- Source repo link: https://huggingface.co/collections/microsoft/florence-6669f44df0d87d9c3bfb76de
- Research paper link: https://arxiv…