-
**Project description**
Nexa SDK is a comprehensive toolkit for ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), auto-speech-recognition…
-
After resolving my LLM-as-assistant issue, I am now having trouble using LLM-vision. I have the models suggested on GitHub, as seen below, but every single one of them returns [ERROR] Model doe…
-
### Feature request
Enable PPOTrainer and DPOTrainer to work with audio-language models like Qwen2Audio. The architecture of this model is identical to that of vision-language models like LLaVA, consisting of…
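For context, a minimal sketch (assuming a transformers release that ships Qwen2Audio; the checkpoint name is taken from the public Qwen2-Audio release) that lists the model's top-level modules to show the LLaVA-like split into an audio encoder, a projector, and a language-model backbone:

```python
# Minimal sketch: print Qwen2Audio's top-level submodules to illustrate that its
# layout (audio encoder + projector + LM backbone) mirrors LLaVA-style VLMs.
from transformers import Qwen2AudioForConditionalGeneration

model = Qwen2AudioForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-Audio-7B-Instruct"  # public checkpoint; any Qwen2Audio weights work here
)
for name, module in model.named_children():
    print(f"{name}: {type(module).__name__}")
```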
-
# Paper Review: Unveiling Encoder-Free Vision-Language Models – Andrey Lukyanenko
My review of the paper Unveiling Encoder-Free Vision-Language Models
https://andlukyane.com/blog/paper-review-eve…
-
Hello, can you please open-source the code for "Unified Visual Relationship Detection with Vision and Language Models"? I am very interested in your training method for non-relationship data.
-
### Motivation
The model we use in our business (InternVL2-26B) outputs very few tokens (1-2) after prompt optimization, so inference consists almost entirely of the prefill stage. Therefore, we hope to use W8A8 qu…
-
### Self Checks
- [X] This is only for bug reports; if you would like to ask a question, please head to [Discussions](https://github.com/langgenius/dify/discussions/categories/general).
- [X] I have s…
-
Instead of relying on `HF`, can we point to the `gguf` file directly? I tried converting the code using Claude, but it's failing. Any chance you can give some pointers on using a local file rather than going via HF?
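As a starting point, here is a minimal sketch of loading a local GGUF file with llama-cpp-python instead of going through HF; the model path and prompt are placeholders, and it assumes llama.cpp is an acceptable backend for the converted code:

```python
# Minimal sketch: run a local GGUF file with llama-cpp-python, bypassing HF downloads.
# The path and prompt are placeholders; tune n_ctx (and n_gpu_layers) to the hardware.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model-q4_k_m.gguf",  # placeholder path to a local GGUF file
    n_ctx=4096,
)
result = llm("Summarize what a GGUF file is in one sentence.", max_tokens=64)
print(result["choices"][0]["text"])
```

If staying in the transformers ecosystem is preferable, recent versions can also read a GGUF directly via the `gguf_file` argument to `from_pretrained`.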
-
1. Read up on GPT-4o vision accuracy, precision, and recall benchmarks to use as ceiling baselines. Document the relevant research papers in our paper.
-
Hi friends!
I'd like to share our recent project embodied-agents: https://github.com/mbodiai/embodied-agents, which makes it easy to integrate large multi-modal models into existing robot stacks wi…