Would love to see full support for LLaVA 1.6 in this project.
Started here #1147 but got sidetracked.
Definitely, there will be new, ground-breaking LLaVA models coming this month, fine-tuned on Llama-3. It would be great to run them quantized in GGUF using this llama-cpp-python library.
Coming soon in #1147, I've already added llava1.6, obsidian, and moondream support using the new system.
We appreciate the addition of LLaVA 1.6 34B support. It would be great to have support for smaller 7B quants and projectors, or at least for a single repo like cjpais/llava-1.6-mistral-7b-gguf. That would be truly awesome! (A rough sketch of what we're hoping for is below.)
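For context, here's a rough sketch of how we'd hope to load that repo once supported, using the `from_pretrained` helpers from the new system (the glob patterns are guesses at the repo's file names, so check the actual file list on the Hugging Face page first):

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# NOTE: the filename globs below are guesses at the repo contents;
# verify them against the file list on the Hugging Face page.
chat_handler = Llava15ChatHandler.from_pretrained(
    repo_id="cjpais/llava-1.6-mistral-7b-gguf",
    filename="*mmproj*",   # the multimodal projector file
)
llm = Llama.from_pretrained(
    repo_id="cjpais/llava-1.6-mistral-7b-gguf",
    filename="*Q4_K_M*",   # a quantized text-model file
    chat_handler=chat_handler,
    n_ctx=4096,
)
```

Images could then be passed as `image_url` content parts through `create_chat_completion`.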
@Vinventive I wasn't aware there were differences in the chat formats, do you mind sharing a link and I'll add that right away, cheers!
Here is the link: https://github.com/ggerganov/llama.cpp/pull/5267
For Mistral, using the llava-cli binary, add this: `-p "<image>\nUSER:\nProvide a full description.\nASSISTANT:\n"`. The Mistral template for llava-1.6 seems to be no system prompt and USER/ASSISTANT roles.
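For anyone scripting this, a minimal sketch of the same invocation wrapped in Python is below; the model, projector, and image paths are placeholders, and the flags are the standard llava-cli ones (`-m`, `--mmproj`, `--image`, `-p`):

```python
import subprocess

# Placeholder paths: point these at your local llava-cli build,
# GGUF model, mmproj projector, and test image.
prompt = "<image>\nUSER:\nProvide a full description.\nASSISTANT:\n"
subprocess.run([
    "./llava-cli",
    "-m", "llava-v1.6-mistral-7b.Q4_K_M.gguf",
    "--mmproj", "mmproj-model-f16.gguf",
    "--image", "input.jpg",
    "-p", prompt,
], check=True)
```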
I'm really struggling to run LLaVA with CUDA instead of cuBLAS, and I was wondering if it's just an isolated issue; I've seen another open issue where people are running into similar problems: https://github.com/abetlen/llama-cpp-python/issues/1393. Maybe we're doing something incorrectly, or there is missing info/a missing step in the README on how to run it on Windows 64-bit?
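One quick sanity check we've been using (just a sketch, paths are placeholders): load the model with `verbose=True` and `n_gpu_layers` set, then watch the startup log for layers being offloaded to the GPU; if nothing is reported as offloaded, the installed wheel was built without CUDA support:

```python
from llama_cpp import Llama

# Placeholder model path. With a CUDA-enabled build, the verbose
# startup log should report layers being offloaded to the GPU; a
# CPU-only wheel will say nothing about offloading.
llm = Llama(
    model_path="llava-v1.6-mistral-7b.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers if the backend supports it
    verbose=True,
)
```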
I'm using the `Llava15ChatHandler`, but looking at the source code I don't see a `Llava16ChatHandler` anywhere. Moreover, the handler uses hard-coded templating instead of supporting a custom in-model template from the `tokenizer.chat_template` metadata property, as provided for example by Nous Hermes 2 Yi 34B (Link), which is quite different from the hard-coded one. Any plans for that? Is LLaVA 1.6 really supported, or should I fall back to the parent project?

Update: Using the current codebase state I can get fairly okay results during inference, but I'm not sure whether there's some regression; I need to check against the original and compare.
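For comparing the hard-coded template with the one the model actually ships, here's a rough sketch of dumping `tokenizer.chat_template` from the loaded model's GGUF metadata (the file names are placeholders, and I'm assuming `Llama.metadata` exposes the raw key/value pairs, which it appears to in the current source):

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Placeholder paths to a local GGUF model and its mmproj projector.
handler = Llava15ChatHandler(clip_model_path="mmproj-model-f16.gguf")
llm = Llama(
    model_path="model.Q4_K_M.gguf",
    chat_handler=handler,
    n_ctx=4096,
)

# Print the template embedded in the model file so it can be compared
# against the handler's hard-coded one.
print(llm.metadata.get("tokenizer.chat_template"))
```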