abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io

Missing LLaVA 1.6 support for handling custom templates with respect to the chosen LLM. #1301

Closed: DomainFlag closed this issue 6 months ago

DomainFlag commented 7 months ago

I'm using the Llava15ChatHandler, but I don't see anything for a Llava16ChatHandler when looking at the source code? Moreover, it uses hard-coded templating instead of supporting the in-model template given by the tokenizer.chat_template metadata property, e.g. the one shipped with Nous Hermes 2 Yi 34B (Link), which is quite different from the hard-coded one. Any plans for that? Is LLaVA 1.6 really supported, or should I fall back to the parent project?

Update: With the current state of the codebase I can get fairly decent results during inference, but I'm not sure whether there is some regression; I need to check against the original implementation and compare.
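
For context, here is roughly how I'm loading it, as a minimal sketch: the model and projector paths are placeholders, and I'm assuming `llm.metadata` exposes the GGUF key/value pairs so the in-model template can be compared against the hard-coded one.

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Placeholder paths: point these at the LLaVA 1.6 GGUF weights and mmproj file.
chat_handler = Llava15ChatHandler(clip_model_path="./mmproj-model-f16.gguf")
llm = Llama(
    model_path="./llava-v1.6-34b.Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,
    logits_all=True,  # the LLaVA chat handlers need full logits for image evaluation
)

# Compare the template shipped inside the GGUF metadata (e.g. the one from
# Nous Hermes 2 Yi 34B) with whatever the handler hard-codes.
# Assumption: `metadata` holds the GGUF key/value pairs as strings.
print(llm.metadata.get("tokenizer.chat_template", "<no chat_template in metadata>"))
```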

abetlen commented 7 months ago

Started here #1147 but got sidetracked.

shelbywhite commented 6 months ago

Would love to see full support for LLaVA 1.6 in this project.

Vinventive commented 6 months ago

> Started here #1147 but got sidetracked.

Definitely, there will be new, ground-breaking LLaVA models coming this month, fine-tuned on Llama-3. It would be great to run them quantized in GGUF using this llama-cpp-python library.

abetlen commented 6 months ago

Coming soon in #1147; I've already added LLaVA 1.6, Obsidian, and Moondream support using the new system.
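
Rough sketch of the intended usage, mirroring the existing Llava15ChatHandler pattern from the README; the file paths and image URL below are just placeholders:

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava16ChatHandler  # added in #1147

# Placeholder paths for the LLaVA 1.6 weights and the CLIP projector (mmproj).
chat_handler = Llava16ChatHandler(clip_model_path="./mmproj-model-f16.gguf")
llm = Llama(
    model_path="./llava-v1.6-34b.Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
                {"type": "text", "text": "Provide a full description."},
            ],
        }
    ],
)
print(response["choices"][0]["message"]["content"])
```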

Vinventive commented 6 months ago

We appreciate the addition of LLaVA 1.6 34B support. It would be great to have support for the smaller 7B quants and projectors, or at least a single cjpais/llava-1.6-mistral-7b-gguf. That would be truly awesome!
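
For example, something like this would cover the 7B case, assuming the new handlers get the same from_pretrained helper the README shows for Moondream and that the filename globs below match what is actually in the repo:

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava16ChatHandler

# Assumed filename patterns for the projector and a 7B quant in the HF repo;
# from_pretrained requires huggingface_hub to be installed.
chat_handler = Llava16ChatHandler.from_pretrained(
    repo_id="cjpais/llava-1.6-mistral-7b-gguf",
    filename="*mmproj*",
)
llm = Llama.from_pretrained(
    repo_id="cjpais/llava-1.6-mistral-7b-gguf",
    filename="*Q4_K_M*",
    chat_handler=chat_handler,
    n_ctx=4096,
)
```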

abetlen commented 6 months ago

@Vinventive I wasn't aware there were differences in the chat formats, do you mind sharing a link and I'll add that right away, cheers!

Vinventive commented 6 months ago

> @Vinventive I wasn't aware there were differences in the chat formats, do you mind sharing a link and I'll add that right away, cheers!

Here is the link: https://github.com/ggerganov/llama.cpp/pull/5267

For Mistral, when using the llava-cli binary, add this: -p "\nUSER:\nProvide a full description.\nASSISTANT:\n". The Mistral template for LLaVA 1.6 seems to be no system prompt and USER/ASSISTANT roles.
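
In Python terms the layout above is just the following (the helper name is only for illustration):

```python
def llava16_mistral_prompt(user_text: str) -> str:
    # Mirrors the llava-cli -p layout quoted above: no system prompt,
    # only USER/ASSISTANT role markers separated by newlines.
    return f"\nUSER:\n{user_text}\nASSISTANT:\n"

prompt = llava16_mistral_prompt("Provide a full description.")
```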

Vinventive commented 6 months ago

I'm really struggling to run LLaVA with CUDA instead of cuBLAS, and I was wondering whether it's just an isolated issue; I've seen another open issue where people are running into similar problems: https://github.com/abetlen/llama-cpp-python/issues/1393

Maybe we're doing something incorrectly, or there is a missing step in the README on how to run it on Windows 64-bit?