Closed lg123666 closed 7 months ago
I suggest grabbing the LLaVA model from HF: https://huggingface.co/liuhaotian/llava-v1.5-7b/tree/main and following these guides:

- https://www.secondstate.io/articles/convert-pytorch-to-gguf/
- https://github.com/ggerganov/llama.cpp/discussions/2948
I recall running these two commands for another project; the first converts the Hugging Face model to GGUF, and the second quantizes it:

```shell
python llama.cpp/convert.py .\Lince-Mistral\ --outfile lince-mistral.gguf
.\llamacppbinaries\quantize.exe .\lince-mistral.gguf 15
```
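For reference, the trailing number selects the quantization type from llama.cpp's `llama_ftype` enum. A minimal sketch of that mapping, assuming the enum values in `llama.h` at the time of writing (they can shift between versions, so verify against your checkout):

```python
# Hypothetical lookup of llama.cpp quantization type IDs, based on the
# llama_ftype enum in llama.h (values may differ across versions).
LLAMA_FTYPES = {
    2: "Q4_0",     # 4-bit, legacy round-to-nearest
    3: "Q4_1",
    7: "Q8_0",     # 8-bit, near-lossless
    8: "Q5_0",
    9: "Q5_1",
    15: "Q4_K_M",  # 4-bit k-quant, common quality/size trade-off
}

def ftype_name(ftype_id: int) -> str:
    """Return a human-readable name for a quantize CLI type argument."""
    return LLAMA_FTYPES.get(ftype_id, f"unknown ({ftype_id})")

print(ftype_name(15))  # the "15" passed to quantize.exe above
```

Under this assumed mapping, `quantize.exe ... 15` requests Q4_K_M, and the `7` used for the mmproj file below would be Q8_0.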
Thanks for reporting this. I'm in the process of fixing this now with the latest upstream sync. Once this issue is closed, you'll be able to build llava-quantize by running `make` on this repo at head. I'll be publishing a new release shortly afterward.
I'm looking to quantize the LLaVA model from an fp16 GGUF. When I try to quantize it after compiling llamafile:

```shell
app/bin/llava-quantize llava-v1.5-7B-GGUF/llava-v1.5-7b-mmproj-f16.gguf llava-v1.5-7B-GGUF/llava-v1.5-7b-mmproj-q4_0_test.gguf 7
```

an error occurs:

```
llamafile/metal.c:271: assert(FLAG_gpu != LLAMAFILE_GPU_ERROR) failed (cosmoaddr2line app/bin/llava-quantize 45003c 53533d 4500b5 499e19 438e23 43c0e0 401983 401e23 401604)
```
Could someone provide guidance or steps on how to achieve this?