maxim-saplin opened this issue 3 months ago
Hi, thanks for your interest in our work.
Unfortunately, llama.cpp does not currently support the GPTQ quantization format (see https://github.com/ggerganov/llama.cpp/issues/4165 for details).
Therefore, it is not easy to convert our 2-bit models into GGUF.
Any chance 2-bit models can be used with llama.cpp? Would be great to get Llama 3.1 (8B and 70B) converted to GGUF to try them out locally.
Thanks for the great research work!
T-MAC supports the GPTQ format through its llama.cpp GGUF integration with its own highly optimized kernels, and has already been tested with Llama-3-8b-instruct-w4-g128 / Llama-3-8b-instruct-w2-g128 from EfficientQAT. You can try it.
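(Note: once a checkpoint has been converted to GGUF through T-MAC's llama.cpp integration, it can be loaded like any other GGUF file. A minimal sketch using the llama-cpp-python bindings; the file name is hypothetical and this assumes your llama.cpp build supports the quantization type stored in the file.)

```python
# Minimal sketch: load a converted GGUF and run a short completion.
# Assumes llama-cpp-python is installed and built against a llama.cpp
# that supports the quantization type used in the file.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-3-8b-instruct-w2-g128.gguf",  # hypothetical path to the converted model
    n_ctx=2048,   # context window
    n_threads=8,  # CPU threads
)

out = llm("Q: What is GGUF? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```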
Thanks for the reminder, I will give it a try.
@kaleid-liner Does T-MAC support w2g64? I have uploaded a w2g64 Mistral-Large-Instruct to Hugging Face, which is popular on Reddit.
I think it would be interesting if T-MAC also supported w2g64.
Sure. T-MAC supports any group size by setting `--group_size`. But I'm not sure if the convert script supports Mistral. I need to test it.
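(For readers unfamiliar with the notation: in names like w2g64 and w2g128, the number after "g" is the group size, i.e. how many consecutive weights share one scale and zero point, which is what `--group_size` controls. A minimal NumPy sketch of min-max group-wise 2-bit quantization, purely for illustration; this is not EfficientQAT's or T-MAC's actual quantization code.)

```python
import numpy as np

def quantize_groupwise_2bit(w, group_size=64):
    """Toy min-max 2-bit group-wise quantization (illustration only).

    Each group of `group_size` consecutive weights gets its own scale and
    zero point, so a smaller group size means more scales but a finer fit.
    """
    w = w.reshape(-1, group_size)                  # (num_groups, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 3.0                  # 2 bits -> 4 levels: 0..3
    scale = np.where(scale == 0, 1e-8, scale)      # avoid division by zero
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale + zero), 0, 3)  # integer codes in [0, 3]
    deq = (q - zero) * scale                       # dequantized weights
    return q.astype(np.uint8), scale, zero, deq.reshape(-1)

weights = np.random.randn(4096).astype(np.float32)
_, _, _, deq64 = quantize_groupwise_2bit(weights, group_size=64)
_, _, _, deq128 = quantize_groupwise_2bit(weights, group_size=128)
print("g64  MSE:", np.mean((weights - deq64) ** 2))
print("g128 MSE:", np.mean((weights - deq128) ** 2))
```

Smaller groups (g64) typically give lower quantization error than larger groups (g128), at the cost of storing more scales and zero points.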
Hi, how is the test going? Does it support Mistral?
@ChenMnZ @brownplayer Sure. It supports Mistral.
OK, thank you for your reply. May I ask what command is used to run the model the first time after it is downloaded? I'm using a GPTQ-format model.
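(A minimal sketch of one common way to load a standard GPTQ checkpoint, assuming a GPTQ backend such as auto-gptq is installed; the repo id is a placeholder, and this is not necessarily the T-MAC command the question is asking about.)

```python
# Minimal sketch: load a GPTQ-format checkpoint with transformers.
# Requires a GPTQ backend such as auto-gptq (or gptqmodel) to be installed;
# the repo id below is a placeholder, not a real model name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/Mistral-Large-Instruct-w2g64-GPTQ"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
# transformers picks up the quantization_config stored in the checkpoint,
# so no extra quantization arguments are needed for a GPTQ model.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```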