OpenGVLab / EfficientQAT

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

GGUF #3

Open maxim-saplin opened 3 months ago

maxim-saplin commented 3 months ago

Any chance 2-bit models can be used with llama.cpp? It would be great to get Llama 3.1 (8B and 70B) converted to GGUF to try them out locally.

Thanks for the great research work!

ChenMnZ commented 3 months ago

Hi, thanks for your interest in our work.

Unfortunately, llama.cpp does not support the GPTQ quantization format at the moment (see https://github.com/ggerganov/llama.cpp/issues/4165 for details).

Therefore, it is not easy to convert our 2-bit models into GGUF.
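For context on why this is hard: GPTQ checkpoints store weights packed sixteen-to-an-int32 with per-group scales and zero-points, a layout with no direct counterpart among llama.cpp's quant block types. A rough sketch of unpacking a GPTQ 2-bit layer (tensor names follow AutoGPTQ conventions; this is an illustration, not code from this repo):

```python
import torch

def dequant_gptq_2bit(qweight, qzeros, scales, group_size=64):
    """Unpack a GPTQ-style 2-bit layer back to float weights (illustrative)."""
    shifts = torch.arange(0, 32, 2)                    # 16 x 2-bit lanes per int32
    # qweight: (in_features // 16, out_features) int32, packed along the input dim
    q = (qweight.unsqueeze(1) >> shifts.view(1, -1, 1)) & 0x3
    q = q.reshape(-1, qweight.shape[1])                # (in_features, out_features)
    # qzeros: (in_features // group_size, out_features // 16) int32
    z = (qzeros.unsqueeze(2) >> shifts.view(1, 1, -1)) & 0x3
    z = z.reshape(qzeros.shape[0], -1)                 # (n_groups, out_features)
    # scales: (n_groups, out_features); per-group asymmetric dequantization
    g = torch.arange(q.shape[0]) // group_size         # group index per input row
    return scales[g] * (q.float() - z[g].float())
```

GGUF's native 2-bit types (e.g. Q2_K) use a different block layout, so a converter would have to dequantize and requantize rather than simply repack the bits.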

kaleid-liner commented 3 months ago

> Any chance 2-bit models can be used with llama.cpp? It would be great to get Llama 3.1 (8B and 70B) converted to GGUF to try them out locally.
>
> Thanks for the great research work!

T-MAC has supported GPTQ format through llama.cpp GGUF integrated with its own highly optimized kernels, and already tested with Llama-3-8b-instruct-w4-g128/Llama-3-8b-instruct-w2-g128 from EfficientQAT. You can try it.
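For anyone curious what the lookup-table kernels do, the core trick, greatly simplified, is to precompute the partial dot products of each activation group against every possible weight-bit pattern, then replace multiplies with table gathers. A toy NumPy sketch of the idea (not T-MAC's actual code):

```python
import numpy as np

def lut_matvec_1bit(w_bits, x, g=4):
    # w_bits: (out, in) matrix of 0/1 weight bits; x: (in,) activations.
    # Assumes x.size is a multiple of g.
    out = np.zeros(w_bits.shape[0], dtype=x.dtype)
    patterns = (np.arange(2 ** g)[:, None] >> np.arange(g)) & 1   # (2**g, g)
    idx_base = 1 << np.arange(g)                                  # pattern index weights
    for start in range(0, x.size, g):
        # table[p] = dot(x[start:start+g], bits of p) for every g-bit pattern p
        table = patterns @ x[start:start + g]
        idx = (w_bits[:, start:start + g] * idx_base).sum(axis=1)
        out += table[idx]                                          # gather instead of multiply
    return out
```

Multi-bit weights decompose into bit-planes (each plane scaled by its power of two), and the production kernels vectorize the gathers with SIMD table-lookup instructions.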

ChenMnZ commented 3 months ago

Thanks for the reminder, I will give it a try.

ChenMnZ commented 3 months ago

@kaleid-liner Does T-MAC support w2g64? I have uploaded a w2g64 Mistral-Large-Instruct to Hugging Face, which is getting a lot of attention on Reddit.

I think it would be interesting if T-MAC also supported w2g64.
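(Aside for readers: "w2g64" means 2-bit weights quantized with a group size of 64. Assuming an fp16 scale plus a packed 2-bit zero-point stored per group — my assumption, not stated in this thread — the effective footprint works out as follows:)

```python
bits, group_size = 2, 64
scale_bits, zero_bits = 16, 2   # assumed: fp16 scale + 2-bit zero per group
bpw = bits + (scale_bits + zero_bits) / group_size
print(bpw)                      # 2.28125 effective bits per weight
```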

kaleid-liner commented 3 months ago

Sure. T-MAC supports any group size by setting --group_size. But I'm not sure if the convert script supports Mistral. I need to test it.

brownplayer commented 2 months ago

> Sure. T-MAC supports any group size by setting --group_size. But I'm not sure if the convert script supports Mistral. I need to test it.

Hi, how is the test going? Does it support Mistral?

kaleid-liner commented 2 months ago

@ChenMnZ @brownplayer Sure. It supports Mistral.

brownplayer commented 2 months ago

Ok, thank you for your reply. May I ask what command is used to run the model for the first time after downloading it? I'm using a GPTQ-format model.
