MiuLab / Taiwan-LLM

Traditional Mandarin LLMs for Taiwan
https://twllm.com
Apache License 2.0

Support quantized model (int8, int4) and deployment? #13

Closed: ykhorzon closed this issue 11 months ago

ykhorzon commented 1 year ago

I would like to know whether there is any plan to convert the float16 model to quantized models (int8, int4) and deploy them with llama.cpp.

adamlin120 commented 1 year ago

I don't think I have the bandwidth to integrate it with other platforms at this moment. If you have any experience quantizing models, could you please share the steps/scripts for doing so? I am willing to do it if it doesn't take much time. Also, please feel free to contribute :)

ykhorzon commented 1 year ago

Audrey T. has already converted it for us: https://huggingface.co/audreyt/Taiwan-LLaMa-v1.0-GGML
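
For anyone who wants to try those GGML files, here is a minimal sketch of running one with llama.cpp's main binary from the GGML era. The exact filename below is an assumption; check the file list on the Hugging Face page for the real name.

# Download one of the quantized GGML files (filename is an assumption; see the repo's file list)
wget https://huggingface.co/audreyt/Taiwan-LLaMa-v1.0-GGML/resolve/main/Taiwan-LLaMa-13b-v1.0.ggmlv3.q4_0.bin

# Run it with a GGML-era build of llama.cpp
./main -m ./Taiwan-LLaMa-13b-v1.0.ggmlv3.q4_0.bin -p "你好，請介紹台灣" -n 128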

PichuChen commented 1 year ago

@ykhorzon Take llama.cpp as an example: https://github.com/ggerganov/llama.cpp#prepare-data--run

./quantize ./models/Taiwan-LLaMa-v0.0 ./models/7B/Taiwan-LLaMa-v0.0-ggml-q4_0.bin q4_0

Note that you should point the first argument at the model directory; the quantize program will find the model files inside it.
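
For completeness, here is a rough end-to-end sketch following the llama.cpp README of that (GGML) era, starting from the float16 Hugging Face checkpoint. The paths and output filenames are assumptions, so adjust them to wherever you cloned the weights:

# Convert the Hugging Face float16 checkpoint to a GGML f16 file (paths are assumptions)
python3 convert.py ./models/Taiwan-LLaMa-v1.0/

# Quantize the f16 file down to 4-bit (q4_0)
./quantize ./models/Taiwan-LLaMa-v1.0/ggml-model-f16.bin ./models/Taiwan-LLaMa-v1.0/ggml-model-q4_0.bin q4_0

# Run the quantized model
./main -m ./models/Taiwan-LLaMa-v1.0/ggml-model-q4_0.bin -p "你好" -n 128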