Closed ykhorzon closed 11 months ago
I don't think I have the bandwidth to integrate it with other platforms at this moment. If you have any experience quantizing models, could you please share the steps / scripts doing so? I am willing to do it if it doesn't take much time. Also please feel free to contribute :)
Audrey T already converted it for us: https://huggingface.co/audreyt/Taiwan-LLaMa-v1.0-GGML
@ykhorzon Take llama.cpp for example: https://github.com/ggerganov/llama.cpp#prepare-data--run
./quantize ./models/Taiwan-LLaMa-v0.0 ./models/7B/Taiwan-LLaMa-v0.0-ggml-q4_0.bin q4_0
Note that you should point the first argument to the model directory; the quantize program will find the model files there.
I would like to know whether there is any plan to convert the float16 model to a quantized model (int8, int4) and deploy it with llama.cpp?
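For anyone curious what int8 quantization does conceptually: formats like llama.cpp's q8_0 store weights in small blocks, each with one float scale and integer values. The sketch below is a simplified illustration of block-wise symmetric int8 quantization, not llama.cpp's actual on-disk format; the function names and block size are my own choices for the example.

```python
import numpy as np

def quantize_int8_blocks(weights: np.ndarray, block_size: int = 32):
    """Block-wise symmetric int8 quantization (simplified sketch,
    not llama.cpp's exact q8_0 layout)."""
    blocks = weights.reshape(-1, block_size)
    # One scale per block: map the largest magnitude in the block to 127.
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.round(blocks / scales).astype(np.int8)
    return q, scales.astype(np.float32)

def dequantize_int8_blocks(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float32 weights from int8 values + scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = quantize_int8_blocks(w)
w_hat = dequantize_int8_blocks(q, s)
# int8 storage is ~4x smaller than float32, at the cost of a small
# per-weight reconstruction error bounded by half a scale step.
print("max abs error:", np.abs(w - w_hat).max())
```

The same idea with 4-bit integers (mapping to the range [-8, 7]) is roughly what q4_0 does, trading more error for another 2x size reduction.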