Closed by JianbangZ 11 months ago
Our quantization script is aligned with llama.cpp; you can use it directly: `./build/bin/quantize --pure $PATH_TO_ORIGIN_MODEL $Q4_MODEL_NAME Q4_0`
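For readers unfamiliar with what Q4_0 actually stores: in the llama.cpp/ggml format, weights are grouped into blocks of 32, and each block keeps one shared scale plus 32 4-bit codes. The sketch below is an illustrative pure-Python reimplementation of that scheme (not the actual ggml code, which packs two 4-bit values per byte and uses an FP16 scale):

```python
def quantize_q4_0(block):
    """Illustrative Q4_0 quantization of one block of 32 floats.

    The scale d is chosen so the largest-magnitude value maps to -8,
    then every value is rounded to a 4-bit code in [0, 15] (offset by 8).
    """
    assert len(block) == 32
    maxv = max(block, key=abs)          # signed value with largest magnitude
    d = maxv / -8.0 if maxv else 1.0    # per-block scale
    q = [min(15, max(0, round(x / d) + 8)) for x in block]
    return d, q

def dequantize_q4_0(d, q):
    """Reconstruct approximate floats from the scale and 4-bit codes."""
    return [(v - 8) * d for v in q]
```

Round-tripping a block through these two functions shows the reconstruction error per value is bounded by roughly the scale `d`, which is why Q4_0 quality degrades gracefully as long as each block's dynamic range is modest.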
Thanks for your feedback! We have added model quantization under README: https://github.com/SJTU-IPADS/PowerInfer#quantization.
How are the GGUF weights quantized to INT4? Is there a script, similar to llama.cpp's, for converting FP16 weights to Q4_0? Please share more details about the INT4 model.