I'm trying to launch already quantized models that work with llama.cpp, but they do not work with SiLLM.
Am I missing something, or does SiLLM only work with FP16 models?
Models tried:
- WizardLM-2-8x22B.Q2_K-00001-of-00005.gguf
- ggml-c4ai-command-r-plus-104b-iq2_m.gguf
- phi-2-orange-v2.Q8_0.gguf
GGUF support in SiLLM via MLX is currently limited to the Q4_0, Q4_1, and Q8_0 quantizations, so Q2_K and IQ2_M files like the first two above will not load.
You can check the README for a list of GGUF models that have been tested and should work.
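If it helps, here is a minimal sketch (not part of SiLLM itself) for checking a file before loading it. It assumes the `gguf` Python package from the llama.cpp project is installed and uses its `GGUFReader` to list the tensor quantization types; the `check_gguf` helper name and the supported-type set are my own for illustration.

```python
# Sketch: inspect which quantization types a GGUF file contains,
# to tell up front whether SiLLM/MLX can load it.
# Assumes: `pip install gguf` (the reader shipped with llama.cpp).
from gguf import GGUFReader
from gguf.constants import GGMLQuantizationType

# Quantizations SiLLM (via MLX) can currently load, per the README,
# plus unquantized float tensors.
SUPPORTED = {
    GGMLQuantizationType.F32,
    GGMLQuantizationType.F16,
    GGMLQuantizationType.Q4_0,
    GGMLQuantizationType.Q4_1,
    GGMLQuantizationType.Q8_0,
}

def check_gguf(path: str) -> bool:
    """Return True if every tensor in the GGUF file uses a supported type."""
    reader = GGUFReader(path)
    types = {GGMLQuantizationType(t.tensor_type) for t in reader.tensors}
    unsupported = types - SUPPORTED
    if unsupported:
        print(f"Unsupported tensor types: {sorted(t.name for t in unsupported)}")
        return False
    print(f"All tensor types supported: {sorted(t.name for t in types)}")
    return True

if __name__ == "__main__":
    import sys
    check_gguf(sys.argv[1])
```

Note that for split files (e.g. `-00001-of-00005.gguf`) this only inspects the shard you pass in.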