I'm trying to launch already quantized models that work with llama.cpp, but they do not work with SiLLM.
Am I missing something, or does SiLLM only work with FP16 models?
Models tried:
- WizardLM-2-8x22B.Q2_K-00001-of-00005.gguf
- ggml-c4ai-command-r-plus-104b-iq2_m.gguf
- phi-2-orange-v2.Q8_0.gguf
GGUF support in SiLLM via MLX is currently limited to the Q4_0, Q4_1, and Q8_0 quantizations, so Q2_K and IQ2_M files like the first two above will not load.
You can check the README for a list of GGUF models that have been tested and should work.
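If it helps, here is a minimal sketch (not part of SiLLM itself) for checking a file before loading it. It assumes the `gguf` Python package from the llama.cpp project is installed and uses its `GGUFReader` to list the tensor quantization types; the `check_gguf` helper name and the supported-type set are my own for illustration.

```python
# Sketch: inspect which quantization types a GGUF file contains,
# to tell up front whether SiLLM/MLX can load it.
# Assumes: `pip install gguf` (the reader shipped with llama.cpp).
from gguf import GGUFReader
from gguf.constants import GGMLQuantizationType

# Quantizations SiLLM (via MLX) can currently load, per the README,
# plus unquantized float tensors.
SUPPORTED = {
    GGMLQuantizationType.F32,
    GGMLQuantizationType.F16,
    GGMLQuantizationType.Q4_0,
    GGMLQuantizationType.Q4_1,
    GGMLQuantizationType.Q8_0,
}

def check_gguf(path: str) -> bool:
    """Return True if every tensor in the GGUF file uses a supported type."""
    reader = GGUFReader(path)
    types = {GGMLQuantizationType(t.tensor_type) for t in reader.tensors}
    unsupported = types - SUPPORTED
    if unsupported:
        print(f"Unsupported tensor types: {sorted(t.name for t in unsupported)}")
        return False
    print(f"All tensor types supported: {sorted(t.name for t in types)}")
    return True

if __name__ == "__main__":
    import sys
    check_gguf(sys.argv[1])
```

Note that for split files (e.g. `-00001-of-00005.gguf`) this only inspects the shard you pass in.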