armbues / SiLLM

SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.

[load_gguf] gguf_tensor_to_f16 failed #2

Closed: DenisSergeevitch closed this issue 6 months ago

DenisSergeevitch commented 6 months ago

I'm trying to load already-quantized models that work with llama.cpp, but they fail to load with SiLLM.

Am I missing something, or does SiLLM only work with FP16 models?

Models tried:

WizardLM-2-8x22B.Q2_K-00001-of-00005.gguf
ggml-c4ai-command-r-plus-104b-iq2_m.gguf
phi-2-orange-v2.Q8_0.gguf
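
For reference, a minimal sketch of how the models are being loaded, assuming SiLLM's top-level `sillm.load` entry point (the path is a placeholder for any of the files above):

```python
import sillm

# Placeholder path to one of the quantized GGUF files listed above
model = sillm.load("WizardLM-2-8x22B.Q2_K-00001-of-00005.gguf")
```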
armbues commented 6 months ago

GGUF support in SiLLM via MLX is currently limited to the quantization types Q4_0, Q4_1, and Q8_0.

You can check the README for a list of GGUF models that have been tested and should work.
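
As a sanity check before loading, you can inspect which quantization types a GGUF file actually contains. Here is a minimal sketch using the `gguf` Python package that ships with llama.cpp; the package and its `GGUFReader` API are separate from SiLLM, and the file path is a placeholder:

```python
from gguf import GGUFReader

# Placeholder path; substitute the GGUF file you want to check
reader = GGUFReader("phi-2-orange-v2.Q8_0.gguf")

# Collect the distinct tensor quantization types present in the file
types = {tensor.tensor_type.name for tensor in reader.tensors}
print(types)
```

If the printed set contains only F16/F32 plus Q4_0, Q4_1, or Q8_0 tensors, the file should load; types like Q2_K or IQ2_M (as in the first two models above) fall outside the supported set and would explain the `gguf_tensor_to_f16` failure.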