Open smpurkis opened 1 month ago

Hello,

llama.cpp recently added support for AArch64-specific GGUF types and AArch64-specific matmul kernels. Here is the merged PR: https://github.com/ggerganov/llama.cpp/pull/5780#pullrequestreview-21657544660

Namely, the Q4_0_8_8 and Q4_0_4_8 formats, plus the more generic Q4_0_4_4 GGUF model format.

@smpurkis thanks for the reference. Taking a look, this is on the radar.

@EricLBuehler I looked through the code and saw that Candle is used for quantized tensors, so I've started looking at adding the datatype to Candle: https://github.com/huggingface/candle/issues/2605

I could do with some guidance on whether that is the right place to add it.
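For context, the AArch64 variants above are row-interleaved repackings of the base Q4_0 block (32 weights, one scale, 4-bit quants packed two per byte). Below is a minimal Rust sketch of that base block and its dequantization, under the layout llama.cpp uses; the struct and function names are illustrative, not Candle's or ggml's actual API:

```rust
const QK4_0: usize = 32; // weights per block

// Mirrors ggml's block_q4_0 layout: one scale plus 16 bytes of nibbles.
// The on-disk scale is f16; f32 is used here for simplicity.
struct BlockQ40 {
    d: f32,              // per-block scale
    qs: [u8; QK4_0 / 2], // 32 x 4-bit quants, two per byte
}

// Dequantize one block: a nibble q in [0, 15] maps to (q - 8) * d.
// Low nibbles hold the first 16 weights, high nibbles the last 16.
fn dequantize_q4_0(block: &BlockQ40) -> [f32; QK4_0] {
    let mut out = [0.0f32; QK4_0];
    for (i, &byte) in block.qs.iter().enumerate() {
        let lo = (byte & 0x0F) as i32 - 8;
        let hi = (byte >> 4) as i32 - 8;
        out[i] = lo as f32 * block.d;
        out[i + QK4_0 / 2] = hi as f32 * block.d;
    }
    out
}

fn main() {
    // All-zero nibbles decode to -8 * d.
    let block = BlockQ40 { d: 0.5, qs: [0u8; 16] };
    let vals = dequantize_q4_0(&block);
    assert!(vals.iter().all(|&v| v == -4.0));
    println!("first value: {}", vals[0]);
}
```

The Q4_0_4_4 / Q4_0_4_8 / Q4_0_8_8 formats store the same per-block data but interleave 4 or 8 such blocks across rows so the NEON, i8mm, and SVE kernels can load them contiguously; adding the datatype to Candle would mean teaching its GGUF reader about these rearranged block layouts.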