Open JianbangZ opened 1 year ago
The paper: https://arxiv.org/pdf/2306.03078.pdf
The code: https://github.com/Vahe1994/SpQR
Given this comment: https://github.com/ggerganov/llama.cpp/issues/1602#issuecomment-1597142154, it seems unlikely SpQR is going to be implemented any time soon:
The main idea of the SpQR paper is to separate "outliers". This was tried as part of k-quants development and was shown to be less effective (see for instance https://github.com/ggerganov/llama.cpp/discussions/1595#discussioncomment-6018205 in https://github.com/ggerganov/llama.cpp/discussions/1595).
If we read the SpQR paper more carefully, we find that what they mean by "nearly lossless compression" is a quantized perplexity within 1% of the full model's perplexity. The Q4_K_M variant of k-quants already achieves that for ggml, see for instance PR https://github.com/ggerganov/llama.cpp/pull/1684.
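For readers unfamiliar with the two ideas in the quoted comment, here is a minimal toy sketch of them. It is not the SpQR algorithm and not ggml code: picking outliers by magnitude, the plain min-max quantizer, and all names (`quantize_group`, `quantize_with_outliers`, `within_one_percent`) are assumptions made for illustration. It only shows why pulling a few extreme weights out of a group can shrink the quantization range, and what a "within 1% of full-model perplexity" check looks like.

```python
import random


def quantize_group(values, bits=3):
    """Uniform min-max quantization of one group; returns dequantized values."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    if hi == lo:
        return list(values)
    scale = (hi - lo) / levels
    return [lo + round((v - lo) / scale) * scale for v in values]


def quantize_with_outliers(values, bits=3, outlier_fraction=0.01):
    """Keep the largest-magnitude values exact, quantize the rest.

    Assumption: outliers are picked by magnitude. This is a stand-in,
    not the selection criterion used in the SpQR paper.
    """
    n_out = max(1, int(len(values) * outlier_fraction))
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    outliers = {i: values[i] for i in order[:n_out]}  # sparse, full precision
    dense_idx = [i for i in range(len(values)) if i not in outliers]
    dense_q = quantize_group([values[i] for i in dense_idx], bits)
    out = [0.0] * len(values)
    for i, q in zip(dense_idx, dense_q):
        out[i] = q
    for i, v in outliers.items():
        out[i] = v
    return out


def mse(a, b):
    """Mean squared error between two equally long lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def within_one_percent(ppl_quant, ppl_full):
    """True if the quantized perplexity is at most 1% above the full model's."""
    return (ppl_quant - ppl_full) / ppl_full <= 0.01


# Synthetic weights (made up for this demo): mostly small, two extreme values.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(256)]
weights[10], weights[100] = 25.0, -30.0

plain = quantize_group(weights, bits=3)
split = quantize_with_outliers(weights, bits=3, outlier_fraction=0.01)
print("MSE, plain 3-bit:       ", mse(weights, plain))
print("MSE, outliers separated:", mse(weights, split))
```

The toy ignores the storage cost of the sparse outliers (indices plus full-precision values), which is part of the trade-off the quoted comment refers to when comparing against k-quants.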
We can probably close this issue.
How feasible is it to implement SpQR in ggml? SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression