ggerganov / llama.cpp

LLM inference in C/C++
MIT License

GGML_ASSERT: ggml-quants.c:11615: besti1 >= 0 && besti2 >= 0 && best_shift != 0 #6067

Closed schmorp closed 6 months ago

schmorp commented 8 months ago

When quantizing https://huggingface.co/Undi95/Plap-8x13B with this imatrix, http://data.plan9.de/Plap-8x13B.imatrix, quantize crashes with many copies of the message in the title (probably one per thread).

quantize can produce all the Q quants and IQ2_M successfully, then crashes with this assertion on IQ1_S.

This is with b2409. It differs from the NaN issue I reported earlier in that no NaNs were output during imatrix generation, and it seems to be specific to IQ1_S.

schmorp commented 7 months ago

A similar model fails with a different assertion in IQ3_XXS after successfully quantizing to IQ2_M:

```
[   1/1925]  blk.0.ffn_up.2.weight - [ 5120, 13824, 1, 1], type = f16, converting to iq3_xxs ..
Oops: found point 104 not on grid: 104 0 0 0
GGML_ASSERT: ggml-quants.c:11118: false
```

Might or might not be a similar issue.

Model: https://huggingface.co/Undi95/BigPlap-8x20B
imatrix file: https://huggingface.co/mradermacher/BigPlap-8x20B-i1-GGUF/blob/main/imatrix.dat

github-actions[bot] commented 6 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.