Running `quantize` with a target dtype of F32, F16, or Q8_0 can produce a Q6_K output tensor even without `--pure` (ref https://github.com/ggerganov/llama.cpp/pull/5631#issuecomment-1965055798). This is surprising: I would expect converting to F32 and then quantizing to F16 to produce results similar to converting directly to F16.
I suggest that the k-quant mixture logic should never decrease the quality of an output tensor below the requested target type, only increase it.