Open · 0wwafa opened 5 days ago

Hello, usually when quantizing I first convert a huggingface model to an F16 gguf, then I quantize that into my own quantizations. I have noticed that convert does not produce a "pure" f16. I think there should be a flag, as in the quantize program, to allow a pure f16 (all tensors) or pure bf16 conversion.

---

> I have noticed that convert does not produce a "pure" f16.
Do you mean that some tensors are in `F32` in the resulting `gguf` model? These are usually 1D tensors, which are very small anyway. (BTW, even `llama-quantize --pure ...` keeps 1D tensors as `F32`.)

Some of the `ggml` operators used on 1D tensors (currently) only work on `F32` tensors (e.g. `ggml_norm`), so a pure f16 `gguf` model would not work without modifications in `ggml.c`.
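If you want to check which tensors stay `F32`, you can list them with the `gguf` Python package from the llama.cpp repo (a minimal sketch; it assumes gguf-py is installed and that the `GGUFReader` fields match the current gguf-py):

```python
# List each tensor's type in a gguf file, to see which tensors
# stay F32 (typically the small 1D norm/bias tensors).
import sys
from collections import Counter

from gguf import GGUFReader  # pip install gguf

reader = GGUFReader(sys.argv[1])

counts = Counter()
for tensor in reader.tensors:
    type_name = tensor.tensor_type.name  # e.g. "F32", "F16", "Q6_K"
    counts[type_name] += 1
    if type_name == "F32":
        print(f"F32: {tensor.name} shape={list(tensor.shape)}")

print("tensor type counts:", dict(counts))
```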
Is there a particular reason why you'd like extremely "pure" conversions?
---

> Is there a particular reason why you'd like extremely "pure" conversions?
Well, no. I mean, I wanted to compare a "pure" f16 against my own quants (which are a mix of f16 and q5 or q6). They seem to be smaller at essentially no cost, with almost no degradation. You can find those quants on my huggingface profile page under models: https://huggingface.co/ZeroWw
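For the size comparison itself, something like this gguf-py sketch works (the reader fields such as `n_bytes` are assumptions based on current gguf-py, and the file names are placeholders):

```python
# Rough per-type size breakdown for two gguf files, e.g. a plain
# f16 conversion vs a mixed f16/q5/q6 quant.
from collections import Counter

from gguf import GGUFReader  # pip install gguf

def bytes_by_type(path: str) -> Counter:
    counts = Counter()
    for tensor in GGUFReader(path).tensors:
        counts[tensor.tensor_type.name] += int(tensor.n_bytes)
    return counts

for path in ("model.f16.gguf", "model.mixed.gguf"):  # placeholder names
    counts = bytes_by_type(path)
    total = sum(counts.values())
    print(f"{path}: {total / 1e9:.2f} GB", dict(counts))
```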