Open LysandreJik opened 2 months ago
Very cool! I am glad that you found my code useful!
But I am also a bit worried about potential bugs. I've only tested with tinyllama so far. It might totally break for any other model. For example, I am not sure about the transposed shapes.
In addition, I am not sure if this is the best way forward for the transformers library. Not having to add additional dependencies is certainly nice, but using NumPy is significantly slower than writing the bit wrangling code in C, because of all the copying from NumPy array to NumPy array.
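To make the copying concern concrete, here is a minimal sketch of NumPy-based bit unpacking. It is not the actual pygguf code: the function name `dequantize_q4_0_style` is hypothetical, and the block layout (a 2-byte float16 scale followed by 16 bytes holding 32 packed 4-bit values) is an assumption chosen for illustration.

```python
import numpy as np

# Assumed layout per block: 2-byte float16 scale + 16 bytes of packed 4-bit values.
BLOCK_BYTES = 18


def dequantize_q4_0_style(raw: bytes) -> np.ndarray:
    """Hypothetical helper: unpack Q4_0-style blocks into float32 weights."""
    blocks = np.frombuffer(raw, dtype=np.uint8).reshape(-1, BLOCK_BYTES)

    # Copy the two scale bytes so they are contiguous, then reinterpret as float16.
    scales = blocks[:, :2].copy().view(np.float16).astype(np.float32)  # shape (n, 1)

    packed = blocks[:, 2:]  # shape (n, 16), two 4-bit values per byte

    # Each of these expressions allocates one or more temporary arrays.
    low = (packed & 0x0F).astype(np.int8) - 8   # low nibbles, shifted to [-8, 7]
    high = (packed >> 4).astype(np.int8) - 8    # high nibbles, shifted to [-8, 7]

    # Concatenating and scaling allocate two more arrays of shape (n, 32).
    values = np.concatenate([low, high], axis=1)
    return scales * values
```

Nearly every line in the function body creates a fresh intermediate array (masking, shifting, casting, subtracting, concatenating, scaling), whereas a C loop could write each output value directly from the packed bytes in a single pass.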
Anyway, it might be nice to have a NumPy implementation to fall back on. For completeness, I have implemented the missing quantization formats Q2_K, Q3_K and Q5_K. I have not implemented the other formats, since they are expected to be worse than the existing ones.
https://github.com/99991/pygguf/commit/a417edbfc029a1bc270f984a694f9128c5afa8b9
Hello!
FYI, we've been using your code to support GGUF files within the Python ecosystem, by offering the ability to load them within transformers. We're doing so here; we've credited you in the documentation and I've added you as a co-author: https://github.com/LysandreJik/transformers/pull/2/files
We'll open a PR on the main fork in the coming days, so I wanted to give you a chance to look at it beforehand.
Thanks a lot for your work :hugs:
cc @younesbelkada