Hello!
I did some research (using llama.cpp) and found that quantizing the input and embedding tensors to f16 while quantizing the other tensors to q5_k or q6_k gives excellent results, almost indistinguishable from pure f16 at roughly half the size.
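For concreteness, this is roughly the recipe I use on the llama.cpp side (a minimal sketch, assuming a recent build whose llama-quantize tool supports the per-tensor-type overrides; the file names are placeholders):

```python
# Sketch of the llama.cpp quantization step described above, driven from Python.
# Assumptions: llama-quantize is built locally and exposes the
# --token-embedding-type / --output-tensor-type overrides; paths are placeholders.
import subprocess

subprocess.run(
    [
        "./llama-quantize",
        "--token-embedding-type", "f16",   # keep token embeddings at f16
        "--output-tensor-type", "f16",     # keep the output tensor at f16
        "model-f16.gguf",                  # input: full-precision gguf
        "model-q6_k-mixed.gguf",           # output: mixed-precision gguf
        "q6_k",                            # quantization type for the remaining tensors
    ],
    check=True,
)
```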
Is it possible to do the same with bitsandbytes/transformers, i.e. produce a model quantized in this way from a regular model?
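The closest thing I can see is excluding certain modules from quantization via BitsAndBytesConfig. Below is a minimal sketch of what I mean, assuming llm_int8_skip_modules also applies to 4-bit loading and that the module names embed_tokens/lm_head match the target architecture; bitsandbytes has no direct q5_k/q6_k equivalent, so this uses 4-bit NF4 for the remaining linear layers (and as far as I can tell bitsandbytes only converts nn.Linear modules, so the embedding likely stays in f16 regardless):

```python
# A sketch, not a confirmed recipe: keep the embedding and output head in f16
# and quantize the remaining linear layers with bitsandbytes NF4.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # closest bnb analogue to a k-quant
    bnb_4bit_compute_dtype=torch.float16,
    # Assumption: these module names match the model; adjust per architecture.
    llm_int8_skip_modules=["embed_tokens", "lm_head"],
)

model = AutoModelForCausalLM.from_pretrained(
    "your-model-id",                       # placeholder model id
    quantization_config=bnb_config,
    torch_dtype=torch.float16,             # non-quantized modules stay in f16
    device_map="auto",
)
```

If I understand correctly, recent transformers versions can serialize such a 4-bit model with model.save_pretrained(...), but this quantizes on load rather than producing a standalone file the way gguf does. Is this the intended way, or is there a better approach?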
You can find my (gguf) quantizations at https://huggingface.co/ZeroWw for reference.
Thanks.