How to implement normal float NF4?

artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs

https://arxiv.org/abs/2305.14314

MIT License


How to implement normal float NF4? #299

Open XA23i opened 4 days ago

Hi, in uniform quantization we can quantize with x_q = round(x / s) + offset and dequantize with \hat{x} = (x_q - offset) * s. In NF4 quantization, however, there is no closed-form formula: each (normalized) value x has to be mapped to the nearest of the 16 NF4 quantization levels. How can this be implemented efficiently?
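One common approach, sketched here in plain Python, is to avoid comparing x against all 16 levels. Instead, precompute the 15 midpoints between adjacent sorted levels once. The nearest level's index is then the number of midpoints below x, found by binary search. The sketch assumes the QLoRA scheme: weights are split into blocks (block size 64 is the paper's choice), each block is divided by its absolute maximum so values lie in [-1, 1], and the NF4 level table is copied from the QLoRA paper appendix / bitsandbytes, so check it against the version you use. The names `nf4_index`, `quantize_nf4` and `dequantize_nf4` are hypothetical helpers, not part of the qlora or bitsandbytes API.

```python
import bisect
from typing import List, Tuple

# 16 NF4 levels (assumption: copied from the QLoRA paper appendix / bitsandbytes).
NF4_LEVELS: List[float] = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

# Decision boundaries: the 15 midpoints between adjacent levels, computed once.
NF4_MIDPOINTS: List[float] = [
    (a + b) / 2 for a, b in zip(NF4_LEVELS[:-1], NF4_LEVELS[1:])
]


def nf4_index(v: float) -> int:
    """Hypothetical helper: 4-bit code (0..15) of the level nearest to v in [-1, 1]."""
    # bisect_left counts midpoints strictly below v, which is exactly the
    # index of the nearest level (ties at a midpoint go to the lower level).
    return bisect.bisect_left(NF4_MIDPOINTS, v)


def quantize_nf4(x: List[float], block_size: int = 64) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: blockwise absmax normalization, then nearest-level lookup."""
    codes: List[int] = []
    absmaxes: List[float] = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        absmax = max(abs(v) for v in block)
        absmaxes.append(absmax)
        # Guard against division by zero for an all-zero block.
        scale = absmax if absmax > 0 else 1.0
        codes.extend(nf4_index(v / scale) for v in block)
    return codes, absmaxes


def dequantize_nf4(codes: List[int], absmaxes: List[float], block_size: int = 64) -> List[float]:
    """Hypothetical helper: table lookup, then rescale by the block's absmax."""
    return [NF4_LEVELS[c] * absmaxes[i // block_size] for i, c in enumerate(codes)]


if __name__ == "__main__":
    x = [0.5, -2.0, 0.0, 1.0]
    codes, absmaxes = quantize_nf4(x, block_size=4)
    print(codes, absmaxes)
    print(dequantize_nf4(codes, absmaxes, block_size=4))
```

Dequantization is just a 16-entry table lookup, so only the encoding step needs the search. On tensors, the same midpoint idea can be vectorized. For example, `torch.bucketize(x, midpoints)` with the default `right=False` matches the `bisect_left` behaviour above. A simpler but more memory-hungry alternative is `(x.unsqueeze(-1) - levels).abs().argmin(-1)`.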