artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs
https://arxiv.org/abs/2305.14314
MIT License

How to implement normal float NF4? #299

Open XA23i opened 4 days ago

XA23i commented 4 days ago

Hi, in uniform quantization we can quantize with xq = round(x / s) + offset and dequantize with \hat{x} = (xq - offset) * s. In NF4 quantization the levels are not evenly spaced, so instead we need to find the quantization level nearest to x. How can this be implemented efficiently?
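To make the question concrete, here is a minimal NumPy sketch of one possible approach, not necessarily how this repo or bitsandbytes does it. Because the 16 NF4 levels are sorted, the nearest level can be found by comparing against the 15 midpoints between neighbouring levels, which turns the search into a binary search (`np.searchsorted`) instead of a distance computation against all 16 levels. Assumptions: the level values below are the NF4 code book as I understand it from the QLoRA paper and should be verified against the reference implementation; the names `nf4_quantize`, `nf4_dequantize`, and `block_size=64` are hypothetical choices for illustration; the input size must be divisible by `block_size`.

```python
import numpy as np

# Assumed NF4 code book (16 sorted levels in [-1, 1]); verify against the
# reference implementation before relying on these values.
NF4_LEVELS = np.array([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
], dtype=np.float32)

# Decision boundaries: midpoints between neighbouring levels (15 values).
NF4_BOUNDS = (NF4_LEVELS[:-1] + NF4_LEVELS[1:]) / 2


def nf4_quantize(x, block_size=64):
    """Return 4-bit codes (stored one per uint8) and per-block absmax scales."""
    blocks = np.asarray(x, dtype=np.float32).reshape(-1, block_size)
    absmax = np.abs(blocks).max(axis=1, keepdims=True)
    absmax = np.where(absmax == 0, 1.0, absmax).astype(np.float32)  # avoid 0/0
    normed = blocks / absmax  # every block now lies in [-1, 1]
    # Binary search over the midpoints gives the index of the nearest level.
    codes = np.searchsorted(NF4_BOUNDS, normed).astype(np.uint8)
    return codes, absmax


def nf4_dequantize(codes, absmax, shape):
    """Look up each code in the table and rescale by the block's absmax."""
    return (NF4_LEVELS[codes] * absmax).reshape(shape)


x = np.random.default_rng(0).normal(size=(4, 64)).astype(np.float32)
codes, absmax = nf4_quantize(x)
x_hat = nf4_dequantize(codes, absmax, x.shape)
```

This sketch keeps one code per byte for readability; a real implementation would pack two 4-bit codes into each byte, and a GPU kernel could replace the binary search with a fixed tree of comparisons against the same midpoints.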