artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs
https://arxiv.org/abs/2305.14314
MIT License

Why do we need the Dequantization process? #258

Open nthehai01 opened 1 year ago

nthehai01 commented 1 year ago

Hi folks, I'm just curious why the dequantization step is necessary here when finetuning the LoRA weights. Could we instead quantize the input X to 4-bit and also learn the LoRA weights in 4-bit, rather than in 16-bit as presented in the paper? That way we wouldn't need to dequantize any weights, which should save computational resources.

Is there something I'm missing here, and do NVIDIA GPUs actually support 4-bit matrix multiplication?
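
For context, here is a rough PyTorch sketch (my own illustration, not the repo's code) of what I understand the QLoRA forward pass to look like: the NF4 weight is only a storage format, and it is dequantized to bf16 on the fly because the matmul itself runs in 16-bit. The `dequantize_nf4` helper, the toy codebook, and all tensor names are placeholders standing in for the real bitsandbytes kernels.

```python
import torch

def dequantize_nf4(codes, absmax, codebook, shape):
    """Stand-in dequantizer: map 4-bit codes back to bf16 values via a
    codebook and a scale (real NF4 uses per-block absmax scales)."""
    return (codebook[codes.to(torch.long)] * absmax).to(torch.bfloat16).reshape(shape)

# Toy tensors standing in for one quantized linear layer.
out_features, in_features, r = 8, 16, 4
codebook = torch.linspace(-1.0, 1.0, 16)                 # placeholder for the 16 NF4 levels
W_codes = torch.randint(0, 16, (out_features * in_features,))  # 4-bit storage (unpacked for clarity)
absmax = torch.tensor(0.5)

lora_A = torch.randn(r, in_features, dtype=torch.bfloat16) * 0.01  # LoRA weights kept in bf16
lora_B = torch.zeros(out_features, r, dtype=torch.bfloat16)

X = torch.randn(2, in_features, dtype=torch.bfloat16)

# Forward pass: dequantize the frozen base weight to bf16, then do both matmuls in bf16.
W_bf16 = dequantize_nf4(W_codes, absmax, codebook, (out_features, in_features))
Y = X @ W_bf16.T + (X @ lora_A.T) @ lora_B.T
print(Y.shape)  # torch.Size([2, 8])
```

So my question boils down to: why not keep everything (X, W, and the LoRA matrices) in 4-bit and skip the `dequantize_nf4` step entirely?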