
QLoRA: Efficient Finetuning of Quantized LLMs
https://arxiv.org/abs/2305.14314

Why do we need the Dequantization process? #258

Open nthehai01 opened 10 months ago

nthehai01 commented 10 months ago

Hi folks, I'm just curious why the dequantization step is necessary here when finetuning the LoRA weights. Could we further quantize the input X to 4-bit and also learn the LoRA weights in 4-bit, instead of 16-bit as presented in the paper? That way we wouldn't need to dequantize any weights at all, which would save computational resources.
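
For context, here is my rough understanding of what a QLoRA linear layer does in the forward pass. This is only a sketch I wrote to frame the question, not the actual bitsandbytes/peft implementation: I simulate the frozen quantized weight with per-tensor int8 instead of the real block-wise 4-bit NF4 scheme, and the class name `QLoRALinearSketch` is my own.

```python
import torch
import torch.nn as nn


class QLoRALinearSketch(nn.Module):
    """Sketch of a QLoRA-style linear layer (my understanding, not the repo's code).

    The base weight is stored quantized and frozen; only the LoRA adapters train.
    Here the quantization is simulated with per-tensor int8 rather than 4-bit NF4.
    """

    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        # Frozen base weight, stored in low precision with a scale factor.
        w = torch.randn(out_features, in_features)
        self.register_buffer("scale", w.abs().max() / 127.0)
        self.register_buffer("w_q", torch.round(w / self.scale).to(torch.int8))
        # Trainable LoRA adapters, kept in 16-bit in the paper (bf16).
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # Dequantize the frozen weight to the activation dtype just for this
        # matmul: the GPU matmul runs in 16/32-bit, not on 4-bit integers.
        w = self.w_q.to(x.dtype) * self.scale
        base_out = x @ w.t()
        # LoRA path stays in higher precision so gradients flow to A and B.
        lora_out = (x @ self.lora_A.t()) @ self.lora_B.t() * self.scaling
        return base_out + lora_out


# Tiny usage check
layer = QLoRALinearSketch(16, 32)
y = layer(torch.randn(4, 16))
print(y.shape)  # torch.Size([4, 32])
```

My question is essentially about the `w = self.w_q.to(x.dtype) * self.scale` line: could that dequantization be skipped entirely if everything, including X and the LoRA weights, were kept in 4-bit?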

Is there something I'm missing here? Does NVIDIA hardware actually support 4-bit matrix multiplication?