Open statelesshz opened 1 day ago
Thanks a lot for sharing this PR and the video demo! With the demo I was able to run NF4 quant/dequant on the NPU with ease, and the detailed explanation really helped me understand the process and key steps. Great work, and I hope this PR can be merged soon, as it provides valuable improvements. cc @Titus-von-Koeller
What does this PR do?
This PR adds Ascend NPU support for NF4 quant/dequant, making it possible to do QLoRA fine-tuning of LLMs with transformers, peft, and trl.
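As a usage sketch of what this enables: the snippet below wires the standard transformers/peft QLoRA setup to an NPU device. The model name, LoRA hyperparameters, and `npu:0` device mapping are illustrative assumptions, not values prescribed by this PR.

```python
# Illustrative QLoRA configuration; model name, LoRA hyperparameters,
# and the NPU device map are placeholders, not mandated by this PR.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # the NF4 path this PR enables on NPU
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",          # placeholder model
    quantization_config=bnb_config,
    device_map={"": "npu:0"},            # assumes torch_npu is installed
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```

From here the PEFT-wrapped model can be handed to a trl `SFTTrainer` or a plain transformers `Trainer` as usual.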
You may notice that the NF4 quantization method is currently implemented in pure PyTorch. This is an interim measure: the high-performance AscendC implementation is still in progress 😞. Meanwhile, many in the Ascend NPU community have told us they are keen to use QLoRA to fine-tune large language models (LLMs) at the earliest opportunity.
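To make the PyTorch-level approach concrete, here is a minimal pure-Python sketch of blockwise NF4 quant/dequant: per-block absmax scaling followed by nearest-neighbour lookup into the 16-value NF4 codebook from the QLoRA paper. The function names and block size are illustrative; the actual PR operates on PyTorch tensors rather than Python lists.

```python
# The 16 NF4 codebook levels (as used in bitsandbytes / the QLoRA paper).
NF4_CODE = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

def quantize_nf4(values, block_size=64):
    """Blockwise NF4 quantization: scale each block by its absmax,
    then snap each value to the nearest codebook entry (a 4-bit index)."""
    indices, absmaxes = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block) or 1.0  # guard all-zero blocks
        absmaxes.append(absmax)
        for v in block:
            normed = v / absmax
            idx = min(range(16), key=lambda i: abs(NF4_CODE[i] - normed))
            indices.append(idx)
    return indices, absmaxes

def dequantize_nf4(indices, absmaxes, block_size=64):
    """Invert the lookup: codebook value times the block's absmax scale."""
    return [NF4_CODE[idx] * absmaxes[pos // block_size]
            for pos, idx in enumerate(indices)]
```

This is only a readability sketch; the tensorized version additionally packs two 4-bit indices per byte and vectorizes the nearest-neighbour search.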
Related PR: https://github.com/huggingface/transformers/pull/31512
Collaborators
@SlightwindSec @Ginray @MatrixPlayer
cc @Titus-von-Koeller