GraphPKU / PiSSA

PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models (NeurIPS 2024 Spotlight)
https://arxiv.org/abs/2404.02948

High Training Loss #2

Closed: andyzoujm closed this issue 6 months ago

andyzoujm commented 6 months ago

Hi,

Very cool paper!

When I try to fine-tune a Mistral model with LoRA in 4-bit, the loss starts around 1. However, with the exact same script and the only addition being "init_lora_weights='pissa'", the loss starts around 5. I've also tried 'pissa_niter_4', but the loss is still around 5. Do you know what could be the cause?
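
For reference, the change between the two runs is roughly the following; the model id and hyperparameters here are placeholders rather than the exact values from my script:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Base model loaded directly in 4-bit (placeholder model id).
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb,
)

config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
    init_lora_weights="pissa",  # the only addition relative to the plain-LoRA run
)
peft_model = get_peft_model(model, config)
```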

Thank you, Andy

fxmeng commented 6 months ago

Thank you for your interest in PiSSA. In the latest version of our paper (https://arxiv.org/pdf/2404.02948.pdf), Section 4.2 reports 4-bit fine-tuning experiments, and 4-bit training is supported in this script: https://github.com/fxmeng/peft/blob/7b8af8e53875164e60a7707fe10f07f21c1baf75/examples/pissa_finetuning/pissa_finetuning.py. Please refer to that code and the corresponding documentation: first decompose the original model at full precision, and then apply 4-bit quantization to the residual model.
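
Roughly, the workflow looks like this (a minimal sketch using PEFT's PiSSA support; the model id, paths, and hyperparameters are placeholders rather than the exact values in the linked script):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training

base_id = "mistralai/Mistral-7B-v0.1"   # placeholder model id
res_dir = "pissa-residual-model"        # where the residual base model is saved
adapter_dir = "pissa-init"              # where the PiSSA-initialized adapter is saved

# Step 1: run the SVD-based PiSSA initialization at full/half precision.
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    init_lora_weights="pissa",  # or "pissa_niter_4" for the fast approximation
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, config)

# Step 2: save the PiSSA adapter and the residual base model separately.
peft_model.peft_config["default"].init_lora_weights = True  # avoid re-running SVD on reload
peft_model.save_pretrained(adapter_dir)
residual_model = peft_model.unload()    # base weights now hold the residual (principal part removed)
residual_model.save_pretrained(res_dir)

# Step 3: reload the *residual* model in 4-bit and attach the saved adapter for training.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model_4bit = AutoModelForCausalLM.from_pretrained(res_dir, quantization_config=bnb)
model_4bit = prepare_model_for_kbit_training(model_4bit)
trainable_model = PeftModel.from_pretrained(model_4bit, adapter_dir, is_trainable=True)
trainable_model.print_trainable_parameters()
# trainable_model can now be passed to a standard Trainer / SFTTrainer loop.
```

The key point is that the SVD in step 1 runs on full-precision weights, so the principal components captured by the adapter are exact and only the residual model is quantized.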

andyzoujm commented 6 months ago

Thank you! I will try it out and let you know if there are any issues.