evertonaleixo opened 1 year ago
I noticed that you multiply LoraA and LoraB and then sum the result with the input. I think the result of multiplying LoraA and LoraB has to be summed with the original weights, or am I wrong?
This is the mechanism of torch.parametrizations: https://pytorch.org/tutorials/intermediate/parametrizations.html
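To make that mechanism concrete, here is a minimal sketch of a LoRA parametrization (this is not the repo's actual code; the class name, `lora_A`/`lora_B`, and the rank/alpha values are illustrative assumptions). The key point is that the `forward` of a parametrization registered with `torch.nn.utils.parametrize` receives the layer's original weight tensor, not the activations, so summing inside it really is summing with the original weights:

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class LoRAParametrization(nn.Module):
    """Hypothetical minimal LoRA parametrization; names and defaults are illustrative."""
    def __init__(self, fan_out, fan_in, rank=4, alpha=1.0):
        super().__init__()
        # A starts with small random values, B with zeros, so the initial
        # update B @ A is zero and the layer is unchanged at the start.
        self.lora_A = nn.Parameter(torch.randn(rank, fan_in) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(fan_out, rank))
        self.scaling = alpha / rank

    def forward(self, W):
        # W is the ORIGINAL weight matrix handed in by torch.nn.utils.parametrize,
        # not the input flowing through the layer, so this adds the low-rank
        # update to the original weights.
        return W + self.scaling * (self.lora_B @ self.lora_A)

layer = nn.Linear(128, 64)  # weight shape: (64, 128)
parametrize.register_parametrization(
    layer, "weight", LoRAParametrization(fan_out=64, fan_in=128, rank=4)
)

x = torch.randn(2, 128)
y = layer(x)  # the forward pass now uses W + scaling * (B @ A)
```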
Could you also explain the scaling factor?
The scaling factor follows the original implementation: https://github.com/microsoft/LoRA
It's mentioned in the paper. From my understanding it's not important; it's only there to control for the change of rank.
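For reference, a rough sketch of that scaling, patterned after the reference implementation in microsoft/LoRA (the variable names and values below are assumptions, not this repo's code): the update is scaled by alpha / r, so the magnitude of the update stays roughly comparable when the rank changes and alpha can be kept fixed while experimenting with r.

```python
import torch

# Illustrative only; names/values are assumptions.
r, lora_alpha = 4, 16
scaling = lora_alpha / r

W = torch.randn(64, 128)          # frozen pretrained weight
A = torch.randn(r, 128) * 0.01    # low-rank factors
B = torch.zeros(64, r)

# Effective weight: W + (lora_alpha / r) * (B @ A)
W_effective = W + scaling * (B @ A)
```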
Thanks.