cccntu / minLoRA

minLoRA: a minimal PyTorch library that allows you to apply LoRA to any PyTorch model.

Understanding the forward operation #8

Open evertonaleixo opened 1 year ago

evertonaleixo commented 1 year ago

I noticed that you apply the mul operation to LoraA and LoraB, and then sum the result with the input.

[screenshot: the forward method of the LoRA parametrization]
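For readers without the screenshot, the code in question looks roughly like this (a simplified sketch, not the exact minLoRA source):

```python
import torch

class LoRAParametrization(torch.nn.Module):
    def __init__(self, fan_in, fan_out, rank=4, lora_alpha=1):
        super().__init__()
        # low-rank factors: lora_B @ lora_A has shape (fan_out, fan_in)
        self.lora_A = torch.nn.Parameter(torch.zeros(rank, fan_in))
        self.lora_B = torch.nn.Parameter(torch.zeros(fan_out, rank))
        torch.nn.init.kaiming_uniform_(self.lora_A, a=5 ** 0.5)
        self.scaling = lora_alpha / rank

    def forward(self, X):
        # X is whatever tensor this parametrization is registered on
        return X + (self.lora_B @ self.lora_A) * self.scaling
```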

I think the result of multiplying LoraA and LoraB has to be added to the original weights, or am I wrong?

Could you also explain the scaling factor?

Thanks.

cccntu commented 1 year ago

> I noticed that you apply the mul operation to LoraA and LoraB, and then sum the result with the input. I think the result of multiplying LoraA and LoraB has to be added to the original weights, or am I wrong?

This is the mechanism of torch parametrizations (https://pytorch.org/tutorials/intermediate/parametrizations.html): when a parametrization is registered on `weight`, its `forward` receives the original weight tensor as input and returns the new value. So the `X` being summed with the low-rank product is the original weight matrix, not the layer's activation input.
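A minimal demonstration of that mechanism, using a toy `AddOne` parametrization of my own (not minLoRA code):

```python
import torch
from torch.nn.utils import parametrize

linear = torch.nn.Linear(4, 3)
original_weight = linear.weight.detach().clone()

class AddOne(torch.nn.Module):
    def forward(self, X):
        # X is the stored weight, not the layer's activation input
        return X + 1.0

parametrize.register_parametrization(linear, "weight", AddOne())

# Reading linear.weight now runs AddOne.forward on the original weight,
# which is exactly how the LoRA update ends up added to the weights.
assert torch.allclose(linear.weight, original_weight + 1.0)
```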

> Could you also explain the scaling factor?

The scaling follows the original implementation (https://github.com/microsoft/LoRA) and is mentioned in the paper: the low-rank update is scaled by `lora_alpha / rank`. From my understanding it's not that important; it's only there to control for the change in the update's magnitude when you vary the rank, so other hyperparameters don't need retuning.
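A small illustration of that `alpha / r` scaling, assuming the convention from the paper (the variable names here are mine):

```python
import torch

fan_in, fan_out, lora_alpha = 16, 16, 8

for rank in (4, 8, 16):
    lora_A = torch.randn(rank, fan_in)
    lora_B = torch.randn(fan_out, rank)
    scaling = lora_alpha / rank  # alpha / r, as in the paper
    delta_W = (lora_B @ lora_A) * scaling
    # doubling the rank halves the scaling, keeping the update's
    # magnitude roughly comparable across ranks
    print(f"rank={rank:2d} scaling={scaling:.2f} |delta_W|={delta_W.norm():.2f}")
```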