cloneofsimo / lora

Using Low-rank adaptation to quickly fine-tune diffusion models.
https://arxiv.org/abs/2106.09685
Apache License 2.0

LoRA Join operation #160

Closed cloneofsimo closed 1 year ago

cloneofsimo commented 1 year ago

This insane PR makes it possible to now "JOIN" LoRAs, in the following way: suppose we have two LoRAs $A_1, B_1$ and $A_2, B_2$, and we want to merge them in a weight-space-equivalent way, i.e.,

$$ W' = W_0 + A_1 B_1^T + A_2 B_2^T = W_0 + A_3 B_3^T $$

So clearly, we can just concatenate $A_3 = [A_1, A_2]$ and $B_3 = [ B_1, B_2 ]$, and the above is achieved with simple linear algebra. One interesting feature that also comes out of this is toggling with a vector: we set $\Sigma = diag(v)$, with $v_i \in [0, 1]$, so that

$$ W_m = [ A_1 , A_2 ] \Sigma [ B_1, B_2 ]^T $$

This is now rank $r_1 + r_2$, and has the capability to become either one. Notice that, with

$$ \Sigma = \begin{bmatrix} I_{r_1} & 0 \\ 0 & 0 \end{bmatrix} $$

we have:

$$ W_m x = [ A_1 , A_2 ] \begin{bmatrix} I_{r_1} & 0 \\ 0 & 0 \end{bmatrix} [ B_1, B_2 ]^T x = A_1 B_1^T x $$

So it becomes the 1st LoRA. Merging can be done with `lora_add --mode=ljl`. Example usage:

lora_add ./example_loras/modern_disney_svd.safetensors ./example_loras/lora_krk.safetensors ./merged.safetensors --mode=ljl
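
As a quick sanity check on the identity above, here is a minimal sketch in plain PyTorch (not the repo's API; the shapes and ranks are just illustrative assumptions):

```python
import torch

# Illustrative shapes: W0 is (out, in), A_i is (out, r_i), B_i is (in, r_i)
out_dim, in_dim, r1, r2 = 16, 32, 8, 4
W0 = torch.randn(out_dim, in_dim)
A1, B1 = torch.randn(out_dim, r1), torch.randn(in_dim, r1)
A2, B2 = torch.randn(out_dim, r2), torch.randn(in_dim, r2)

# Applying the two LoRAs separately in weight space
W_sum = W0 + A1 @ B1.T + A2 @ B2.T

# Joined LoRA: concatenate the factors along the rank dimension
A3 = torch.cat([A1, A2], dim=1)  # (out, r1 + r2)
B3 = torch.cat([B1, B2], dim=1)  # (in,  r1 + r2)
W_join = W0 + A3 @ B3.T

print(torch.allclose(W_sum, W_join, atol=1e-5))  # True
```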

Now, alpha has no effect here, and the two LoRAs can be toggled dynamically and separately. How, you ask?

We can see this in action with the new `set_lora_diag` function. It takes the model as its first argument and a tensor of shape `(r,)` as its second. Example usage:

# keep the overall LoRA scale at 1.0; the per-rank diagonal does the toggling
SC = 1.0
tune_lora_scale(pipe.unet, SC)
tune_lora_scale(pipe.text_encoder, SC)
# first 8 diagonal entries = modern_disney_svd (rank 8), last 4 = lora_krk (rank 4)
set_lora_diag(pipe.unet, torch.tensor([2.5] * 8 + [0.0] * 4))
set_lora_diag(pipe.text_encoder, torch.tensor([2.5] * 8 + [0.0] * 4))

This will set $\Sigma = diag([2.5, 2.5, \cdots , 0.0])$.

So since modern_disney_svd is a rank-8 LoRA and krk is a rank-4 LoRA, the above has the effect of toggling modern-disney.

Similarly,

set_lora_diag(pipe.text_encoder, torch.tensor([0.0] * 8 + [1.0] * 4))

will have the effect of toggling the kirkio LoRA.
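
To make the toggle concrete, here is a small sketch of what the diagonal does to the joined update, again in plain PyTorch rather than via `set_lora_diag` (the shapes are illustrative assumptions):

```python
import torch

out_dim, in_dim, r1, r2 = 16, 32, 8, 4
A1, B1 = torch.randn(out_dim, r1), torch.randn(in_dim, r1)
A2, B2 = torch.randn(out_dim, r2), torch.randn(in_dim, r2)
A3 = torch.cat([A1, A2], dim=1)
B3 = torch.cat([B1, B2], dim=1)

x = torch.randn(in_dim)

# Sigma = diag([1]*r1 + [0]*r2) keeps only the first LoRA's contribution
sigma = torch.diag(torch.tensor([1.0] * r1 + [0.0] * r2))
delta = A3 @ sigma @ B3.T

print(torch.allclose(delta @ x, A1 @ B1.T @ x, atol=1e-5))  # True: only LoRA 1 is active
```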

cloneofsimo commented 1 year ago

@brian6091 I also implemented exposing the scale parameter and saving it via `realize_as_lora`, which might be used, for example, to implement #154. Note that this is not idempotent: saving a LoRA and reloading it with the same scale factor will square the scale parameter. So essentially you can't "continue training" with this one; hence the name `realize_as_lora`, meaning some other form of transformation has to be done to continue training in general.
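
A toy illustration of the squaring effect (exactly how the scale gets folded into the saved factors is my assumption for illustration; the point is only that applying the same scale again after reloading compounds it):

```python
import torch

out_dim, in_dim, r = 16, 32, 4
A, B = torch.randn(out_dim, r), torch.randn(in_dim, r)
scale = 0.5

# Effective update at inference time: scale * A B^T
delta = scale * (A @ B.T)

# "Realizing" the LoRA bakes the scale into the saved factors (assumed here as A' = scale * A)
A_saved = scale * A

# Reloading and applying the same scale factor again squares it
delta_reloaded = scale * (A_saved @ B.T)
print(torch.allclose(delta_reloaded, scale**2 * (A @ B.T)))  # True: the scale is now squared
```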