carmocca opened this issue 1 year ago
In section 7.1 of the LoRA paper, the authors compared fewer LoRA layers with a higher rank against more layers with a smaller rank, and found that applying LoRA to more layers wins despite the smaller rank. That of course doesn't necessarily mean that, all else being equal, more LoRA layers is always better, but it's the best evidence I could come up with.
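For a rough sense of why that comparison is fair: each adapted weight matrix adds about r * (d_in + d_out) trainable parameters, so halving the rank while adapting twice as many matrices keeps the trainable-parameter budget roughly constant. A toy check (illustrative hidden size, not numbers from the paper):

```python
def lora_params(r: int, n_matrices: int, d: int = 4096) -> int:
    # each adapted d x d matrix adds r * (d + d) trainable parameters
    return r * (d + d) * n_matrices

print(lora_params(r=8, n_matrices=2))  # rank 8 on q,v     -> 131072 per layer
print(lora_params(r=4, n_matrices=4))  # rank 4 on q,k,v,o -> 131072 per layer
```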
Hello @carmocca
I can help with that. Well, sort of. I don't have even a single GPU, so I can write code that supports different configurations and check that everything works (with some small model that can run on my laptop), and then someone from your team with access to servers can run it and check the results.
I am thinking about providing a string to the `lora` context manager, something like `qkvpmh`, where:

- `q`: query
- `k`: key
- `v`: value
- `p`: projection
- `m`: MLP
- `h`: head

So if a key is provided, LoRA will be applied to the corresponding weights. A rough sketch of what I mean is below.
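Something along these lines (purely an illustrative sketch, not working lit-llama code; `parse_lora_targets` and `LORA_KEYS` are made-up names):

```python
# Illustrative sketch only: map single-character keys to layer names so the
# lora context manager can decide which weights get LoRA adapters.
LORA_KEYS = {"q": "query", "k": "key", "v": "value",
             "p": "projection", "m": "mlp", "h": "head"}

def parse_lora_targets(spec: str) -> set[str]:
    """Turn e.g. "qv" into {"query", "value"}; reject unknown characters."""
    unknown = set(spec) - LORA_KEYS.keys()
    if unknown:
        raise ValueError(f"Unknown LoRA keys: {sorted(unknown)}")
    return {LORA_KEYS[ch] for ch in spec}

# parse_lora_targets("qkvpmh") -> all six targets enabled
```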
Does that work for you? Or is it easier for you to do it on your own rather than spending time on coordination and fixing mistakes?
@Andrei-Aksionov Feel free to start this work! We won't have time to work on this for now.
You might want to work on the lit-gpt repository instead, which also has a LoRA implementation: https://github.com/Lightning-AI/lit-gpt/blob/main/lit_gpt/lora.py
For the implementation, I would be more explicit, referencing the actual linear attribute names, instead of having the minified mapping of `qkvpmh` to the different layers. I suggest that you find the most straightforward solution that works for now. The API can always be made more complex later as we learn of new limitations or requirements that require more complexity.
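For example, something along these lines (the flag names here are just illustrative, not the actual lit-gpt API):

```python
# Purely illustrative of what a "more explicit" configuration could look like.
from dataclasses import dataclass

@dataclass
class LoRAConfig:
    r: int = 8
    alpha: int = 16
    dropout: float = 0.05
    to_query: bool = True
    to_key: bool = False
    to_value: bool = True
    to_projection: bool = False
    to_mlp: bool = False
    to_head: bool = False

# e.g. LoRAConfig(to_projection=True, to_mlp=True) to enable more layers
```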
> You might want to work on the lit-gpt repository instead
Why is that? I have nothing against it, just curious.
> For the implementation, I would be more explicit,
Sure, that makes sense.
We are focusing more on that project moving forward. It includes support for GPT-NeoX-derived and LLaMA-derived weights.
Understood. Well, then we'll meet there :)
Our current LoRA implementation applies it to just the query and value projections. However, recent trends suggest there are performance improvements to be gained from applying it elsewhere.
For instance, the QLoRA paper reports that applying LoRA to all linear transformer-block layers is needed to match full-finetuning performance.
I've seen other online practitioners also apply it to the `lm_head` and MLP, but I don't have any sources to cite about whether that's better or worse.
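As a rough illustration of what "elsewhere" means (a minimal sketch, not our current implementation; names are illustrative): LoRA can in principle wrap any `nn.Linear`, whether that's the attention projection, an MLP layer, or the `lm_head`.

```python
# Minimal LoRA wrapper sketch (not the lit-llama implementation): wraps any
# nn.Linear, e.g. the attention projection, an MLP layer, or lm_head.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, linear: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.linear = linear
        self.linear.weight.requires_grad_(False)  # freeze the pretrained weight
        # Low-rank update: y = W x + (alpha / r) * B A x
        self.lora_A = nn.Parameter(torch.randn(r, linear.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(linear.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# e.g. model.lm_head = LoRALinear(model.lm_head, r=8)
```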