Open funeric555 opened 7 months ago
This was last modified in this PR https://github.com/OpenAccess-AI-Collective/axolotl/pull/1390 do you mind looking in there to see if that helps?
I looked, and the issue I have is that I can't do something like `lm_head.[^\s]*weight` - and changing match to search permits freezing/unfreezing layers when running LoRA.
What are the failure modes that currently exist that make you recommend switching to use search instead of match? I want to make sure we can capture those in unit tests. Thanks!
Thanks! A few 'issues':
1. If you are using (Q)LoRA, the layer names are very different, as I pasted above in the original issue.
2. You can never do an intermediate match, i.e. there is currently no way to match all layers that merely contain `lm_head`: a pattern like `lm_head.[^\s]*weight` is never satisfied because `re.match` implies a left anchor, whereas `re.search` matches anywhere. In your example config you added `^`, implying you expected to have to explicitly indicate that matching starts at the beginning; that is consistent with `re.search` and unnecessary with `re.match`.
So the main failure mode is that, with `re.match`, the patterns can never hit the prefixed layer names that (Q)LoRA produces unless you spell out the full prefix (see the sketch below).
Separately - a minor nit - if you have a `.` in your regex, the code currently rewrites it to `\.`, which means you can never use `.` as a wildcard, although `[^\s]` can substitute for `.` as long as layer (or parameter) names never contain spaces.
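To make this concrete, a minimal standalone sketch (plain Python, not axolotl code) of the anchoring difference:

```python
import re

# A realistic (Q)LoRA parameter name and the pattern I want to use.
name = "base_model.model.lm_head.modules_to_save.default.weight"
pattern = r"lm_head.[^\s]*weight"

# re.match anchors at the start of the string, so the prefixed name never matches.
print(bool(re.match(pattern, name)))   # False

# re.search matches the pattern anywhere in the string.
print(bool(re.search(pattern, name)))  # True

# An explicit "^" still anchors under re.search, so anchored configs keep working.
print(bool(re.search(r"^base_model\..*lm_head.*weight$", name)))  # True
```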
I wonder if this could be related to / address https://github.com/OpenAccess-AI-Collective/axolotl/issues/1188#issuecomment-1909101590 -- if you are running LoRA, we could require unfreezing only layers whose names contain "lora" - or would this not work? (Rough sketch below.)
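Something like this is what I have in mind (a hypothetical helper, not existing axolotl code):

```python
def unfreeze_only_lora(model):
    """Freeze everything except LoRA adapter weights, selected by name."""
    for name, param in model.named_parameters():
        param.requires_grad = "lora_" in name
```

One caveat: this would skip the `modules_to_save` copies (e.g. `embed_tokens` / `lm_head`), so those names would probably need to be allowed through as well.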
After some messing around, I have a guess about some of what is going on, and would appreciate being corrected if I am wrong, as well as answers to my questions below...
Examining the layers for an FFT (e.g. with examples/llama-2/fft_optimized.yml), they have the normal names as expected, i.e. `model.embed_tokens.weight` or `lm_head.weight`.
When running LoRA (or QLoRA), each targeted layer is replaced by three parameters: the original, renamed to something like `base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight`, plus the LoRA A and B matrices, i.e. `base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight` and `base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight`. For LoRA, in theory you can specify the layers to train via the `lora_target_modules` config. If you unfroze the original weights, that would break things.
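For reference, this is roughly how I inspected the names outside axolotl (a sketch using peft directly; the model name and LoRA hyperparameters are just placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; any causal LM shows the same naming scheme.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "k_proj"])
peft_model = get_peft_model(model, lora_config)

# Prints names like:
#   base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight
#   base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight
#   base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight
for name, param in peft_model.named_parameters():
    print(name, param.requires_grad)
```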
I noticed a few things, and wonder what I am missing...
For the `lora_target_modules` config: when I set it to include only `q_proj` and `k_proj`, the other layers still showed up with LoRA A & B parameters, e.g. `base_model.model.model.layers.1.self_attn.o_proj.lora_B.default.weight`. I even added an invalid value and didn't see any errors. Examining the code, it looks like `lora_target_modules` is passed to the `lora_config`, which is passed to `get_peft_model` as a `peft_config`. Not sure what is happening here.
There is also a separate option, `lora_modules_to_save`, where you can specify `lm_head` or `embed_tokens`. If you don't specify the embeddings or `lm_head` there, you see them as `base_model.model.lm_head.weight` (with `requires_grad = False`).
When you do include `embed_tokens` (and add extra tokens) or `lm_head`, you see two parameters instead, i.e. `base_model.model.model.embed_tokens.original_module.weight` and `base_model.model.model.embed_tokens.modules_to_save.default.weight`.
So - if I understand this correctly, when LoRA is applied to the embeddings it does not actually perform any rank reduction, and in theory you could freeze part of the non-original copy if you only want to re-train the embeddings for newly added tokens...
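A quick way to check that (same kind of sketch as above, with `modules_to_save` set; the model name is again a placeholder):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj"],
    modules_to_save=["embed_tokens", "lm_head"],
)
peft_model = get_peft_model(model, lora_config)

for name, param in peft_model.named_parameters():
    if "embed_tokens" in name or "lm_head" in name:
        # Expect a frozen *.original_module.weight and a trainable, full-size
        # *.modules_to_save.default.weight (no lora_A / lora_B for these).
        print(name, param.requires_grad)
```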
My questions are:
1. Is it expected that if you list at least one `lora_target_module`, the other layers should not get the A and B LoRA parameters (since you are not re-training them)?
2. Confirming that for the embeddings or `lm_head` you do not want reduced matrices built, i.e. no A/B, so the observed behavior is correct?
If you are using LoRA, you need to touch the "modules_to_save" copy instead of the "original" one. I am not sure what you want to ask here. Could you run the question through ChatGPT or Claude and elaborate on what the problem in the code is? Corrections to the test cases would be very helpful. Thanks!
Expected Behavior
The mixtral.yml config contains example unfreeze entries such as `^model.embed_tokens.weight$[:32000]` and `^lm_head.weights$`.
The expected behavior is that the layers listed in the mixtral.yml config are unfrozen: in this case embed_tokens rows 0-31999 and lm_head.
Current behaviour
The layers are not unfrozen as expected.
"All parameters are frozen. Model will not be trained."
Steps to reproduce
Uncomment the lines in the mixtral.yml config:
unfrozen_parameters:
- ^lm_head.weight$
- ^model.embed_tokens.weight$[:32000]
Then run as shown in the README: `accelerate launch -m axolotl.cli.train examples/mistral/mixtral.yml --deepspeed deepspeed_configs/zero2.json`
Possible solution
Two-part fix:
1. Change `match` to `search` at https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/src/axolotl/utils/freeze.py#L195 (see the sketch below).
2. Match a substring of the layer names instead of an anchored match, i.e. change the yml regexes to:
- `embed_tokens.[^\s]*weight$[:32000]`
- or `lm_head.[^\s]*weight`
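For part 1, the shape of the change I am suggesting (a sketch, not the actual freeze.py code):

```python
import re

def _pattern_matches(pattern: str, param_name: str) -> bool:
    # re.search lets the pattern hit anywhere in the parameter name;
    # anyone who wants an anchored match can still write "^...$" explicitly.
    return re.search(pattern, param_name) is not None
```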
Some details: when running the config as shown, the actual parameter names are different from those in the config, for example:
- `base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight`
- `base_model.model.model.embed_tokens.modules_to_save.default.weight`
Also - in a use case where the goal is to freeze the existing embeddings and unfreeze only newly added ones, you should use `[32000:]` to indicate that the newly added token embeddings are unfrozen, not the existing ones (rough sketch below). If the goal is to unfreeze the entire embedding layer, there is no need to add the slice at the end.
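For the freeze-existing / train-new-tokens case, since `requires_grad` is per-tensor, the row slice has to be enforced some other way; a rough sketch of one way this could work (a hypothetical helper, not necessarily how axolotl implements the slice internally):

```python
import torch

def train_only_new_rows(embedding: torch.nn.Embedding, num_original_rows: int = 32000):
    """Keep the embedding trainable, but zero the gradient for the original
    rows so only newly added token embeddings (the [32000:] slice) get updated."""
    embedding.weight.requires_grad_(True)

    def _mask(grad: torch.Tensor) -> torch.Tensor:
        grad = grad.clone()
        grad[:num_original_rows] = 0
        return grad

    embedding.weight.register_hook(_mask)
```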
Python Version
3.10.12
axolotl branch-commit
main/dbrx