OpenAccess-AI-Collective / axolotl

Go ahead and axolotl questions
https://openaccess-ai-collective.github.io/axolotl/
Apache License 2.0

UnboundLocalError during merge of fine-tuned model #1517

Open iakoster opened 2 months ago

iakoster commented 2 months ago

Please check that this issue hasn't been reported before.

Expected Behavior

Merged model

Current Behavior

When I run python3 -m axolotl.cli.merge_lora configs/config.yaml --lora_model_dir="./lora-out", an UnboundLocalError occurs after a while; see the attached traceback.

Steps to reproduce

  1. Run accelerate launch -m axolotl.cli.train configs/config.yaml
  2. Run python3 -m axolotl.cli.merge_lora configs/config.yaml --lora_model_dir="./lora-out"

Config yaml

base_model: IlyaGusev/saiga_mistral_7b_lora
base_model_config: Open-Orca/Mistral-7B-OpenOrca
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer

load_in_8bit: true
load_in_4bit: false
bf16: auto
fp16:
tf32: false

datasets:
  - path: data/synthetic-data.jsonl
    type: alpaca

dataset_prepared_path: last_run_prepared
val_set_size: 0.1

sequence_len: 8192
pad_to_sequence_len: true
sample_packing: true
eval_sample_packing: false

adapter: lora
lora_model_dir:

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: false
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

wandb_project: lnvnd
wandb_entity:
wandb_watch:
wandb_name: saiga-mistral-7b-v0_1-id2
wandb_run_id:
wandb_log_model:

output_dir: ./lora-out

# research
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
warmup_steps: 10
learning_rate: 0.0002
logging_steps: 1
evals_per_epoch: 10
saves_per_epoch: 1
max_steps: 10

eval_table_size:
eval_max_new_tokens: 128

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

train_on_inputs: false
group_by_length: false

gradient_checkpointing: true

lr_scheduler: cosine

optimizer: adamw_bnb_8bit
weight_decay: 0.0

xformers_attention:
flash_attention: false
resume_from_checkpoint:

local_rank:

special_tokens:

fsdp:
fsdp_config:

deepspeed:

strict: false

Possible solution

I found the same issue reported in the transformers repo, but there is no solution there (other than using it correctly).

Which Operating Systems are you using?

Python Version

3.10

axolotl branch-commit

main/5ed2939

Acknowledgements

NanoCode012 commented 2 months ago

Hey, can you try upgrading/downgrading peft?

iakoster commented 2 months ago

@NanoCode012 Yes, I downgraded peft to 0.9.0, but then it raises TypeError: LoraConfig.__init__() got an unexpected keyword argument 'layer_replication', so I didn't try further. I also downgraded transformers step by step to 4.37.0, and that didn't help.

NanoCode012 commented 2 months ago

Ah yes, I believe that was another bug in peft. Can you go lower or higher?

Are you on the same version as in requirements?

iakoster commented 2 months ago

> Ah yes, I believe that was another bug in peft. Can you go lower or higher?

I get the same error on peft versions down to 0.7.0. On 0.6.2 I get ImportError: cannot import name 'LoftQConfig' from 'peft' (/home/jupyter/.local/lib/python3.10/site-packages/peft/__init__.py). I can't go higher than that as I used the latest version.

> Are you on the same version as in requirements?

Yes, at the time of model training and the first merge everything was in compliance with the requirements.
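
For what it's worth, a quick way to see which versions the Jupyter kernel actually imports (a small sketch; pip on the host and an isolated kernel can disagree):

import peft
import transformers

# Print the versions the running kernel resolves, which may differ
# from what pip reports outside the notebook environment.
print("peft:", peft.__version__)
print("transformers:", transformers.__version__)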

iakoster commented 2 months ago

> I can't go higher than that as I used the latest version.

My mistake. I have now tested it with peft installed from the GitHub repo: same error.

NanoCode012 commented 2 months ago

Have you been able to merge before this? Any change since?

iakoster commented 2 months ago

@NanoCode012 No. This is my first experience using the library.

NanoCode012 commented 2 months ago

I think I see a key difference: your "base_model" is an adapter, not a full model. I don't know if we have a test for this.

However, when I checked the code, this shouldn't have been a problem.
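
For reference, one way to confirm that an ID points at an adapter rather than a full model (a sketch using huggingface_hub, not axolotl itself): adapter repos ship an adapter_config.json that names their real base model.

from huggingface_hub import hf_hub_download
import json

# Adapter repos carry an adapter_config.json pointing at the underlying base model;
# full model repos do not have this file.
cfg_path = hf_hub_download("IlyaGusev/saiga_mistral_7b_lora", "adapter_config.json")
with open(cfg_path) as f:
    print(json.load(f).get("base_model_name_or_path"))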

iakoster commented 2 months ago

@NanoCode012 I also saw the following warning in the logs: [peft.tuners.tuners_utils.__init__:154] [PID:10053] Already found a 'peft_config' attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing! Maybe this will help.

Any idea what I should fix to make it work?

NanoCode012 commented 2 months ago

Can you perhaps add a print statement before this merge statement? I assume it should run: https://github.com/OpenAccess-AI-Collective/axolotl/blob/132eb740f036eff0fa8b239ddaf0b7a359ed1732/src/axolotl/utils/models.py#L671-L672. If it is not run, then your warning makes sense.

iakoster commented 2 months ago

@NanoCode012 I don't have access to the interpreter (I work in the cloud), so I'll fork the repo, make the changes, reinstall the library, and test tomorrow.

NanoCode012 commented 2 months ago

You can nano axolotl/src/axolotl/utils/models.py and make the changes. You won't need to reinstall or fork; the edits should be picked up automatically.

iakoster commented 2 months ago

@NanoCode012 I work through Jupyter (which is isolated from the OS and has reduced permissions), so that won't help me, unfortunately (at least I don't know of a way).

I added print statements before and in the if clause.

LOG.info("-" * 27 + "before merge and unload" + "-" * 26)
if isinstance(model, (PeftModel, PeftModelForCausalLM)) and not qlora_fsdp:
    LOG.info("-" * 30 + "merge and unload" + "-" * 30)
    model = model.merge_and_unload()

And you were right, the code under the condition is not executed (screenshot attached).

I also did a test with the base model (the modified part of the config is below), and everything was fine; the final model merges successfully.

base_model: Open-Orca/Mistral-7B-OpenOrca
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer

NanoCode012 commented 2 months ago

Ahh, that could possibly be it. Would you be able to print the model variable to see its class? For the short term, you may be able to move the merge out of the if condition just to get your current merge working.

iakoster commented 2 months ago

@NanoCode012 model variable class is transformers.models.mistral.modeling_mistral.MistralForCausalLM

NanoCode012 commented 2 months ago

Sorry, I forgot to get back. ~Can you temporarily move the merge out of the if condition? Does that work?~

Edit: It seems that it would error.

NanoCode012 commented 2 months ago

Hm, I could not reproduce this issue. I tried

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftModelForCausalLM, LoraConfig

peft_model_id = "ybelkada/opt-350m-lora"
# load model via lora
model = AutoModelForCausalLM.from_pretrained(peft_model_id)

# add new adapter
lora_config = LoraConfig(
    target_modules=["q_proj", "k_proj"],
    init_lora_weights=False
)

model.add_adapter(lora_config, adapter_name="adapter_1")

model.save_pretrained("test")

It doesn't error on the save stage.

iakoster commented 2 months ago

> Sorry, I forgot to get back. ~Can you temporarily move the merge out of the if condition? Does that work?~
>
> Edit: It seems that it would error.

Yeah, there's gonna be an error. I changed the code as follows:

LOG.info("-" * 27 + "before merge and unload" + "-" * 26)
model = model.merge_and_unload()
LOG.info("-" * 30 + "merge and unload" + "-" * 30)

And got the error AttributeError: 'MistralForCausalLM' object has no attribute 'merge_and_unload'

iakoster commented 2 months ago

Could this error be specific to the mistral model?

NanoCode012 commented 2 months ago

No, it's as expected. That function belongs to PeftModel, which that object is not.

At this point, can you try the following: first load the base model and use the merge-lora feature with your current base, IlyaGusev/saiga_mistral_7b_lora. Then load that newly merged model as the base and merge it with your currently trained adapter.
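
A minimal sketch of that two-step merge, assuming the standard transformers/peft APIs; the ./saiga-merged directory name is a placeholder, and Open-Orca/Mistral-7B-OpenOrca is taken from base_model_config in the config above.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Step 1: merge the saiga adapter into its underlying base model and save the result.
base = AutoModelForCausalLM.from_pretrained("Open-Orca/Mistral-7B-OpenOrca")
merged = PeftModel.from_pretrained(base, "IlyaGusev/saiga_mistral_7b_lora")
merged = merged.merge_and_unload()
merged.save_pretrained("./saiga-merged")

tokenizer = AutoTokenizer.from_pretrained("Open-Orca/Mistral-7B-OpenOrca")
tokenizer.save_pretrained("./saiga-merged")

# Step 2: point base_model in configs/config.yaml at ./saiga-merged and re-run
#   python3 -m axolotl.cli.merge_lora configs/config.yaml --lora_model_dir="./lora-out"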