Thank you for your reminder. In certain newer versions of transformers/PEFT, even when an adapter is used, both `embed_tokens` and `lm_head` are updated. We therefore use the following code to freeze `embed_tokens` and `lm_head`, ensuring that only the PiSSA adapter is updated:
print("<=======params.requires_grad=======>")
for name, params in model.named_parameters():
if "embed_tokens" in name or "lm_head" in name:
params.requires_grad=False
if params.requires_grad:
print(name)
Thank you very much for your response. I'll run the new pissa.py then.
Update:
Everything works now. Thank you!
You are welcome, enjoy 🍕.
Hi, when I run pissa.sh with the script below, I get an error, probably due to a mismatch in tensor sizes.
--------pissa.sh-------
```sh
python pissa.py \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --output_dir ./output/pissa-llama-2-7b-r128 \
    --init_lora_weights pissa \
    --lora_r 128 \
    --data_path meta-math/MetaMathQA \
    --dataset_split "train[:100000]" \
    --dataset_field query response \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 128 \
    --save_strategy "steps" \
    --save_steps 100 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --bf16 True \
    --tf32 True \
    --report_to wandb

CUDA_VISIBLE_DEVICES=0 python merge_adapter_to_base_model.py \
    --base_mode meta-llama/Llama-2-7b-hf \
    --adapter ./output/pissa-llama-2-7b-r128/ft/ \
    --output_path ./output/pissa-llama-2-7b-r128
CUDA_VISIBLE_DEVICES=0 python inference/gsm8k_inference.py --model ./output/pissa-llama-2-7b-r128
CUDA_VISIBLE_DEVICES=0 python inference/MATH_inference.py --model ./output/pissa-llama-2-7b-r128
```
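For context, a merge script like merge_adapter_to_base_model.py typically loads the base model, attaches the saved adapter, merges it into the base weights, and saves the result; the failure reported below occurs at the `PeftModel.from_pretrained` step. A minimal sketch of that flow, assuming standard transformers/PEFT APIs (the path variables are illustrative, not the repo's exact CLI):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Illustrative values; substitute the arguments passed on the command line above.
base_model_path = "meta-llama/Llama-2-7b-hf"
adapter_path = "./output/pissa-llama-2-7b-r128/ft/"
output_path = "./output/pissa-llama-2-7b-r128"

model = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model_path)

# This is the step that raises the size-mismatch error below when the adapter
# checkpoint was saved with a resized (32001-row) embedding matrix.
model = PeftModel.from_pretrained(model, adapter_path)

# Fold the adapter weights into the base weights and save a standalone model.
model = model.merge_and_unload()
model.save_pretrained(output_path)
tokenizer.save_pretrained(output_path)
```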
--------Error-------
```
/home/Ubuntu/anaconda3/envs/embComp/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Loading checkpoint shards: 100%|██████████| 2/2 [00:04<00:00, 2.34s/it]
Traceback (most recent call last):
  File "/home/Ubuntu/Documents/embComp/PiSSA/merge_adapter_to_base_model.py", line 16, in <module>
    model = PeftModel.from_pretrained(model, args.adapter, config=lora_config)
  File "/home/Ubuntu/anaconda3/envs/embComp/lib/python3.10/site-packages/peft/peft_model.py", line 430, in from_pretrained
    model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
  File "/home/Ubuntu/anaconda3/envs/embComp/lib/python3.10/site-packages/peft/peft_model.py", line 988, in load_adapter
    load_result = set_peft_model_state_dict(
  File "/home/Ubuntu/anaconda3/envs/embComp/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 353, in set_peft_model_state_dict
    load_result = model.load_state_dict(peft_model_state_dict, strict=False)
  File "/home/Ubuntu/anaconda3/envs/embComp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2189, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
	size mismatch for base_model.model.model.embed_tokens.weight: copying a param with shape torch.Size([32001, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
	size mismatch for base_model.model.lm_head.weight: copying a param with shape torch.Size([32001, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
```
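The mismatch (32001 rows in the checkpoint vs. 32000 in the freshly loaded base model) suggests the fine-tuning run added one token, e.g. a pad token, and resized the embeddings, which is exactly what the freezing fix above prevents for future runs. For a checkpoint that was already saved this way, one hedged workaround, assuming a single added `[PAD]` token, is to resize the base model's embeddings to match before loading the adapter:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Assumption: training added one special token (a "[PAD]" token), growing the
# vocabulary from 32000 to 32001. Re-add it here so the shapes match.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))

# embed_tokens / lm_head now agree with the checkpoint, so loading succeeds.
model = PeftModel.from_pretrained(model, "./output/pissa-llama-2-7b-r128/ft/")
```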