Open iseesaw opened 4 months ago
I modified the merge code in Converting the State Dict.ipynb, replacing lora with dora.
I then merged the QLoRA adapter with the base model:
import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# PEFT_MODEL (defined elsewhere) points to the trained adapter.
config = PeftConfig.from_pretrained(PEFT_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    # quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    # trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, PEFT_MODEL)

# Merge the adapter with the base model
model = model.merge_and_unload()

# Save the merged model to PEFT_MODEL + "-merged" in the safetensors format
model.save_pretrained(PEFT_MODEL + "-merged", safe_serialization=True)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.save_pretrained(PEFT_MODEL + "-merged")
But the merged model produces repetitive responses like this:
\nE. It is a complication of the disease\nF. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\n
I want to know where the problem occurs: in the fine-tuning or in the weight merge?
See my #57 also. Similar question/request.
This kind of works for me. We need to convert the DoRA key names to LoRA key names in the tensor dict. Once we have a LoRA adapter, we can do a normal merge to get the final model.
import torch
from peft import LoraConfig, TaskType, get_peft_model
from safetensors import safe_open
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    LlamaForCausalLM,
)

# Load every tensor from the trained state dict onto GPU 0
tensors = {}
with safe_open(
    "model_state_dict.safetensors",
    framework="pt",
    device=0,
) as f:
    for k in f.keys():
        tensors[k] = f.get_tensor(k)  # loads the full tensor given a key
        # print(k, tensors[k].dtype, tensors[k].shape)  # Uncomment to view

# Keep only the DoRA tensors and rename their keys to the PEFT LoRA layout
new_tensors = {}
for _k in tensors:
    if "dora" not in _k:
        continue
    k = "base_model.model." + _k
    k = k.replace(".dora_layer", "")
    k = k.replace(".weight", ".default.weight")
    new_tensors[k] = tensors[_k]
tensors = new_tensors

# Make sure the compute type, target modules, rank, alpha etc match!
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# AutoModelForCausalLM picks the model class that matches the Mistral checkpoint
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    use_cache=False,
    quantization_config=bnb_config,
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

# Freeze
for param in model.parameters():
    param.requires_grad = False

# Add LoRA (make sure your rank (r) and alpha (lora_alpha) values match those used in training!)
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    # target_modules=['q_proj','k_proj','v_proj','o_proj','gate_proj','up_proj','down_proj','lm_head']
    target_modules=["k_proj", "q_proj", "v_proj", "up_proj", "down_proj", "gate_proj"],
)
model = get_peft_model(model, peft_config)

# Check out the first few keys in the state dict:
print(list(model.state_dict().keys())[:10])

# Overwrite the freshly initialised LoRA weights with the converted tensors
new_sd = model.state_dict()
for k in new_sd:
    if "lora" in k:
        new_sd[k] = tensors[k]
model.load_state_dict(new_sd, strict=False)

model.save_pretrained("lora_adapters")
tokenizer.save_pretrained("lora_adapters")
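
The key-renaming step in the script above can be pulled out into a small pure function, which makes it easy to check the mapping before loading any model. This is only a sketch: the helper names and the example keys are hypothetical, and unlike the script it rewrites only the trailing ".weight" so that module names are left alone.

```python
from typing import Dict, Iterable, Optional


def dora_key_to_lora_key(key: str, adapter_name: str = "default") -> Optional[str]:
    """Map one DoRA state-dict key to the PEFT LoRA key layout.

    Returns None for keys that do not contain "dora" (they are skipped,
    exactly as in the filtering loop of the script above).
    """
    if "dora" not in key:
        return None
    new_key = "base_model.model." + key
    new_key = new_key.replace(".dora_layer", "")
    # Only rewrite a trailing ".weight" (assumption: a deliberate tightening
    # of the original str.replace, which rewrites every occurrence).
    if new_key.endswith(".weight"):
        new_key = new_key[: -len(".weight")] + "." + adapter_name + ".weight"
    return new_key


def convert_keys(keys: Iterable[str]) -> Dict[str, str]:
    """Return {old_key: new_key} for every key that survives the filter."""
    mapping = {}
    for key in keys:
        new_key = dora_key_to_lora_key(key)
        if new_key is not None:
            mapping[key] = new_key
    return mapping


# Hypothetical key names, for illustration only.
example_keys = [
    "model.layers.0.self_attn.q_proj.dora_layer.lora_A.weight",
    "model.layers.0.self_attn.q_proj.dora_layer.lora_B.weight",
    "model.layers.0.self_attn.q_proj.magnitude_layer.magnitude",
]
mapping = convert_keys(example_keys)
for old, new in mapping.items():
    print(old, "->", new)
```

Note that with this filter any key without "dora" in its name, such as the hypothetical magnitude key in the example, is dropped and never reaches the LoRA adapter.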
@lochuynh1412 how's the quality of the merged model?
Hello,
I've successfully finetuned Llama-3 8B with QDoRA and am now looking to perform inference using vLLM. Could you provide guidance or scripts on how to merge the QDoRA adapters with the original base model? Additionally, does this process involve quantization and dequantization of the base model?
Thank you!
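
On the merge question: in the DoRA formulation the adapted weight is m * (W0 + s * B A) / ||W0 + s * B A||, so a plain LoRA merge that ignores the magnitude m will in general produce different weights. Below is a minimal pure-Python sketch on a toy matrix to illustrate the difference. It is not the PEFT or training-repo implementation; the helper names, the toy numbers, and the choice of one magnitude per output row (norm taken over the input features) are assumptions.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_merge(w, lora_a, lora_b, scale):
    """Plain LoRA merge: W + scale * (B @ A)."""
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def dora_merge(w, lora_a, lora_b, scale, magnitude):
    """DoRA merge: rescale each output row of (W + scale * B @ A) to length m_i.

    Assumption: one magnitude per output row, norm taken over input features.
    """
    merged = lora_merge(w, lora_a, lora_b, scale)
    out = []
    for row, m in zip(merged, magnitude):
        norm = math.sqrt(sum(v * v for v in row))
        out.append([m * v / norm for v in row])
    return out


# Toy 2x2 example with rank 1 (all numbers are made up).
w = [[3.0, 4.0], [0.0, 2.0]]
lora_a = [[1.0, 0.0]]      # shape (r, in) = (1, 2)
lora_b = [[0.0], [1.0]]    # shape (out, r) = (2, 1)
scale = 1.0                # stands in for lora_alpha / r
magnitude = [10.0, 1.0]    # one magnitude per output row

print(lora_merge(w, lora_a, lora_b, scale))
print(dora_merge(w, lora_a, lora_b, scale, magnitude))
```

The two printed matrices differ because the DoRA result keeps the direction of each merged row but sets its length to the corresponding magnitude.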