Request for Scripts to Merge QDoRA Adapters with Base Model for vLLM Inference

iseesaw commented 4 months ago

Hello,

I've successfully finetuned Llama-3 8B with QDoRA and am now looking to perform inference using vLLM. Could you provide guidance or scripts on how to merge the QDoRA adapters with the original base model? Additionally, does this process involve quantization and dequantization of the base model?

Thank you!
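
On the quantization question, a rough way to see why it matters: during QLoRA/QDoRA training the adapter is fit on top of the dequantized 4-bit weights rather than the original bf16 ones, so merging into bf16 weights can leave a small mismatch. The toy below uses a simple symmetric absmax 4-bit scheme to show that round-trip error. This is an assumption for illustration only, not the NF4 format bitsandbytes uses, and the function names are hypothetical.

```python
def quantize_absmax_4bit(values):
    """Toy symmetric 4-bit quantizer: maps floats to integer codes in [-7, 7]."""
    # Fall back to 1.0 so an all-zero input does not divide by zero.
    scale = max(abs(v) for v in values) / 7 or 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to floats."""
    return [c * scale for c in codes]


original = [0.70, -0.33, 0.12, 0.0]             # stand-in for bf16 base weights
codes, scale = quantize_absmax_4bit(original)
seen_in_training = dequantize(codes, scale)     # what the adapter was trained against
mismatch = [o - s for o, s in zip(original, seen_in_training)]
print(mismatch)
```

The per-weight error is bounded by half a quantization step, but it is generally not zero, which is why some merge recipes dequantize the 4-bit base first instead of loading fresh bf16 weights.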

iseesaw commented 4 months ago

I modified the merge code in Converting the State Dict.ipynb, replacing lora with dora.

Then I merged the resulting QLoRA adapter with the base model:

    import torch
    from peft import PeftConfig, PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # PEFT_MODEL is the path to the converted adapter directory (defined elsewhere)
    config = PeftConfig.from_pretrained(PEFT_MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        config.base_model_name_or_path,
        return_dict=True,
        # quantization_config=bnb_config,
        device_map="auto",
        torch_dtype=torch.bfloat16,
        # trust_remote_code=True,
    )

    # Attach the adapter to the bf16 base model
    model = PeftModel.from_pretrained(model, PEFT_MODEL)

    # Merge the adapter into the base model
    model = model.merge_and_unload()

    # Save the merged model to PEFT_MODEL + "-merged" in the safetensors format
    model.save_pretrained(PEFT_MODEL + "-merged", safe_serialization=True)

    tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
    tokenizer.save_pretrained(PEFT_MODEL + "-merged")

But I get repetitive responses like:

\nE. It is a complication of the disease\nF. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\n

I'd like to know where the problem lies: in the fine-tuning or in the weight merge?
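
One thing worth checking on the merge side: a DoRA layer also carries a learned magnitude vector, so a plain LoRA-style merge (W + scale * B A) is not the same weight the model trained with. Below is a dependency-free toy of the DoRA merge as I understand it from the DoRA paper, assuming one magnitude per output row of an (out, in) weight. `dora_merge` and the tiny matrices are made up for illustration; real code would use tensors.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dora_merge(w0, lora_a, lora_b, magnitude, scale):
    """Return magnitude * (W0 + scale * B @ A) / row_norm(W0 + scale * B @ A)."""
    delta = matmul(lora_b, lora_a)
    merged = []
    for w_row, d_row, m in zip(w0, delta, magnitude):
        combined = [w + scale * d for w, d in zip(w_row, d_row)]
        norm = math.sqrt(sum(v * v for v in combined))
        # Rescale the direction so the row norm equals the learned magnitude.
        merged.append([m * v / norm for v in combined])
    return merged


w0 = [[3.0, 4.0], [0.0, 2.0]]   # base weight, shape (out=2, in=2)
lora_a = [[1.0, 0.0]]           # shape (r=1, in=2)
lora_b = [[0.0], [0.0]]         # shape (out=2, r=1), all zeros
magnitude = [10.0, 2.0]         # learned per-row magnitudes (made-up values)

merged = dora_merge(w0, lora_a, lora_b, magnitude, scale=0.25)
print(merged)
```

Here the LoRA update is zero, yet row 0 still changes because its magnitude (10) differs from the base row norm (5). A merge that ignores the magnitude would miss that difference.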

pe-hy commented 4 months ago

See my #57 also. Similar question/request.

lochuynh1412 commented 2 months ago

This kind of works for me. We need to convert the DoRA key names to LoRA key names in the tensor dict. Once we have the LoRA adapter, we can do a normal merge to get the final model.

import torch
from peft import LoraConfig, TaskType, get_peft_config, get_peft_model
from safetensors import safe_open
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    LlamaForCausalLM,
)

tensors = {}
with safe_open(
    "model_state_dict.safetensors",
    framework="pt",
    device=0,
) as f:
    for k in f.keys():
        tensors[k] = f.get_tensor(k)  # loads the full tensor given a key
        # print(k, tensors[k].dtype, tensors[k].shape) # Uncomment to view

new_tensors = {}
for _k in tensors:
    if "dora" not in _k:
        continue
    else:
        k = "base_model.model." + _k
        k = k.replace(".dora_layer", "")
        k = k.replace(".weight", ".default.weight")
        new_tensors[k] = tensors[_k]

tensors = new_tensors

# Make sure the compute type, target modules, rank, alpha etc match!
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# AutoModelForCausalLM selects the architecture (Mistral here) from the checkpoint config
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    use_cache=False,
    quantization_config=bnb_config,
)

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

# Freeze
for param in model.parameters():
    param.requires_grad = False

# Add LoRA (make sure your rank (r) and alpha (lora_alpha) values match those used in training!)
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    # target_modules=['q_proj','k_proj','v_proj','o_proj','gate_proj','up_proj','down_proj','lm_head']
    target_modules=["k_proj", "q_proj", "v_proj", "up_proj", "down_proj", "gate_proj"],
)
model = get_peft_model(model, peft_config)

# Check out the first few keys in the state dict:
print(list(model.state_dict().keys())[:10])

new_sd = model.state_dict()
for k in new_sd:
    if "lora" in k:
        new_sd[k] = tensors[k]

model.load_state_dict(new_sd, strict=False)
model.save_pretrained("lora_adapters")
tokenizer.save_pretrained("lora_adapters")
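
The renaming step in the script above can be pulled into a small function that is easy to test on its own. This is a sketch of the same string rewrite; the example key is hypothetical and assumes the training checkpoint stores adapter weights under names like `...dora_layer.lora_A.weight`. Note that any key without "dora" in its name is skipped, so if the checkpoint keeps the DoRA magnitudes under a separate name, they would not be carried over.

```python
def dora_key_to_lora_key(key, adapter_name="default"):
    """Map a DoRA checkpoint key to a PEFT-style LoRA key, or None if not a DoRA key."""
    if "dora" not in key:
        return None
    new_key = "base_model.model." + key
    new_key = new_key.replace(".dora_layer", "")
    return new_key.replace(".weight", f".{adapter_name}.weight")


# Hypothetical key, for illustration only
example = "model.layers.0.self_attn.q_proj.dora_layer.lora_A.weight"
print(dora_key_to_lora_key(example))
```

Running this over every key in the checkpoint and printing the ones that return None is a quick way to see which tensors the conversion leaves behind.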

williambarberjr commented 2 months ago

@lochuynh1412 how's the quality of the merged model?

AnswerDotAI / fsdp_qlora

Request for Scripts to Merge QDoRA Adapters with Base Model for vLLM Inference #60