bigcode-project / starcoder

Home of StarCoder: fine-tuning & inference!

Model size doubles after .merge_and_unload() and .save_pretrained() #137

Open anudeep-peela opened 1 year ago

anudeep-peela commented 1 year ago

My System Info

peft==0.4.0, accelerate==0.18.0, transformers==4.28.0, Python 3.10

Reproduction

After training, I merge the peft weights with base model using:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

model_ft = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained(
        base_model_path,
        return_dict=True,
        torch_dtype='auto',
        use_cache=True,
    ),
    peft_path,
    torch_dtype=torch.float16,
).merge_and_unload()

Then, for inference as a standalone model, I save the merged model and tokenizer to disk with:

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
model_ft.save_pretrained(destination_path)
tokenizer.save_pretrained(destination_path)

And later I load it back whenever needed with:

inference_model = AutoModelForCausalLM.from_pretrained(
    model_path,
    return_dict=True,
    torch_dtype=torch.float16,
    use_cache=True,
    device_map="auto",
)

Expected behavior

I am training StarCoder 7B, which initially takes about 15 GB on disk. I began training with specific LoRA rank and alpha parameters. To experiment with different combinations of these parameters, I stopped training a few times and performed a merge_and_unload. Afterwards, I restarted training with a new combination of LoRA rank and alpha on top of the latest stored model. This approach worked well up to approximately 500-600 steps. After that point, however, I noticed an issue: when I saved my model after merging, its disk size unexpectedly ballooned to 30 GB, even though my adapter .bin file is only around 400 MB. I am not sure why the model size increased.
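One way to see where the extra size comes from is to inspect the dtype of the weights that actually got written to disk. A minimal diagnostic sketch, assuming the merged model was saved to destination_path as above (the path is just a placeholder):

import torch
from transformers import AutoModelForCausalLM

destination_path = "./starcoder-7b-merged"  # placeholder for the saved merged checkpoint

# torch_dtype="auto" keeps whatever precision was stored on disk instead of
# upcasting to the float32 default, so the numbers below reflect the checkpoint.
model = AutoModelForCausalLM.from_pretrained(destination_path, torch_dtype="auto")

bytes_per_dtype = {}
for p in model.parameters():
    key = str(p.dtype)
    bytes_per_dtype[key] = bytes_per_dtype.get(key, 0) + p.numel() * p.element_size()

for dtype, n_bytes in sorted(bytes_per_dtype.items()):
    print(f"{dtype}: {n_bytes / 1e9:.2f} GB")
# If the bulk of the parameters report torch.float32, the merged model was written
# in full precision, which roughly doubles the size of a float16 checkpoint.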

SankhaSubhra commented 1 year ago

I am having the same issue with Falcon 1B. The original model is about 2.3 GB on disk, while the adapter is about 40 MB. After merging, the saved model takes about 4.5 GB on disk. I checked whether the number of parameters stays constant, and it does. Using safetensors also did not reduce the model size after merging.

I am using Hugging Face transformers 4.30 and PEFT 0.5.0.
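For anyone who wants to reproduce that check, here is a rough sketch comparing parameter count and stored dtype between the base and merged models (both paths are placeholders):

from transformers import AutoModelForCausalLM

base_path = "path/to/base-model"      # placeholder
merged_path = "path/to/merged-model"  # placeholder

base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype="auto")
merged = AutoModelForCausalLM.from_pretrained(merged_path, torch_dtype="auto")

n_base = sum(p.numel() for p in base.parameters())
n_merged = sum(p.numel() for p in merged.parameters())
print("parameter counts equal:", n_base == n_merged)
print("base dtype:", next(base.parameters()).dtype)
print("merged dtype:", next(merged.parameters()).dtype)
# Equal counts with float16 vs. float32 dtypes would account for roughly 2x the disk size.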

kiamesdavies commented 11 months ago

Same issue with Llama 2 models, both 7B and 13B.

SankhaSubhra commented 11 months ago

Try torch_dtype=torch.bfloat16 (i.e. during the model load for merging, assuming the original model, and therefore the LoRA, is already in half precision); that solved the issue for me. I believe the model loads in torch.float32 by default, which explains the doubling in size.
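For reference, a minimal sketch of the merge done this way (base_model_path, peft_path and destination_path are placeholders):

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model_path = "path/to/base-model"  # placeholder
peft_path = "path/to/lora-adapter"      # placeholder
destination_path = "path/to/merged"     # placeholder

# Load the base model explicitly in bfloat16 rather than the float32 default,
# so the merged weights, and hence the saved checkpoint, stay in half precision.
base = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.bfloat16,
)

merged = PeftModel.from_pretrained(base, peft_path).merge_and_unload()
merged.save_pretrained(destination_path)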

kiamesdavies commented 11 months ago

Thanks @SankhaSubhra, I also found this approach in a merge script used for the same purpose: https://github.com/georgian-io/LLM-Finetuning-Hub/blob/7c0413ebedba7ee96d0c17c02f2158c7d3c4c142/inference/text_generation/merge_script.py#L42C29-L42C29