Vahe1994 / AQLM

Official Pytorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression https://arxiv.org/abs/2405.14852

convert_to_hf.py does not work after finetune.py and convert_legacy_model_format.py #130

Open ArtemBiliksin opened 3 weeks ago

ArtemBiliksin commented 3 weeks ago

Hello!

convert_to_hf.py fails after running finetune.py and convert_legacy_model_format.py in sequence.

Everything below refers to the main branch at commit a441a3f.

The main.py file stores its result as the files 0.pth, ..., 32.pth, args.pt and not_quantized_weights.pt.

Then run finetune.py, convert_legacy_model_format.py and convert_to_hf.py in sequence. Running convert_legacy_model_format.py produces the files 0.pth, ..., 32.pth and not_quantized_weights.pt, but not the args.pt file that convert_to_hf.py requires. Consequently, convert_to_hf.py terminates with an error.

Before the last MR (a441a3f), main was at commit 559a366, where finetune.py explicitly copied the args.pt file into the output directory, so convert_to_hf.py worked correctly.

The problem can be circumvented by copying the args.pt file to the correct directory, as shown in the example below:

MNT_PATH=...

BASE_MODEL_NAME=meta-llama/Llama-2-7b-hf
QUANTIZED_MODEL_PATH=${MNT_PATH}/quantized_llama_2_7b
FINETUNED_QUANTIZED_MODEL_PATH=${MNT_PATH}/finetuned_quantized_llama_2_7b
P_FINETUNED_STATE_DICT=${FINETUNED_QUANTIZED_MODEL_PATH}/quantized_model_state_dict_rank0.pt

SAME_FORMAT_AS_QUANTIZED_MODEL_PATH=${MNT_PATH}/tmp

python3 AQLM/convert_legacy_model_format.py \
    --base_model $BASE_MODEL_NAME \
    --quantized_model $QUANTIZED_MODEL_PATH \
    --p_finetuned_state_dict $P_FINETUNED_STATE_DICT \
    --save $SAME_FORMAT_AS_QUANTIZED_MODEL_PATH

############################################################################
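# Workaround: copy args.pt into the converted model directory so that convert_to_hf.py can find it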
cp $QUANTIZED_MODEL_PATH/args.pt $SAME_FORMAT_AS_QUANTIZED_MODEL_PATH
############################################################################

HF_FINETUNED_QUANTIZED_MODEL_PATH=${MNT_PATH}/finetuned_quantized_llama_2_7b_hf
mkdir -p $HF_FINETUNED_QUANTIZED_MODEL_PATH

python3 AQLM/convert_to_hf.py \
    $BASE_MODEL_NAME \
    $SAME_FORMAT_AS_QUANTIZED_MODEL_PATH \
    $HF_FINETUNED_QUANTIZED_MODEL_PATH \
    --save_safetensors \
    --save_tokenizer

However, this workaround requires the user to understand how the Python files of the AQLM repository are organized and what data they produce.

The requirement that the user create the HF_FINETUNED_QUANTIZED_MODEL_PATH directory (out_path in AQLM/convert_to_hf.py) in advance also looks odd: if it does not exist, convert_to_hf.py fails with an error.
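
For the missing-directory issue, one option (a small sketch, not the repository's current code) would be for convert_to_hf.py to create out_path itself:

import os

def ensure_out_path(out_path: str) -> str:
    # Create the output directory if it does not exist yet; no-op otherwise.
    os.makedirs(out_path, exist_ok=True)
    return out_path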

BlackSamorez commented 2 weeks ago

Hi! I don't think you need to use convert_legacy_model_format.py at all. Have you tried using convert_to_hf.py on the output of finetune.py?

ArtemBiliksin commented 2 weeks ago

Hi, @BlackSamorez!

Yes, of course, I tried running convert_to_hf.py right after finetune.py, and it failed with exactly the error I describe in this issue.

I guess I should add more information about the issue here.

Short answer: look at the docstring at the beginning of convert_legacy_model_format.py. It explains in detail what this file is for:

"""
This abomination converts between one of several quantized model formats to the same format as returned by main.py .
This code exists because we failed to produce a single data format for quantized model.
We should eventually switch to saving all models in the same data format. Once we do, this file should be deleted.
"""

Without it, you can't run convert_to_hf.py right after finetune.py.

Below is the detailed answer.

The main.py file stores the quantized model as multiple files: 0.pth, ..., N.pth, args.pt and not_quantized_weights.pt (see the saving code in main.py).

The finetune.py file takes the result of main.py, i.e. the files 0.pth, ..., N.pth, args.pt and not_quantized_weights.pt, and saves the finetuned model as a single state dict, e.g. quantized_model_state_dict_rank0.pt (see the saving code in finetune.py),

i.e. a completely different data format. If you use convert_to_hf.py after running finetune.py, you will get an error, because convert_to_hf.py expects the format that main.py produces: args.pt together with the per-layer .pth files and not_quantized_weights.pt.
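
To make the mismatch concrete, here is a small inspection sketch (illustration only; it assumes the checkpoints are plain torch.save files and reuses the paths from my example above):

import os
import torch

# Output directory of main.py: one .pth file per transformer block
# plus args.pt and not_quantized_weights.pt.
quantized_dir = "quantized_llama_2_7b"
print(sorted(os.listdir(quantized_dir)))

# Output of finetune.py: a single flat state dict keyed by parameter name.
finetuned_state_dict_path = "finetuned_quantized_llama_2_7b/quantized_model_state_dict_rank0.pt"
state_dict = torch.load(finetuned_state_dict_path, map_location="cpu")
print(list(state_dict)[:5])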

To solve this data-format inconsistency, the file convert_legacy_model_format.py was created. It takes the files produced by main.py together with the finetuned state dict passed via --p_finetuned_state_dict and converts them back into per-layer files: 0.pth, ..., N.pth and not_quantized_weights.pt, but not args.pt.

So if you run convert_to_hf.py after convert_legacy_model_format.py, you will still get an error, because convert_to_hf.py requires the args.pt file, which convert_legacy_model_format.py does not write out; after the finetune.py + convert_legacy_model_format.py sequence it is simply missing from the output directory.

The workaround I posted above (cp $QUANTIZED_MODEL_PATH/args.pt $SAME_FORMAT_AS_QUANTIZED_MODEL_PATH) simply copies args.pt into the right folder so that convert_to_hf.py can run properly.
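
A cleaner fix might be for convert_legacy_model_format.py to carry args.pt over itself. A minimal sketch of that idea (not the repository's actual code; the argument names mirror the CLI flags used above):

import os
import shutil

def copy_args_pt(quantized_model: str, save: str) -> None:
    # Copy args.pt from the original main.py output into the converted
    # directory so that convert_to_hf.py finds everything in one place.
    src = os.path.join(quantized_model, "args.pt")
    dst = os.path.join(save, "args.pt")
    if os.path.exists(src) and not os.path.exists(dst):
        shutil.copy2(src, dst)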