allenai / open-instruct

config.json still seems to quantize the merged model trained via QLoRA #149

Closed: notoookay closed this 5 months ago

notoookay commented 5 months ago

Hi, has anyone tried using a model fine-tuned with QLoRA? I used QLoRA to fine-tune the llama2 model and then merged it, but I found that config.json still quantizes the model with load_in_4bit=True. The model should be dequantized back to bfloat16 before merging, right?
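
For context, this is a minimal sketch of the merge step I would expect, assuming a peft-style adapter; the model name and paths here are placeholders, not the actual configs from my run:

    import torch
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    # Load the base model in bfloat16 (no 4-bit quantization at merge time).
    base = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",  # placeholder base model
        torch_dtype=torch.bfloat16,
    )

    # Attach the QLoRA adapter and fold its weights into the base model.
    model = PeftModel.from_pretrained(base, "path/to/qlora-adapter")  # placeholder path
    merged = model.merge_and_unload()

    # The config.json saved here should not carry a quantization_config.
    merged.save_pretrained("path/to/merged-model")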

hamishivi commented 5 months ago

Hi, it's been a little while since I worked with the qlora code, but after merging you should be able to use bfloat16 or quantised forms. I think 8bit should behave pretty similarly to bfloat16.
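
For example, a merged checkpoint can typically be loaded either way; this is a generic transformers sketch, with the model path as a placeholder:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # Unquantized load: bfloat16 weights.
    model_bf16 = AutoModelForCausalLM.from_pretrained(
        "path/to/merged-model",  # placeholder path
        torch_dtype=torch.bfloat16,
    )

    # Quantized load: 8-bit weights via bitsandbytes.
    model_8bit = AutoModelForCausalLM.from_pretrained(
        "path/to/merged-model",  # placeholder path
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
    )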

notoookay commented 5 months ago

Thank you for taking the time to reply. I fine-tuned the model with qlora, and the config.json file contains a quantization_config, which should be caused by this code:

import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization config used during QLoRA fine-tuning;
# it appears to be carried over into the merged model's config.json.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

Sorry that I can't offer the specific configs, as I already deleted the relevant entries, after which the model worked fine. I don't know whether these configs should be there or are just an error, but it looks like the merged model was being loaded quantized in 4-bit. I think my problem is similar to this comment.
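
For anyone hitting the same thing, what worked for me amounts to deleting that key from the saved config. A minimal sketch (the path is a placeholder):

    import json

    config_path = "path/to/merged-model/config.json"  # placeholder path

    with open(config_path) as f:
        config = json.load(f)

    # Drop the stale quantization settings inherited from QLoRA training,
    # so the merged model loads in its saved dtype instead of 4-bit.
    config.pop("quantization_config", None)

    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)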