xiaoxiangshusheng closed this issue 4 months ago
Hi,
Is this the base model, as opposed to the Instruct model?
Hi @jklj077, the model I used is Qwen2-72B-Instruct; I have updated the title.
Could you tell me which dataset was used to generate Qwen2-72B-Instruct-AWQ, which is provided at https://modelscope.cn/models/qwen/Qwen2-72B-Instruct-AWQ/summary? Thanks!
Hi,
As the error logs suggest, there are NaNs in the quantization procedure. For an instruct (or chat) model in the Qwen series, it is better to use an instruction or chat dataset formatted in ChatML for calibration during quantization.
We used in-house post-training data to produce the provided quantized models, but other SFT datasets should also work.
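To illustrate the ChatML suggestion, here is a minimal sketch of building chat-formatted calibration samples and passing them to AutoAWQ as a custom calibration corpus; the file name calib.jsonl and its per-line structure are assumptions for the example, not something taken from this thread.

import json
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "Qwen/Qwen2-72B-Instruct"
quant_path = "Qwen2-72B-Instruct-AWQ"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, use_cache=False)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Assumed input: a JSONL file where each line holds {"messages": [{"role": ..., "content": ...}, ...]}
calib_data = []
with open("calib.jsonl") as f:
    for line in f:
        messages = json.loads(line)["messages"]
        # Render each conversation with the ChatML chat template used by Qwen2-Instruct
        text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
        calib_data.append(text)

# Use the rendered conversations instead of the default calibration dataset
model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_data)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)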
Hi, I have used my own SFT data to quantize the SFT-tuned Qwen2-72B-Instruct model. Unfortunately, it does not succeed.
In fact, AutoAWQ needs to reshape the calibration data to a regular size; in the AutoAWQ code the shape is [batch_size, block_size], with block_size set to 512. Since the Qwen2 model supports longer text, is it necessary to modify block_size for quantization? In other words, can a working quantized model be obtained just by supplying SFT data? I am grateful for your reply!
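For context on the reshape described above, AutoAWQ's calibration helper concatenates the tokenized samples and slices them into fixed-length blocks. A simplified sketch of that chunking logic (function and variable names here are illustrative, not copied from AutoAWQ):

import torch

def chunk_calibration_samples(token_id_lists, block_size=512):
    # Concatenate all tokenized calibration samples along the sequence dimension
    cat_samples = torch.cat([torch.tensor([ids]) for ids in token_id_lists], dim=1)
    # Keep as many full blocks of block_size tokens as the data allows;
    # the trailing remainder that does not fill a block is dropped
    n_split = cat_samples.shape[1] // block_size
    return [cat_samples[:, i * block_size:(i + 1) * block_size] for i in range(n_split)]

In this scheme every calibration block is a fixed 512-token window regardless of the model's maximum context, so long-context support by itself does not obviously require a larger block_size; longer calibration text simply yields more windows.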
You can try using this code: https://github.com/yangyo/AutoAWQ.git
Thank you for your advice. The new code replaces NaN with 1, but in my opinion the grid search will be pushed in an undesirable direction if scales_view = torch.ones_like(scales_view) is applied whenever NaN appears. I suspect the model dtype is bfloat16, while the AutoAWQ code requires float16.
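For reference, a NaN guard of the kind being discussed might look like the sketch below; the tensor name scales_view is taken from the comment above, while the surrounding scale-search loop is assumed rather than copied from that fork.

import torch

def sanitize_scales(scales_view: torch.Tensor) -> torch.Tensor:
    # Replace any NaN entries in the candidate scales with 1.0 so the grid
    # search can continue; a scale of 1.0 leaves the affected channel unscaled.
    return torch.where(torch.isnan(scales_view), torch.ones_like(scales_view), scales_view)

As the comment points out, masking NaNs this way only hides the symptom: if the NaNs stem from a dtype mismatch, the scale search is still degraded for those channels.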
Can you provide an example of your code for testing?
I'm trying to quantize this model with your code:
import torch
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "/home/dario/tess2"
quant_path = "/models/migtissera/Tess-v2.5.2-Qwen2-72B"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load model
model = AutoAWQForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    **{
        "low_cpu_mem_usage": True,
        "use_cache": False,
        # "device_map": "auto",
        # "max_memory": {0: "10GB", "cpu": "190GB", 1: "10GB", 2: "10GB", 3: "10GB"},
    },
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
It is a model fine-tuned from Qwen2-72B, and quantization fails on layer 17.
False alarm: the model download was corrupted. It now passes layer 17 and continues quantizing...
When I use my dataset to quantize Qwen2-72B with AutoAWQ, it does not work. Whether the calibration dataset is C4 or my own data, the model cannot be quantized.
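If the bfloat16/float16 mismatch suggested earlier in the thread is indeed the cause, one thing to try (an assumption on my part, not something confirmed here) is loading the checkpoint in float16 before quantizing, for example:

import torch
from awq import AutoAWQForCausalLM

# Load in float16, the dtype the AWQ kernels expect, instead of bfloat16
model = AutoAWQForCausalLM.from_pretrained(
    "Qwen/Qwen2-72B-Instruct",  # or the path to your fine-tuned checkpoint
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    use_cache=False,
)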