xiaoxiangshusheng closed this issue 4 months ago
Hi,
Is this the base model, as opposed to the Instruct model?
Hi @jklj077, the model I used is Qwen2-72B-Instruct; I have updated the title.
Could you tell me which dataset was used to generate Qwen2-72B-Instruct-AWQ, which is provided at https://modelscope.cn/models/qwen/Qwen2-72B-Instruct-AWQ/summary? Thanks!
Hi,
As the error logs suggest, there are NaNs in the quantization procedure. For an instruct (or chat) model in the Qwen series, it is better to use an instruction or chat dataset formatted in ChatML for calibration during quantization.
We used in-house post-training data to produce the provided quantized models, but other SFT datasets should also work.
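To illustrate the ChatML suggestion, here is a minimal sketch of building chat-formatted calibration samples and passing them to AutoAWQ as a custom calibration corpus; the file name calib.jsonl and its per-line structure are assumptions for the example, not something taken from this thread.

import json
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "Qwen/Qwen2-72B-Instruct"
quant_path = "Qwen2-72B-Instruct-AWQ"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, use_cache=False)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Assumed input: a JSONL file where each line holds {"messages": [{"role": ..., "content": ...}, ...]}
calib_data = []
with open("calib.jsonl") as f:
    for line in f:
        messages = json.loads(line)["messages"]
        # Render each conversation with the ChatML chat template used by Qwen2-Instruct
        text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
        calib_data.append(text)

# Use the rendered conversations instead of the default calibration dataset
model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_data)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)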
Hi, I have used my own SFT data to quantize the SFT-tuned Qwen2-72B-Instruct model. Unfortunately, it does not succeed.
In fact, AutoAWQ needs to reshape the calibration data to a regular size; in the AutoAWQ code the shape is [batch_size, block_size], with block_size set to 512. Since the Qwen2 model supports longer text, is it necessary to modify block_size for quantization? In other words, can a working quantized model be obtained just by supplying SFT data? I am grateful for your reply!
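For context on the reshape described above, AutoAWQ's calibration helper concatenates the tokenized samples and slices them into fixed-length blocks. A simplified sketch of that chunking logic (function and variable names here are illustrative, not copied from AutoAWQ):

import torch

def chunk_calibration_samples(token_id_lists, block_size=512):
    # Concatenate all tokenized calibration samples along the sequence dimension
    cat_samples = torch.cat([torch.tensor([ids]) for ids in token_id_lists], dim=1)
    # Keep as many full blocks of block_size tokens as the data allows;
    # the trailing remainder that does not fill a block is dropped
    n_split = cat_samples.shape[1] // block_size
    return [cat_samples[:, i * block_size:(i + 1) * block_size] for i in range(n_split)]

In this scheme every calibration block is a fixed 512-token window regardless of the model's maximum context, so long-context support by itself does not obviously require a larger block_size; longer calibration text simply yields more windows.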
You can try using this code: https://github.com/yangyo/AutoAWQ.git
Thank you for your advice. The new code replaces NaN with 1, but in my opinion the grid search will be pushed in an undesirable direction if scales_view = torch.ones_like(scales_view) is applied whenever NaN appears. I suspect the model dtype is bfloat16, while the AutoAWQ code requires float16.
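For reference, a NaN guard of the kind being discussed might look like the sketch below; the tensor name scales_view is taken from the comment above, while the surrounding scale-search loop is assumed rather than copied from that fork.

import torch

def sanitize_scales(scales_view: torch.Tensor) -> torch.Tensor:
    # Replace any NaN entries in the candidate scales with 1.0 so the grid
    # search can continue; a scale of 1.0 leaves the affected channel unscaled.
    return torch.where(torch.isnan(scales_view), torch.ones_like(scales_view), scales_view)

As the comment points out, masking NaNs this way only hides the symptom: if the NaNs stem from a dtype mismatch, the scale search is still degraded for those channels.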
Can you provide an example of your code for testing?
I'm trying to quantize this model with your code:
import torch
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "/home/dario/tess2"
quant_path = "/models/migtissera/Tess-v2.5.2-Qwen2-72B"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load model
model = AutoAWQForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    **{
        "low_cpu_mem_usage": True,
        "use_cache": False,
        # "device_map": "auto",
        # "max_memory": {0: "10GB", "cpu": "190GB", 1: "10GB", 2: "10GB", 3: "10GB"},
    },
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
It is a model fine-tuned from Qwen2-72B, and quantization fails on layer 17.
False alarm: the model download was corrupted. It now passes layer 17 and continues quantizing...
When I use my dataset to quantize Qwen2-72B with AutoAWQ, it does not work. Whether the calibration dataset is C4 or my own data, the model cannot be quantized.
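If the bfloat16/float16 mismatch suggested earlier in the thread is indeed the cause, one thing to try (an assumption on my part, not something confirmed here) is loading the checkpoint in float16 before quantizing, for example:

import torch
from awq import AutoAWQForCausalLM

# Load in float16, the dtype the AWQ kernels expect, instead of bfloat16
model = AutoAWQForCausalLM.from_pretrained(
    "Qwen/Qwen2-72B-Instruct",  # or the path to your fine-tuned checkpoint
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    use_cache=False,
)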