You need to use symmetric mode for AWQ if you want to run inference with lightllm. Additionally, remove the act part.
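In the weight section of the quant block, that corresponds to something like this (a minimal sketch; only the two changes named above are the point, the remaining keys follow the configs later in this thread):

```yaml
quant:
    method: Awq
    weight:
        bit: 4
        symmetric: True   # lightllm needs symmetric weight quantization
        # ...other weight settings unchanged
    # no `act` section: weight-only (w4a16) quantization
```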
I changed to the following AWQ config but still get the same error:
```yaml
base:
    seed: &seed 42
model:
    type: Qwen2
    path: /models/Qwen2-7B-Instruct
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: /app/src/llmc/tools/data/calib/pileval
    n_samples: 128
    bs: -1
    seq_len: 512
    preproc: general
    seed: *seed
eval:
    eval_pos: [pretrain, transformed, fake_quant]
    name: wikitext2
    download: False
    path: /app/src/llmc/tools/data/eval/wikitext2
    bs: 1
    inference_per_block: False
    # For 70B model eval, bs can be set to 20, and inference_per_block can be set to True.
    # For 7B / 13B model eval, bs can be set to 1, and inference_per_block can be set to False.
    seq_len: 2048
quant:
    method: Awq
    weight:
        bit: 4
        symmetric: True
        granularity: per_channel
        group_size: -1
        calib_algo: learnable
    special:
        trans: True
        trans_version: v2
        weight_clip: True
        clip_version: v2
        save_scale: True
        scale_path: ./save/qwen2-7b-instruct-awq_w4a16-lightllm-best/scale
        save_clip: True
        clip_path: ./save/qwen2-7b-instruct-awq_w4a16-lightllm-best/clip
save:
    save_trans: True
    save_quant: False
    save_lightllm: False
    save_path: ./save/qwen2-7b-instruct-awq_w4a16-lightllm-best/trans
```
You must still use per-group quantization with a group size of 128 in llmc to fit the backend kernel.
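Concretely, the relevant weight settings would be (a sketch of the advice above):

```yaml
quant:
    weight:
        granularity: per_group   # per-group quantization, not per_channel
        group_size: 128          # group size expected by the backend kernel
```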
After changing the config, I still get the same error:
```yaml
base:
    seed: &seed 42
model:
    type: Qwen2
    path: /models/Qwen2-7B-Instruct
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: /app/src/llmc/tools/data/calib/pileval
    n_samples: 128
    bs: -1
    seq_len: 512
    preproc: general
    seed: *seed
eval:
    eval_pos: [pretrain, transformed, fake_quant]
    name: wikitext2
    download: False
    path: /app/src/llmc/tools/data/eval/wikitext2
    bs: 1
    inference_per_block: False
    # For 70B model eval, bs can be set to 20, and inference_per_block can be set to True.
    # For 7B / 13B model eval, bs can be set to 1, and inference_per_block can be set to False.
    seq_len: 2048
quant:
    method: Awq
    weight:
        bit: 4
        symmetric: True
        granularity: per_channel
        group_size: 128
        calib_algo: learnable
    special:
        trans: True
        trans_version: v2
        weight_clip: True
        clip_version: v2
        save_scale: True
        scale_path: ./save/qwen2-7b-instruct-awq_w4a16-lightllm-best/scale
        save_clip: True
        clip_path: ./save/qwen2-7b-instruct-awq_w4a16-lightllm-best/clip
save:
    save_trans: True
    save_quant: False
    save_lightllm: False
    save_path: ./save/qwen2-7b-instruct-awq_w4a16-lightllm-best/trans
```
Hi, granularity should not be per_channel. You should change it to per_group.
I still get the same error after changing to per_group:
```yaml
base:
    seed: &seed 42
model:
    type: Qwen2
    path: /models/Qwen2-7B-Instruct
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: /app/src/llmc/tools/data/calib/pileval
    n_samples: 128
    bs: -1
    seq_len: 512
    preproc: general
    seed: *seed
eval:
    eval_pos: [pretrain, transformed, fake_quant]
    name: wikitext2
    download: False
    path: /app/src/llmc/tools/data/eval/wikitext2
    bs: 1
    inference_per_block: False
    # For 70B model eval, bs can be set to 20, and inference_per_block can be set to True.
    # For 7B / 13B model eval, bs can be set to 1, and inference_per_block can be set to False.
    seq_len: 2048
quant:
    method: Awq
    weight:
        bit: 4
        symmetric: True
        granularity: per_group
        group_size: 128
        calib_algo: learnable
    special:
        trans: True
        trans_version: v2
        weight_clip: True
        clip_version: v2
        save_scale: True
        scale_path: ./save/qwen2-7b-instruct-awq_w4a16-lightllm-best/scale
        save_clip: True
        clip_path: ./save/qwen2-7b-instruct-awq_w4a16-lightllm-best/clip
save:
    save_trans: True
    save_quant: False
    save_lightllm: False
    save_path: ./save/qwen2-7b-instruct-awq_w4a16-lightllm-best/trans
```
Did you still get the error in lightllm's quantization mode after fixing the llmc config? If you do not use the quantization kernel, the weight clipping in AWQ makes this behavior reasonable.
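If you want to isolate the effect of weight clipping (a hypothetical experiment, not something the maintainers prescribed), you could disable it in the special section:

```yaml
quant:
    special:
        weight_clip: False   # hypothetical: turn off AWQ weight clipping to test its impact
```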
I tried starting lightllm both with and without "--mode triton_w4a16"; both give the same error.
We will try to reproduce the error later; please wait a bit. You can also try other algorithms in the meantime.
I get the same error with QuaRot:
```yaml
base:
    seed: &seed 42
model:
    type: Qwen2
    path: /models/Qwen2-7B-Instruct
    tokenizer_mode: slow
    torch_dtype: auto
eval:
    eval_pos: [fake_quant]
    name: wikitext2
    download: False
    path: /app/src/llmc/tools/data/eval/wikitext2
    bs: 1
    inference_per_block: False
    seq_len: 2048
quant:
    method: Quarot
    weight:
        bit: 4
        symmetric: False
        granularity: per_channel
        group_size: -1
        qmax_to_tensor: True
        calib_algo: minmax
    act:
        bit: 16
        symmetric: False
        granularity: per_token
        qmax_to_tensor: True
    special:
        rotate_mode: hadamard
        fp32_had: True
        online_rotate: False
save:
    save_trans: True
    save_fake: False
    save_path: ./save/qwen2-7b-instruct-quarot_w4a16/trans
```
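Note that this QuaRot config still uses asymmetric per-channel weight quantization. If the same lightllm w4a16 kernel constraints apply as for AWQ above (an assumption extrapolated from the earlier advice, not confirmed for QuaRot), the weight section would need the same adjustments:

```yaml
quant:
    method: Quarot
    weight:
        bit: 4
        symmetric: True          # assumed: same symmetric requirement as for AWQ
        granularity: per_group   # assumed: same per-group requirement
        group_size: 128          # assumed: same kernel group size
```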
[Screenshots in the original comment: the AWQ config, the lightllm startup command, the test request, and the resulting lightllm error.]
PS: I get a similar error if I start lightllm with the option "--mode triton_w4a16".