Open shaoyanguo opened 1 week ago
During the quantization of Llama-13B, I modified the config to:

```python
quant_cfg["quant_cfg"]["*self_attn*"] = {'enable': False}
```

However, in the generated config file, under group = 0, `exclude_modules` still only has `"lm_head"`. What should I do?

```json
"quantization": {
    "quant_algo": "W4A16_AWQ",
    "kv_cache_quant_algo": null,
    "group_size": 0,
    "has_zero_point": false,
    "pre_quant_scale": true,
    "exclude_modules": [
        "lm_head"
    ]
}
```
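For reference, the override above is typically applied along these lines (a minimal sketch assuming NVIDIA ModelOpt's `mtq` API; `model` and `calib_dataloader` are placeholders for the actual Llama-13B model and calibration data, not part of the original report):

```python
import copy

import modelopt.torch.quantization as mtq

# Start from the stock INT4 AWQ recipe and disable quantization for all
# self-attention modules via a wildcard pattern.
quant_cfg = copy.deepcopy(mtq.INT4_AWQ_CFG)
quant_cfg["quant_cfg"]["*self_attn*"] = {"enable": False}


def forward_loop(model):
    # Run a handful of calibration batches through the model.
    for batch in calib_dataloader:  # placeholder dataloader
        model(batch)


model = mtq.quantize(model, quant_cfg, forward_loop)
```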
Hi @shaoyanguo, thanks for raising this issue. We plan to address it in a future release. Since the current implementation hardcodes `exclude_modules`, you can manually update the generated config file as a workaround, e.g., by adding `transformer.layers.0.attention.qkv` to the list.
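For instance, the manually patched quantization block might look like this (a sketch only; the exact module names depend on the converted checkpoint layout, and the entry would be repeated for each layer you want excluded):

```json
"quantization": {
    "quant_algo": "W4A16_AWQ",
    "kv_cache_quant_algo": null,
    "group_size": 0,
    "has_zero_point": false,
    "pre_quant_scale": true,
    "exclude_modules": [
        "lm_head",
        "transformer.layers.0.attention.qkv"
    ]
}
```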