I found an example here and it works!
https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/06a155388edd4a240051176a67a11886b15db082/llm_ptq/hf_ptq.py#L150
But I noticed that setting alpha != 1 in SmoothQuant leads to different smooth scales for qkv and some linear layers, which seems to prevent fusion with the previous norm layer. Shouldn't these layers have the same smooth scale for proper fusion?
Is this a bug or am I misunderstanding something?
Thanks!
With alpha != 1, qkv will have different pre-quant scaling factors, and we do a postprocess to re-smooth them, so it's not a bug. This also happens with AWQ.
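For readers wondering what "re-smoothing" could look like, here is a minimal conceptual sketch. It is not ModelOpt's actual implementation; the function and tensor names are made up for illustration. The idea is that q/k/v share the same norm output, so a single shared per-channel input scale is chosen and the per-layer difference is folded back into each weight matrix, keeping the math equivalent while allowing fusion into the preceding norm.

```python
import torch

def resmooth_qkv(pre_quant_scales, smoothed_weights):
    """Conceptual sketch (hypothetical helper, not ModelOpt code).

    pre_quant_scales:  list of per-input-channel scales, one per linear layer
                       sharing the same norm input (e.g. q, k, v).
    smoothed_weights:  list of already-smoothed weights [out_features, in_features],
                       i.e. W * s from the per-layer SmoothQuant pass.
    """
    # Pick one shared scale, e.g. the element-wise maximum across layers.
    shared_scale = torch.stack(pre_quant_scales).amax(dim=0)

    new_weights = []
    for s, w in zip(pre_quant_scales, smoothed_weights):
        # Per-layer SmoothQuant computes y = (x / s) @ (W * s).T.
        # With a shared input scale we need y = (x / shared) @ (W * shared).T,
        # so fold the ratio (shared / s) into the weight columns.
        new_weights.append(w * (shared_scale / s))
    return shared_scale, new_weights
```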
Thanks! That clears things up regarding the rescaling for alpha != 1. Does modelopt handle the rescaling internally? Ideally, I'd love to see an example of how to grab those re-smoothed rescaling factors. @RalphMao
@siahuat0727 modelopt handles the rescaling internally during TensorRT-LLM checkpoint export. There are no public examples that showcase this.
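In the absence of an official example, one way to inspect the factors after calibration might look like the sketch below. This is assumption-based, not confirmed by the maintainers: it assumes the quantized linear modules expose an `input_quantizer` with a `pre_quant_scale` attribute, which may differ across ModelOpt versions.

```python
# Hedged sketch: dump pre-quant (smoothing) scales after mtq.quantize().
# Assumes quantized linears carry an `input_quantizer` with a
# `pre_quant_scale` attribute; attribute names may vary by ModelOpt version.
def dump_pre_quant_scales(model):
    scales = {}
    for name, module in model.named_modules():
        quantizer = getattr(module, "input_quantizer", None)
        pre_scale = getattr(quantizer, "pre_quant_scale", None) if quantizer else None
        if pre_scale is not None:
            scales[name] = pre_scale.detach().cpu()
    return scales
```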
Hi, I wonder if it is possible to choose a different alpha for mtq.INT8_SMOOTHQUANT_CFG?
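One possible way, sketched here under the assumption that the preset config is a plain dict and that the algorithm entry accepts a {"method", "alpha"} dict as in the hf_ptq.py example linked above, is to copy the preset and override the algorithm field before calling mtq.quantize. Verify the exact schema against your installed ModelOpt version.

```python
import copy

import modelopt.torch.quantization as mtq

# Sketch: start from the INT8 SmoothQuant preset and override alpha.
# The algorithm-dict schema is an assumption and may differ between versions.
quant_cfg = copy.deepcopy(mtq.INT8_SMOOTHQUANT_CFG)
quant_cfg["algorithm"] = {"method": "smoothquant", "alpha": 0.8}

# model = mtq.quantize(model, quant_cfg, forward_loop=calibrate_loop)
```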