OpenGVLab / OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Reduce shape for per group weight calibration #24

Closed. Alvant closed this issue 10 months ago.

Alvant commented 10 months ago

Hello!

Am I right that when we quantize weights with some group size, we expect the calibration stats (min and max) to be shared within each group? If so, why is reduce_shape set to [-1] here:

https://github.com/OpenGVLab/OmniQuant/blob/main/quantize/quantizer.py#L130C28-L130C28

x = x.reshape(-1, self.group_size)

# some code omitted

reduce_shape = [-1]
xmin = x.amin(reduce_shape, keepdim=True)
xmax = x.amax(reduce_shape, keepdim=True)

Shouldn't the reduce_shape param be equal to 0 if x is a weight matrix? If, on the other hand, x is an input tensor, then reduce_shape should indeed be -1?
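
To make the question concrete, here is a toy illustration of the two reductions I have in mind (the tensor sizes are made up, not the actual OmniQuant shapes):

import torch

w = torch.randn(4, 8)          # a made-up weight matrix, e.g. [out_features, in_features]
group_size = 4

x = w.reshape(-1, group_size)  # shape: [8, 4]

print(x.amin([-1], keepdim=True).shape)  # torch.Size([8, 1]): one min per row
print(x.amin([0], keepdim=True).shape)   # torch.Size([1, 4]): one min per column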

P.S. If some kind of fix is really needed, I would be happy to try to make a pull request :slightly_smiling_face:

ChenMnZ commented 10 months ago

Thanks for your proposal.

However, I think the code is correct.

For weights, after the reshape, the shape is [number_of_group, group_size]. For activations, it is [number_of_token, embedding_channel].

So if we want to do per-group quantization for weights or per-token quantization for activations, reduce_shape should be set to [-1].
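
For example, a quick sketch with toy sizes (not the actual OmniQuant tensors) showing that reducing over the last dimension gives one statistic per group for weights and one per token for activations:

import torch

group_size = 4
w = torch.randn(2, 8).reshape(-1, group_size)  # weight after reshape: [number_of_group, group_size] = [4, 4]
a = torch.randn(3, 16)                         # activation: [number_of_token, embedding_channel]

w_min = w.amin([-1], keepdim=True)  # shape [4, 1]: one min per group
a_min = a.amin([-1], keepdim=True)  # shape [3, 1]: one min per token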

Alvant commented 10 months ago

Ah, okay! It appears I did not quite understand the meaning of "group_size". I thought it meant the number of groups, not the size of one group. Then everything is indeed alright. Sorry for the disturbance :)