Hi @NRodion, thanks for your interest in the project!
1) The weight permutation denotes the order in which weights are quantized. Below you can find the relevant paragraph from our paper: `identity` means the original order of the feature dimensions, and `actorder` denotes reordering the weights in descending order of channel magnitude.
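As a minimal sketch of the two orderings (the magnitude criterion below is an illustrative assumption, a per-channel weight norm; GPTQ-style `actorder` typically sorts by the diagonal of the input Hessian from calibration data):

```python
import torch

torch.manual_seed(0)
W = torch.randn(256, 1024)  # (out_features, in_features)

# identity: feature dimensions stay in their original order.
identity_perm = torch.arange(W.shape[1])

# actorder: channels sorted in descending order of magnitude. The norm
# here is only a stand-in for the actual sorting criterion.
channel_magnitude = W.norm(dim=0)
actorder_perm = torch.argsort(channel_magnitude, descending=True)

W_permuted = W[:, actorder_perm]  # columns now in quantization order
```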
2) This behaviour is due to the fact that the weights are quantized in permuted order, i.e. you get 2^bit unique values per `groupsize` weights only when they are in the corresponding permuted order. Hence, you should expect at most 2^bit unique quantized values after imposing the same order of channels used throughout quantization, i.e. `torch.unique(layer.weight.data[idx, perm][:blocksize])`.
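A self-contained toy version of this check, with a simple group-wise uniform quantizer standing in for the real one (shapes and the quantizer are illustrative assumptions, not this repo's code):

```python
import torch

torch.manual_seed(0)
bits, blocksize = 4, 128
W = torch.randn(16, 512)
perm = torch.argsort(W.norm(dim=0), descending=True)  # stand-in actorder

# Quantize group by group in permuted column order, mimicking (in spirit
# only) how the weights are processed under a non-identity permutation.
Wp = W[:, perm].clone()
for g in range(0, Wp.shape[1], blocksize):
    block = Wp[:, g:g + blocksize]
    scale = block.abs().amax(dim=1, keepdim=True) / (2**(bits - 1) - 1)
    Wp[:, g:g + blocksize] = torch.round(block / scale) * scale

# The layer stores weights back in the original channel order.
quantized = Wp[:, torch.argsort(perm)]

# The check from above: re-impose the quantization order first.
idx = 0
n_unique = torch.unique(quantized[idx, perm][:blocksize]).numel()
print(n_unique, "<= 2^bits =", 2**bits)  # holds group by group
```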
Hi and thank you for the reply.
1) Permutation order can be crucial for some models; `actorder` usually makes a difference on the order of ~0.1 ppl.
2) Yes, we need to save the permutation, so it does incur some additional overhead (as discussed in the AWQ paper); a sketch of what that extra state looks like is below.
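A hypothetical sketch of the extra per-layer state this implies; the field names and shapes are illustrative, not the actual serialization format:

```python
import torch

out_features, in_features, bits = 4096, 4096, 4

# Dummy artifacts standing in for a real quantized layer.
qweight = torch.randint(0, 2**bits, (out_features, in_features),
                        dtype=torch.uint8)
perm = torch.randperm(in_features)  # column order used during quantization

# The permutation must be saved (or the columns un-permuted) so inputs
# can be matched to weights at inference time.
state = {"qweight": qweight, "perm": perm.to(torch.int32)}
extra = state["perm"].numel() * state["perm"].element_size()
print(f"permutation overhead: {extra} bytes per layer")  # 4 B per channel
```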
Ok, thanks for the replies.
`torch.unique(layer.weight.data[idx, :blocksize])` for any `idx` should output no more than 2^bit values for any quantization. It works for the original GPTQ code, and it also works if I use the identity permutation in your code, but it doesn't work for the other permutation options: they consistently have `blocksize` unique values instead of 2^bit. Am I missing something? Outliers could contribute to the total number of unique values, but there cannot be `blocksize - 2^bit` of them (why quantize at all then?). Are you sure that the weight matrix is reconstructed correctly?
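A minimal repro of the counting argument here, reusing the toy group-wise quantizer from the sketch above rather than the actual code: the naive slice mixes channels from many groups, each with its own scale, which is why it can show far more than 2^bit distinct values.

```python
import torch

torch.manual_seed(0)
bits, blocksize = 4, 128
W = torch.randn(16, 4096)
perm = torch.argsort(W.norm(dim=0), descending=True)  # non-identity order

# Quantize in permuted order, then store in the original channel order.
Wp = W[:, perm].clone()
for g in range(0, Wp.shape[1], blocksize):
    block = Wp[:, g:g + blocksize]
    scale = block.abs().amax(dim=1, keepdim=True) / (2**(bits - 1) - 1)
    Wp[:, g:g + blocksize] = torch.round(block / scale) * scale
stored = Wp[:, torch.argsort(perm)]

idx = 0
naive = torch.unique(stored[idx, :blocksize]).numel()    # mixes groups
ordered = torch.unique(stored[idx, perm][:blocksize]).numel()
print(f"naive slice:    {naive} unique values (can approach {blocksize})")
print(f"permuted slice: {ordered} unique values (<= 2^bits = {2**bits})")
```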