Open · llCurious opened this issue 2 years ago
Hi @llCurious, thanks for your interest in our work and sorry for the late reply. You are right: the input to the quantization function q_k is always normalized to [0, 1] (see weight normalization and input normalization), and its output is also a quantized value in [0, 1] (see here). Afterwards, the quantized value is scaled back by dequantization for the input (see input rescaling), or for the weight if rescale is True (see weight rescaling).
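For readers skimming the thread, here is a minimal sketch of the flow described above. The function names and the clip value are illustrative assumptions rather than the repo's exact code: the input is clipped and normalized to [0, 1], quantized by a uniform k-bit quantizer, and then rescaled back.

```python
import torch

def q_k(x, k):
    # Uniform k-bit quantizer on [0, 1]; the output stays in [0, 1],
    # restricted to 2^k - 1 evenly spaced levels.
    n = float(2 ** k - 1)
    return torch.round(x * n) / n

def quantize_input(x, k, clip_value=1.0):
    # Illustrative input path: clip, normalize to [0, 1], quantize, rescale back.
    x_norm = torch.clamp(x, 0.0, clip_value) / clip_value  # normalized to [0, 1]
    x_q = q_k(x_norm, k)                                    # quantized, still in [0, 1]
    return x_q * clip_value                                 # dequantized / rescaled
```

The final multiplication is the dequantization/rescaling step mentioned above: it undoes the normalization so downstream layers see values in the original range.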
Thanks for your reply.
The raw weight/input may have a much larger range. For example, the input (perhaps the output of some FC layer) can be significantly large if the layer has around 128 neurons and the data dimension is 1,000. In this case, the magnitude seems to change a lot after normalization.
In addition, could you elaborate on the backward pass for such quantization, or point out where it is elaborated in your paper? (One common convention is sketched after this message.)
I also read some related papers that work on quantization. They seem to use clipping rather than normalization to constrain the input range. Why did you choose this scheme?
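For context on the backward-pass question: many low-bit quantization methods (e.g., DoReFa-Net, whose tanh-based weight normalization is quoted later in this thread) use a straight-through estimator (STE), treating the non-differentiable rounding as the identity during back-propagation. Whether this repo does exactly that is an assumption; a minimal sketch of that convention:

```python
import torch

class QuantizeSTE(torch.autograd.Function):
    """k-bit uniform quantization with a straight-through estimator backward."""

    @staticmethod
    def forward(ctx, x, k):
        n = float(2 ** k - 1)
        return torch.round(x * n) / n  # quantize values already in [0, 1]

    @staticmethod
    def backward(ctx, grad_output):
        # STE: pass the gradient through unchanged, ignoring the fact that
        # the true derivative of round() is zero almost everywhere.
        return grad_output, None
```

Usage would be QuantizeSTE.apply(x, k) in place of a plain rounding call, so the rounding does not block gradient flow during training.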
Hi @llCurious, I do not see a difference between denormalization and dequantization. We also clip the input before scaling and quantization.
The weight passed to q_k is first normalized into [-1, 1] using the non-linear transformation weight = torch.tanh(self.weight) / torch.max(torch.abs(torch.tanh(self.weight))). Do you mean the de-quantization you mentioned above is used to erase the effect of this step (normalization, to me)? By the way, what is the underlying data type for the whole quantization procedure? It seems to be f32 rather than int8. The de-quantization above also multiplies the quantized weight by weight_scale, which is a float number as well.
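Putting the pieces of this exchange together, here is a sketch of the weight path as described above; q_k, rescale, and weight_scale are illustrative stand-ins and may not match the repo's exact code. Note that everything stays in float32 ("fake" quantization): only the set of representable values is restricted, not the storage type.

```python
import torch

def q_k(x, k):
    # Uniform k-bit quantizer on [0, 1] (same sketch as earlier in the thread).
    n = float(2 ** k - 1)
    return torch.round(x * n) / n

def quantize_weight(w, k, rescale=True):
    # Non-linear normalization quoted above: tanh squashes the raw weight,
    # and dividing by max |tanh(w)| maps it into [-1, 1].
    w_norm = torch.tanh(w) / torch.max(torch.abs(torch.tanh(w)))
    # Shift into [0, 1] so the uniform quantizer applies, then map back to [-1, 1].
    w_q = 2.0 * q_k(0.5 * w_norm + 0.5, k) - 1.0
    if rescale:
        # Hypothetical choice of weight_scale for illustration; the actual
        # scale in the repo may be computed differently.
        weight_scale = torch.max(torch.abs(w)).detach()
        w_q = w_q * weight_scale
    return w_q  # still a float32 tensor, just restricted to discrete levels
```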
Hey, your work is well-presented and I just wonder about one detail:
How do you ensure that the input to your quantization function is in the range [0, 1]?
As you mentioned in models/quant_ops (this link), do you require that the input is normalized in advance?