deJQK / AdaBits


Implementation detail of Quantization? #2

Open llCurious opened 2 years ago

llCurious commented 2 years ago

Hey, your work is well presented and I just wonder about one detail:

How do you ensure that the input to your quantization function is in the range [0,1]?

As you mentioned in models/quant_ops (this link), do you require the input to be normalized in advance?

deJQK commented 2 years ago

Hi @llCurious , thanks for your interest in our work and sorry for the late reply. You are right, the input to the quantization function q_k is always normalized to [0, 1] (see weight normalization and input normalization) and its output is also a quantized value in [0, 1] (see here). Afterwards the quantized value is scaled back by dequantization for the input (see input rescaling), or for the weight if rescale is True (see weight rescaling).
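Roughly, the pipeline looks like the following simplified sketch (a DoReFa-style uniform quantizer with hypothetical helper names; the actual code in models/quant_ops differs in details):

```python
import torch

def q_k(x, k):
    # Uniform k-bit quantizer on [0, 1]: rounds x to one of 2^k levels;
    # the output stays in [0, 1].
    n = 2 ** k - 1
    return torch.round(x * n) / n

def quantize_weight(w, k, rescale=True):
    # Hypothetical weight path: normalize into [0, 1], quantize, map back
    # to [-1, 1], and optionally rescale to the original magnitude.
    w_max = w.abs().max()
    w_norm = w / (2 * w_max) + 0.5              # weight normalization to [0, 1]
    w_q = 2 * q_k(w_norm, k) - 1                # quantized, back in [-1, 1]
    return w_q * w_max if rescale else w_q      # weight rescaling

def quantize_input(x, k, clip_value=1.0):
    # Hypothetical input path: clip, normalize to [0, 1], quantize,
    # then rescale back (dequantization).
    x = torch.clamp(x, 0, clip_value)
    x_q = q_k(x / clip_value, k)                # input to q_k is in [0, 1]
    return x_q * clip_value                     # input rescaling
```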

llCurious commented 2 years ago

Thanks for your reply.

The raw weight/input may have a much larger range, though. E.g., the input (perhaps the output of some FC layer) can be significantly large if the layer has, say, 128 neurons and the data dimension is 1,000. In this case, normalization seems to change the magnitude a lot.

deJQK commented 2 years ago

Hi @llCurious , I do not see a difference between denormalization and dequantization. We also clip the input before scaling and quantization.
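For illustration only (the clip bound below is an arbitrary choice, not the one used in the code), clipping bounds even a large-magnitude activation before it is normalized and quantized:

```python
import torch

# Even if a pre-activation has a large range (e.g. the output of a wide FC
# layer), clipping to a fixed bound keeps the value fed to the quantizer
# in a controlled range. The bound 6.0 here is just illustrative.
x = torch.randn(128, 1000) * 50.0          # large-magnitude activations
clip_value = 6.0
x_clipped = torch.clamp(x, 0, clip_value)
x_norm = x_clipped / clip_value            # now in [0, 1], ready for q_k
```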

llCurious commented 2 years ago

By the way, what is the underlying data type for the whole quantization procedure? It seems to be f32 rather than int8. The de-quantization above also multiplies the quantized weight by weight_scale, which is a float number as well.
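For example, with a uniform quantizer like the q_k sketched above, the output tensor still appears to be float32, just restricted to a discrete grid of values:

```python
import torch

def q_k(x, k):
    # Same uniform quantizer as sketched above.
    n = 2 ** k - 1
    return torch.round(x * n) / n

w = torch.rand(64, 64)                     # values already in [0, 1]
w_q = q_k(w, k=8)
print(w_q.dtype)                           # torch.float32, not an int type
print(w_q.unique().numel() <= 2 ** 8)      # at most 256 distinct values
```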