Closed: xuyaojian123 closed this issue 10 months ago.
Sorry for the late reply. The function quantizes inputs based on the max and min of inputs and the target bit precision.
> The function quantizes inputs based on the max and min
Thanks so much for your reply, I still don't understand this function.
a = tensor([0.9372, 2.3312, 3.2122, 0.5491, 5.2983, 1.5295, 0.8926, 2.7871, 3.0447,4.5377])
b = tensor([0.9178, 2.3363, 3.2123, 0.5423, 5.2983, 1.5436, 0.8761, 2.7952, 3.0455,4.5473])
So what is the result `b` returned by the `min_max_quantize` function for? I don't know why you turned `a` into `b`. Can you explain it in detail? Sorry, I have very little knowledge of quantization.
Looking forward to your reply
Sorry for being late. You can think of the function as follows: First, the function finds the min and max of the inputs. Given the number of bits, in your example eight, the function finds the equally spaced 2**n_bit numbers between the min and max values. Since you use 8 as the target number of bits, the function will use 256 numbers. These can be deemed quantized numbers. And then, for each input number, the function outputs the nearest quantized number.
As this is only one of the quantization methods, you can easily try other quantization methods. I hope my explanation helps. If you have further questions, feel free to ask.
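The steps described above can be sketched in plain Python. This is a hypothetical re-implementation based on the description (equally spaced levels between min and max, round to the nearest level), not the repository's exact code:

```python
def min_max_quantize(values, n_bits):
    """Snap each value to the nearest of 2**n_bits equally spaced
    levels between min(values) and max(values).
    Hypothetical sketch based on the explanation above."""
    lo, hi = min(values), max(values)
    # Spacing between adjacent quantization levels.
    step = (hi - lo) / (2 ** n_bits - 1)
    # For each value: nearest level index, then map back to a float.
    return [lo + round((v - lo) / step) * step for v in values]

a = [0.9372, 2.3312, 3.2122, 0.5491, 5.2983,
     1.5295, 0.8926, 2.7871, 3.0447, 4.5377]
b = min_max_quantize(a, n_bits=8)
# Each entry of b lies at most half a step away from the matching
# entry of a, and the min and max of a are preserved exactly.
```

With `n_bits=8` there are 256 possible output values, so the quantization error per element is bounded by half the level spacing.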
Thanks for your reply. The purpose of quantization is to reduce model size, but after converting variable `a` into variable `b`, both are still of type `torch.float32`. Where is the reduction in model size reflected?
Yes. It does return 32-bit floating points. But since they are quantized, which means there are only 2**n_bit numbers possible, they can be encoded into n_bit representations. That is, even though the code uses floating points, we only need n_bits for each parameter for representation, as long as we know the min and max values and the number of bits. In addition, using the min and max values and the number of bits, you can always restore n_bit representations back to 32-bit floating points.
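That idea can be sketched as follows; the helper names `encode`/`decode` are hypothetical and assume the same min-max scheme described earlier. The integer codes are what you would actually store, and the `(min, max)` pair is enough to restore 32-bit floats:

```python
def encode(values, n_bits):
    """Encode floats as n-bit integer codes plus the (min, max) pair.
    Hypothetical helper illustrating the reply above."""
    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1
    codes = [round((v - lo) / (hi - lo) * levels) for v in values]
    return codes, lo, hi

def decode(codes, lo, hi, n_bits):
    """Restore floating-point values from the n-bit codes and (min, max)."""
    levels = 2 ** n_bits - 1
    return [lo + c * (hi - lo) / levels for c in codes]

a = [0.9372, 2.3312, 3.2122, 0.5491, 5.2983,
     1.5295, 0.8926, 2.7871, 3.0447, 4.5377]
codes, lo, hi = encode(a, n_bits=8)
restored = decode(codes, lo, hi, n_bits=8)
# Every code fits in 8 bits (0..255) instead of 32 bits per float,
# so storage shrinks roughly 4x, apart from the two stored floats.
```

Decoding cannot recover `a` exactly, only the nearest quantization level of each element; that rounding error is the price paid for the smaller representation.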
Thanks, I understand.
Hello, I am back again. What does the `min_max_quantize` function do?