Open SlavikMIPT opened 7 months ago
@jinevening Can you please give your opinion about this feature?
Could you explain the background of this work? Are you going to support int4 quantization? Or is it for better 8 bit quantization?
> Could you explain the background of this work? Are you going to support int4 quantization? Or is it for better 8 bit quantization?
I am planning to support int4, but this method can also be applied to 8-bit quantization to improve accuracy.
So, your main goal is to implement int4 quantization. May I ask what its use case is? What is the target backend?
> So, your main goal is to implement int4 quantization. May I ask what its use case is? What is the target backend?
Microcontrollers - int4 will allow us to reduce the binary size of models.
Do you have any target model or results about accuracy? `one-quantize`'s quantization algorithm does not work well in int4.
> Do you have any target model or results about accuracy? `one-quantize`'s quantization algorithm does not work well in int4.
I made some tests - here is the result:

```
original tensor:
[0.0027225911617279053, 0.18983474373817444, 0.41336789727211, -0.18240013718605042, 0.2007804811000824, -0.19718044996261597, -0.06138917803764343, -0.12958911061286926, -0.12484398484230042, -0.4296090304851532, -0.3490271270275116, 0.3468421995639801, 0.3684578835964203, 0.18269559741020203, 0.23875799775123596, -0.2323986440896988]

int4, current (max-abs) scale:
q = 0.06137271970510483
[0, 3, 7, -3, 3, -3, -1, -2, -2, -7, -6, 6, 6, 3, 4, -4]

int8, current (max-abs) scale:
q = 0.0033827482257038355
[1, 56, 122, -54, 59, -58, -18, -38, -37, -127, -103, 103, 109, 54, 71, -69]

int4_mse = 0.010758214646791673
int8_mse = 0.0008263211069635848

int4, MSE-optimized scale:
q = 0.060275191359340924
[0, 3, 7, -3, 3, -3, -1, -2, -2, -7, -6, 6, 6, 3, 4, -4]

int8, MSE-optimized scale:
q = 0.003798802578209628
[1, 50, 109, -48, 53, -52, -16, -34, -33, -113, -92, 91, 97, 48, 63, -61]

int4_mse_opt = 0.009682772748301009
int8_mse_opt = 0.0005851636840414105
```
So we get a more precise approximation for free. In this case the difference is more significant for 8-bit quantization (41% MSE improvement); for 4-bit quantization the MSE improvement is 11% on this test model.

I think we should implement this optimization algorithm in the quantizer first - it significantly improves the precision of value representation without overhead.
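The kind of scale search described above can be sketched as follows. This is a minimal illustration, not the actual `one-quantize` implementation; the function names and the candidate search range are assumptions:

```python
import numpy as np

def quant_dequant(w, scale, bits):
    """Symmetric quantize-dequantize with the given scale."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def mse(w, scale, bits):
    """Mean squared error between the tensor and its quantized approximation."""
    return float(np.mean((w - quant_dequant(w, scale, bits)) ** 2))

def max_abs_scale(w, bits):
    """Baseline: map the largest magnitude to the largest quantized value."""
    return float(np.abs(w).max()) / (2 ** (bits - 1) - 1)

def mse_optimal_scale(w, bits, steps=200):
    """Grid-search scales around the max-abs baseline; keep the MSE minimizer."""
    base = max_abs_scale(w, bits)
    # Include the baseline itself so the result is never worse than max-abs.
    candidates = np.append(base * np.linspace(0.5, 1.2, steps), base)
    errors = [mse(w, s, bits) for s in candidates]
    return float(candidates[int(np.argmin(errors))])
```

Because the baseline scale is among the candidates, the optimized scale can only match or reduce the quantization MSE relative to max-abs scaling.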
Tested #12582 on the tflite-micro hello_world example model, got the following results:

| | average deviation % (compared to float model) |
|---|---|
| int8 current | 0.34% |
| int8 mse | 0.48% |
| int4 current | 8.3% |
| int4 mse | 4% |
As we can see, for int8 we got some degradation on this model (though on some models I got an improvement), while for int4 the minimum-MSE quantization approach gives a significant improvement. So I propose adding this algorithm as an option as a first step. @jinevening what do you think?
It looks fine to add the MSE algorithm, but we may need some refactoring.
It seems that #12582 does not implement int4; how did you test int4?
> It looks fine to add the MSE algorithm, but we may need some refactoring.
> It seems that #12582 does not implement int4; how did you test int4?
I hardcoded the int8 version and limited it to 4 bits as a proof of concept.
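For reference, emulating int4 on top of an int8 path can be sketched like this. This is a hypothetical illustration of the idea, not the actual patch in #12582:

```python
import numpy as np

def fake_int4_quantize(w):
    """Emulate signed int4 by clamping an int8-style symmetric quantizer
    to the 4-bit range [-8, 7]; values are still stored in int8."""
    scale = float(np.abs(w).max()) / 7  # 7 = 2**(4-1) - 1
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale
```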
Could you make a full draft before landing PRs? That is how we typically work when introducing a new feature.
First, make a full draft without considering code quality too much and measure the benefit from the new feature. Here, "full draft" means that others can reproduce the result with the draft. "benefit" in this case would be the performance improvement on microcontroller.
Second, discuss how to land the draft.
Third, review and merge.
> Microcontrollers - int4 will allow us to reduce the binary size of models.
CC @chunseoklee
### What

Let's support calculating the scale using MSE minimization (see part 3.4).

### Why

We can improve the accuracy of quantized models (especially low-bit quantized ones) by finding an optimal approximation of the tensor, i.e. calculating the scale factor using the MSE minimization approach.
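Concretely, the idea can be written as follows (notation is mine, not taken from the referenced paper): instead of simply covering the full range, pick the scale $s$ that minimizes the quantization error for bit width $b$,

$$
s^{*} = \arg\min_{s} \; \frac{1}{n} \sum_{i=1}^{n} \Big( w_i - s \cdot \operatorname{clip}\big(\operatorname{round}(w_i / s),\, -2^{b-1},\, 2^{b-1}-1\big) \Big)^{2}.
$$

The max-abs scale $\max_i |w_i| / (2^{b-1}-1)$ is one of the candidates, so the minimizer can never do worse than the current scheme on this objective.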
### How

TBD