BalyshevArtem opened 1 year ago
Let's continue this on refactored onert-micro (https://github.com/Samsung/ONE/issues/12427)
FYI, here is the quantization spec for CONV_2D and DEPTHWISE_CONV_2D (from https://ai.google.dev/edge/litert/models/quantization_spec?hl=en); we only refer to the Weight spec here.
```
CONV_2D
  Input 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
  Input 1 (Weight):
    data_type  : int8
    range      : [-127, 127]
    granularity: per-axis (dim = 0)
    restriction: zero_point = 0
  Input 2 (Bias):
    data_type  : int32
    range      : [int32_min, int32_max]
    granularity: per-axis
    restriction: (scale, zero_point) = (input0_scale * input1_scale[...], 0)
  Output 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor

DEPTHWISE_CONV_2D
  Input 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
  Input 1 (Weight):
    data_type  : int8
    range      : [-127, 127]
    granularity: per-axis (dim = 3)
    restriction: zero_point = 0
  Input 2 (Bias):
    data_type  : int32
    range      : [int32_min, int32_max]
    granularity: per-axis
    restriction: (scale, zero_point) = (input0_scale * input1_scale[...], 0)
  Output 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
```
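To make the Weight spec concrete, here is a minimal sketch (not onert-micro code; the function name and memory layout are my own assumptions) of symmetric per-channel int8 quantization for CONV_2D weights, where the quantized axis (dim 0) is outermost so each channel's values are contiguous:

```cpp
// Illustrative sketch: per-channel symmetric int8 quantization of CONV_2D
// weights following the spec above. For each output channel c:
//   scale[c] = max(|w|) / 127, zero_point = 0, values clamped to [-127, 127].
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct QuantizedWeights
{
  std::vector<int8_t> data;   // quantized weight values
  std::vector<float> scales;  // one scale per output channel (dim 0 for CONV_2D)
};

QuantizedWeights quantize_weights_per_channel(const std::vector<float> &weights,
                                              int num_channels)
{
  const int per_channel = static_cast<int>(weights.size()) / num_channels;
  QuantizedWeights result;
  result.data.resize(weights.size());
  result.scales.resize(num_channels);

  for (int c = 0; c < num_channels; ++c)
  {
    // Find the maximum absolute value within this channel.
    float abs_max = 0.0f;
    for (int i = 0; i < per_channel; ++i)
      abs_max = std::max(abs_max, std::fabs(weights[c * per_channel + i]));

    // Symmetric quantization: zero_point = 0, range restricted to [-127, 127].
    const float scale = abs_max > 0.0f ? abs_max / 127.0f : 1.0f;
    result.scales[c] = scale;

    for (int i = 0; i < per_channel; ++i)
    {
      const int idx = c * per_channel + i;
      const int q = static_cast<int>(std::round(weights[idx] / scale));
      result.data[idx] = static_cast<int8_t>(std::min(127, std::max(-127, q)));
    }
  }
  return result;
}
```

For DEPTHWISE_CONV_2D the quantized axis is dim 3, so the per-channel values are strided rather than contiguous, but the scale computation per channel is the same.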
What
Let's support weight-only quantized kernel implementations. In this approach the model stays in float, but some operations have quantized (int8) weights, so the kernels are hybrid: float activations combined with quantized weights (see the sketch below).
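A minimal sketch of what such a hybrid kernel's inner loop could look like (hypothetical helper, not the actual onert-micro kernel): the input and output stay float, only the weights are int8 with a per-channel scale, and each weight is dequantized on the fly before the multiply-accumulate.

```cpp
#include <cstdint>

// Hypothetical helper: one output value of a fully-connected-like hybrid op.
// `weight_scale` is the per-channel scale for this output channel; the weight
// zero_point is 0 per the spec, so dequantization is just a multiply.
float hybrid_dot(const float *input, const int8_t *weights, float weight_scale,
                 int depth)
{
  float acc = 0.0f;
  for (int i = 0; i < depth; ++i)
  {
    // Dequantize the int8 weight and accumulate in float.
    acc += input[i] * (static_cast<float>(weights[i]) * weight_scale);
  }
  return acc;
}
```

The trade-off is that only the weight storage shrinks (4x smaller than float32), while the arithmetic remains in float, so this targets model/binary size rather than compute.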
Why
To reduce binary size for some target models.
How
Support it for: