Samsung / ONE

On-device Neural Engine

[onert-micro] Support Weight-Quantize Kernels #11774

Open BalyshevArtem opened 1 year ago

BalyshevArtem commented 1 year ago

What

Let's support a weight-quantized kernel implementation. In this approach the model stays in float, but some operations carry quantized weights, so the kernels are hybrid (float activations, quantized weights).
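For illustration, a minimal sketch of what such a hybrid kernel could look like for a fully-connected layer: float activations, int8 weights with one symmetric scale per output channel, dequantized on the fly inside the dot product. The function name, signature, and layout are hypothetical and not the onert-micro API.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical hybrid fully-connected kernel: float input/output, int8
// weights with per-output-channel scales (zero_point = 0), float bias.
void hybridFullyConnected(const float *input, std::size_t input_size,
                          const int8_t *weights,       // [output_size x input_size]
                          const float *weight_scales,  // one scale per output channel
                          const float *bias,           // optional float bias
                          float *output, std::size_t output_size)
{
  for (std::size_t o = 0; o < output_size; ++o)
  {
    const int8_t *row = weights + o * input_size;
    float acc = 0.0f;
    for (std::size_t i = 0; i < input_size; ++i)
      acc += input[i] * static_cast<float>(row[i]); // accumulate with raw int8 weights
    acc *= weight_scales[o];                        // dequantize once per output channel
    output[o] = bias ? acc + bias[o] : acc;
  }
}
```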

Why

To reduce the binary size for some target models.

How

Support it for:

chunseoklee commented 2 weeks ago

Let's continue this on refactored onert-micro (https://github.com/Samsung/ONE/issues/12427)

FYI, here is the quantization spec for CONV_2D and DEPTHWISE_CONV_2D (from https://ai.google.dev/edge/litert/models/quantization_spec?hl=en); we only refer to the weight spec here.

CONV_2D
  Input 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
  Input 1 (Weight):
    data_type  : int8
    range      : [-127, 127]
    granularity: per-axis (dim = 0)
    restriction: zero_point = 0
  Input 2 (Bias):
    data_type  : int32
    range      : [int32_min, int32_max]
    granularity: per-axis
    restriction: (scale, zero_point) = (input0_scale * input1_scale[...], 0)
  Output 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
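As an illustration of the weight spec above, a per-axis (dim = 0) symmetric quantizer for CONV_2D weights could look roughly like the sketch below. The helper name and the layout assumption ([out_channels, H, W, in_channels], so each output channel's weights are contiguous) are mine, not onert-micro's.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-axis (dim = 0) symmetric quantization of CONV_2D weights:
// int8 values clamped to [-127, 127], zero_point = 0, one scale per
// output channel. Illustrative helper only.
std::vector<int8_t> quantizeConvWeights(const std::vector<float> &weights,
                                        std::size_t out_channels,
                                        std::vector<float> &scales /* out */)
{
  const std::size_t per_channel = weights.size() / out_channels;
  std::vector<int8_t> q(weights.size());
  scales.assign(out_channels, 1.0f);

  for (std::size_t c = 0; c < out_channels; ++c)
  {
    // Symmetric scale from the channel's maximum absolute value.
    float max_abs = 0.0f;
    for (std::size_t i = 0; i < per_channel; ++i)
      max_abs = std::max(max_abs, std::fabs(weights[c * per_channel + i]));
    const float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
    scales[c] = scale;

    for (std::size_t i = 0; i < per_channel; ++i)
    {
      const float v = std::round(weights[c * per_channel + i] / scale);
      q[c * per_channel + i] =
          static_cast<int8_t>(std::max(-127.0f, std::min(127.0f, v)));
    }
  }
  return q;
}
```

The bias restriction then follows directly: bias_scale[c] = input0_scale * scales[c] with zero_point = 0, as stated in the spec.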

DEPTHWISE_CONV_2D
  Input 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
  Input 1 (Weight):
    data_type  : int8
    range      : [-127, 127]
    granularity: per-axis (dim = 3)
    restriction: zero_point = 0
  Input 2 (Bias):
    data_type  : int32
    range      : [int32_min, int32_max]
    granularity: per-axis
    restriction: (scale, zero_point) = (input0_scale * input1_scale[...], 0)
  Output 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
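The main difference from CONV_2D is that per-axis quantization runs along dim = 3 of the [1, H, W, channels] weight tensor, so one channel's elements are strided rather than contiguous. A small sketch of the per-channel scale computation under that layout assumption (hypothetical helper, not onert-micro code):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-axis (dim = 3) scale computation for DEPTHWISE_CONV_2D weights:
// elements of channel c are found at indices c, c + channels, c + 2*channels, ...
// Symmetric mapping onto [-127, 127] with zero_point = 0. Illustrative only.
std::vector<float> depthwiseWeightScales(const std::vector<float> &weights,
                                         std::size_t channels)
{
  std::vector<float> scales(channels, 1.0f);
  for (std::size_t c = 0; c < channels; ++c)
  {
    float max_abs = 0.0f;
    for (std::size_t i = c; i < weights.size(); i += channels)
      max_abs = std::max(max_abs, std::fabs(weights[i]));
    scales[c] = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
  }
  return scales;
}
```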