Samsung / ONE

On-device Neural Engine

Introduce record-minmax for post-training quantization #1605

Closed jinevening closed 4 years ago

jinevening commented 4 years ago

This issue tracks the status of record-minmax #1537, a tool to embed min/max values of activation tensors into the circle model, for post-training quantization #696.

What record-minmax does is described below (copied from #696).

Record min/max values of each tensor while running the representative dataset.

- Input: circle model (fp32), representative dataset (hdf5 format)
- Output: circle model (fp32) in which the min/max values of tensors are saved in QuantizationParameters
- Details: Users run an executable named record-minmax, which invokes luci-interpreter to perform inference on the given dataset. For each piece of data, record-minmax updates the moving average of the min and the moving average of the max of each tensor. After the whole dataset has been fed to the interpreter, the recorded average min/max values are saved in the QuantizationParameters of each tensor in the circle model.
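
For illustration, here is a rough sketch of the per-tensor moving-average recording described above (this is not the actual record-minmax / luci-interpreter code; the class name and the update rate `alpha` are made up):

    # Rough sketch of per-tensor moving-average min/max recording.
    # Not the actual record-minmax implementation; the update rate
    # `alpha` and the class name are illustrative only.
    import numpy as np

    class MinMaxRecorder:
        def __init__(self, alpha=0.01):
            self.alpha = alpha    # moving-average update rate (assumed value)
            self.avg_min = {}     # tensor name -> moving average of per-run min
            self.avg_max = {}     # tensor name -> moving average of per-run max

        def record(self, name, values):
            cur_min = float(np.min(values))
            cur_max = float(np.max(values))
            if name not in self.avg_min:
                self.avg_min[name] = cur_min
                self.avg_max[name] = cur_max
            else:
                self.avg_min[name] += self.alpha * (cur_min - self.avg_min[name])
                self.avg_max[name] += self.alpha * (cur_max - self.avg_max[name])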

This issue tracks the following items.

1. Pre-processing of input data

2. Driver to run luci-interpreter and record min/max values (record-minmax)

Any suggestions and comments are welcome.

jinevening commented 4 years ago

Format of representative dataset

We use an hdf5 file (.h5) to store the representative dataset. The hierarchy of the dataset file is as follows.

Group "/"
  ㄴGroup "value"
      ㄴGroup <record_idx>
          ㄴDataset <input_idx>

- record_idx (rid): index of the record (a dataset file contains multiple records)
- input_idx (iid): index of the input (a DNN model can have multiple inputs)

Example: 10 records, DNN model requires two inputs, each input has shape (3, 4)

Group "/"
  ㄴGroup "value"
      ㄴGroup "0"
          ㄴDataset "0" Shape (3,4)
          ㄴDataset "1" Shape (3,4)
      ㄴGroup "1"
          ㄴDataset "0" Shape (3,4)
          ㄴDataset "1" Shape (3,4)
       ...
      ㄴGroup "10"
          ㄴDataset "0" Shape (3,4)
          ㄴDataset "1" Shape (3,4)

Once determined, the file format will be very difficult to change, so please leave a comment if you have any questions or suggestions.

s-barannikov commented 4 years ago

I suggest storing raw floats in binary files (not using hdf5), one tensor per file, the same way it is done in luci-value-test. There is no need to store shapes.

To me, using hdf5 or any other "smart" format looks like just an additional layer between the user and the compiler, which does not improve UX and complicates the overall design without any benefits.

jinevening commented 4 years ago

> complicates the overall design without any benefits.

IMHO, using a "smart" format has some some benefits to deal with the diversity of the input files.

For example, hdf5 can express the endianness of the file datatype (big endian for IEEE_F32BE, little endian for IEEE_F32LE). I'm not sure we could handle an input file written with a different endianness if we used the raw data format.

Also, users can mistakenly give an input that has the wrong shape but the same size as the model expects. For instance, the model may require a (1,3,4)-shaped input, but the user could accidentally provide a (1,4,3)-shaped input. This case would be very difficult to debug if we used the raw data format.

The same problem can occur when the user gives an input with the wrong type but the same size (int32 vs float32).
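
To illustrate, a small sketch of the kind of check that hdf5 metadata makes possible (the expected shape/dtype and the file name are hypothetical):

    # Sketch: hdf5 keeps shape and dtype, so mismatches can be caught early.
    # The expected shape/dtype and the file name below are hypothetical examples.
    import h5py
    import numpy as np

    expected_shape = (1, 3, 4)
    expected_dtype = np.dtype("float32")

    with h5py.File("representative_dataset.h5", "r") as f:
        data = f["value"]["0"]["0"]
        if data.shape != expected_shape:
            raise ValueError(f"shape mismatch: got {data.shape}, expected {expected_shape}")
        if data.dtype != expected_dtype:
            raise ValueError(f"dtype mismatch: got {data.dtype}, expected {expected_dtype}")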

luci-value-test is a test conducted in a controlled environment, so it would be OK to ignore such issues there. But record-minmax will receive inputs from users, so it needs to be able to handle various input files.

s-barannikov commented 4 years ago

@jinevening What do you mean by the diversity of input files? The user won't pass jpg or png files into a model. The model expects floats or integers, not pictures. There are many ways to convert an image to floats. For one, you can just divide each channel by 255. But usually some normalization is performed. For instance, Keras suggests using a per_image_standardization layer, which does the following: result = (x - mean) / adjusted_stddev, where adjusted_stddev = max(stddev, 1.0/sqrt(N)). In general, how to convert an image to floats/integers is model-dependent; the formula above is just one possible way. We do not know what the developer of the model had in mind when they created it.
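
As a small sketch, the standardization formula above in NumPy (not the actual Keras/TensorFlow implementation):

    # Sketch of the per-image standardization formula quoted above,
    # not the actual Keras/TensorFlow implementation.
    import numpy as np

    def per_image_standardization(x: np.ndarray) -> np.ndarray:
        n = x.size                                        # number of elements in the image
        adjusted_stddev = max(x.std(), 1.0 / np.sqrt(n))  # avoid division by ~0
        return (x - x.mean()) / adjusted_stddev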

Leaving image classification models aside, what about NLP models? They do not accept text either. Instead, they consume integers, and only the developer of the model knows how to convert text to integers for a concrete model.

The same goes for audio processing. You are not going to convert .wav/.mp3/.ogg and a whole lot of other formats into hdf5, are you? 😄 And... how, anyway?

My point is: the user will already have the data correctly converted to the format the model expects, and the only thing they need to do is pass that data to the record-minmax tool. We only need to provide a convenient interface.


I agree that the user may mistakenly pass wrong data, which will be accepted by record-minmax without error. I don't mind using hdf5 to pass already prepared data to the record-minmax tool. I emphasize that it is the user's responsibility to prepare the data (e.g. convert images to floats, or text to integers). This prepared data can then be converted to the hdf5 format. But please do not ask the user to deal with the hdf5 interface directly; let's provide a more convenient one.


We can still provide some helper utilities for popular conversions, like the Keras one mentioned above.

jinevening commented 4 years ago

> I don't mind using hdf5 to pass already prepared data to the record-minmax tool. I emphasize that it is the user's responsibility to prepare the data (e.g. convert images to floats, or text to integers).

I entirely agree that users are responsible for pre-processing their data. We are not going to provide tools to pre-process jpg, png, or audio files (we can provide some popular ones as you mentioned, but basically users should do that). The above hdf5 format is just for saving "already prepared" image data.

jinevening commented 4 years ago

The basic functionality of record-minmax (except tests) has been implemented, so I ran a test with the ResNet50 model.

Test process

  1. Convert ResNet50.tflite to ResNet50.circle (using tflite2circle)

  2. Generate 1,000 random inputs (using gen_h5_inputs.py)

  3. Run record-minmax with the generated data

    
    time build/release/compiler/record-minmax/record-minmax ResNet50.circle ResNet50.input.h5 ResNet50.minmax_recorded.circle
    Recording 0'th data
    Recording 100'th data
    Recording 200'th data
    Recording 300'th data
    Recording 400'th data
    Recording 500'th data
    Recording 600'th data
    Recording 700'th data
    Recording 800'th data
    Recording 900'th data

    real 3m59.761s
    user 3m57.615s
    sys 0m0.400s


ResNet50.circle: 102170664 bytes
ResNet50.minmax_recorded.circle (embedded with min/max values): 102171700 bytes (1036 bytes larger)

  4. Dump the circle model embedded with min/max values

    build/compiler/circledump/circledump ResNet50.minmax_recorded.circle

    ...
    Operands: T(subgraph index : tensor index) TYPE (shape) (shape_signature) B(buffer index) OperandName
    T(0:0) FLOAT32 (1, 224, 224, 3) B(1) input_1
        Quantization: min(3.40922e-06) max(0.999995)
    T(0:1) INT32 (4, 2) B(2) resnet_v1_50/Pad/Pad/paddings
    T(0:2) FLOAT32 (1, 230, 230, 3) B(3) resnet_v1_50/Pad/Pad
        Quantization: min(0) max(0.999995)
    T(0:3) FLOAT32 (64, 7, 7, 3) B(4) resnet_v1_50/conv1/BatchNorm/mul_1
    T(0:4) FLOAT32 (64) B(5) activation/Relu;resnet_v1_50/conv1/BatchNorm/add_1;resnet_v1_50/block1/unit_3/bottleneck_v1/conv2/BatchNorm/mul_1;resnet_v1_50/conv1/BatchNorm/mul_1;resnet_v1_50/conv1/BatchNorm/sub
    T(0:5) FLOAT32 (1, 112, 112, 64) B(6) activation/Relu;resnet_v1_50/conv1/BatchNorm/add_1;resnet_v1_50/block1/unit_3/bottleneck_v1/conv2/BatchNorm/mul_1;resnet_v1_50/conv1/BatchNorm/mul_1;resnet_v1_50/conv1/BatchNorm/sub1
        Quantization: min(0) max(6.32373)
    T(0:6) FLOAT32 (1, 56, 56, 64) B(7) max_pooling2d/MaxPool
        Quantization: min(0) max(6.32373)
    ...



We can see that the min/max values are embedded in the quantization parameters (only for activation tensors).

**Summary**
`record-minmax` took ~4 minutes to profile 1,000 data points with ResNet50. I've checked by eye that the recorded values are embedded in the circle model.

**Update 2020/06/26**: I've profiled with InceptionV3 and MobileNetV2. They took 6m40s and 1m, respectively.

The next step is to verify the embedded min/max values. #2088

jinevening commented 4 years ago

I'm closing this issue because the tool has been successfully merged and testing is being discussed in #2088.