PINTO0309 / onnx2tf

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
MIT License

[BERT-Squad] INT8 quantization: The input data type must be Float32. #248

Closed: InputBlackBoxOutput closed this issue 1 year ago

InputBlackBoxOutput commented 1 year ago

Issue Type

Others

onnx2tf version number

1.7.25

onnx version number

1.13.1

tensorflow version number

2.12.0rc1

Download URL for ONNX

https://github.com/onnx/models/blob/main/text/machine_comprehension/bert-squad/model/bertsquad-12.onnx

Parameter Replacement JSON

None

Description

Hi,

I am trying to convert and INT8-quantize a BERT ONNX model. I am using the following command in my Google Colab setup.

onnx2tf --output_integer_quantized_tflite -i {MODEL}.onnx -b 1 > {MODEL}.log

Output:

Model convertion started
============================================================

ERROR: For INT8 quantization, the input data type must be Float32. Also, if --quant_calib_input_op_name_np_data_path is not specified, all input OPs must assume 4D tensor image data. INPUT Name: unique_ids_raw_output___9:0 INPUT Shape: ['unk__492'] INPUT dtype: int64

I believe the model has int64 as its input data type, which causes onnx2tf to fail. Is there a workaround for this?


Thanks for creating such a fantastic tool!

InputBlackBoxOutput commented 1 year ago

Please guide me on how to work around the Auto Calibration Check. I do not need the model to be accurate. I only need the model for hardware profiling for latency.

https://github.com/PINTO0309/onnx2tf/blob/df183f18dcb02265cb6af7444f8f719643c5311d/onnx2tf/onnx2tf.py#L722-L733

PINTO0309 commented 1 year ago

If you do not need to perform INT8 quantization with this tool alone, the following method is the easiest.

The -osd option outputs a saved_model.pb in the saved_model folder with everything required for quantization. That is, a default signature named serving_default is embedded in the .pb.

onnx2tf -i bertsquad-12.onnx -b 1 -osd
saved_model_cli show --dir saved_model/ --all

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_ids'] tensor_info:
        dtype: DT_INT64
        shape: (1, 256)
        name: serving_default_input_ids:0
    inputs['input_mask'] tensor_info:
        dtype: DT_INT64
        shape: (1, 256)
        name: serving_default_input_mask:0
    inputs['segment_ids'] tensor_info:
        dtype: DT_INT64
        shape: (1, 256)
        name: serving_default_segment_ids:0
    inputs['unique_ids_raw_output___9'] tensor_info:
        dtype: DT_INT64
        shape: (1)
        name: serving_default_unique_ids_raw_output___9:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['unique_ids_0'] tensor_info:
        dtype: DT_INT64
        shape: (1)
        name: PartitionedCall:0
    outputs['unstack_0'] tensor_info:
        dtype: DT_FLOAT
        shape: (1, 256)
        name: PartitionedCall:1
    outputs['unstack_1'] tensor_info:
        dtype: DT_FLOAT
        shape: (1, 256)
        name: PartitionedCall:2
  Method name is: tensorflow/serving/predict
The MetaGraph with tag set ['serve'] contains the following ops: {'RestoreV2', 'Tanh', 'Sub', 'FloorMod', 'Sqrt', 'Cast', 'Const', 'MergeV2Checkpoints', 'NoOp', 'GatherV2', 'Reshape', 'Select', 'Pack', 'ExpandDims', 'BatchMatMulV2', 'SaveV2', 'MatMul', 'Pow', 'ShardedFilename', 'StringJoin', 'Less', 'PartitionedCall', 'Softmax', 'Placeholder', 'Split', 'StaticRegexFullMatch', 'Mean', 'Squeeze', 'StridedSlice', 'OneHot', 'ConcatV2', 'Transpose', 'Identity', 'Reciprocal', 'StatefulPartitionedCall', 'AddV2', 'Mul', 'Fill'}

Next, simply follow the official tutorial to write and run a few lines of quantization source code. https://www.tensorflow.org/lite/performance/post_training_quantization

import tensorflow as tf

def representative_dataset():
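  # NOTE: 'dataset' below is a placeholder for your own calibration samples; it is not defined in this snippet.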
  for data in dataset:
    yield {
      "unique_ids_raw_output___9": data.unique_id,
      "segment_ids": data.segment_id,
      "input_mask": data.mask,
      "input_ids": data.input_id,
    }

converter = tf.lite.TFLiteConverter.from_saved_model('saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
tflite_quant_model = converter.convert()
with open('saved_model/int8_model.tflite', 'wb') as w:
  w.write(tflite_quant_model)

It will be far easier to understand than reading my messy source code. Note that the above sample code has not been tested; if an error occurs anywhere, please modify it yourself and try again.

Ref: https://github.com/PINTO0309/onnx2tf/issues/222
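
As a quick sanity check, the converted model can be inspected with tf.lite.Interpreter to confirm that its input and output tensors are really int8. A minimal sketch, assuming the saved_model/int8_model.tflite path used in the snippet above:

import tensorflow as tf

# Load the quantized model and print each input/output tensor's dtype and shape.
interpreter = tf.lite.Interpreter(model_path='saved_model/int8_model.tflite')
interpreter.allocate_tensors()

for detail in interpreter.get_input_details():
    print('input :', detail['name'], detail['dtype'], detail['shape'])
for detail in interpreter.get_output_details():
    print('output:', detail['name'], detail['dtype'], detail['shape'])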

InputBlackBoxOutput commented 1 year ago

Hi @PINTO0309, I got it working. Here's what I did:

import tensorflow as tf
import numpy as np

# Output of command: saved_model_cli show --dir saved_model/ --all
  # The given SavedModel SignatureDef contains the following input(s):
  #   inputs['input_ids_0'] tensor_info:
  #       dtype: DT_INT64
  #       shape: (1, 256)
  #       name: serving_default_input_ids_0:0
  #   inputs['input_mask_0'] tensor_info:
  #       dtype: DT_INT64
  #       shape: (1, 256)
  #       name: serving_default_input_mask_0:0
  #   inputs['segment_ids_0'] tensor_info:
  #       dtype: DT_INT64
  #       shape: (1, 256)
  #       name: serving_default_segment_ids_0:0
  #   inputs['unique_ids_raw_output___9_0'] tensor_info:
  #       dtype: DT_INT64
  #       shape: (1)
  #       name: serving_default_unique_ids_raw_output___9_0:0

def representative_dataset():
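    # All-ones dummy calibration data: enough to drive the converter, but not representative of real inputs.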
    yield {
      'input_ids_0': np.array([1 for i in range(256)]),
      'input_mask_0': np.array([1 for i in range(256)]),
      'segment_ids_0': np.array([1 for i in range(256)]),
      'unique_ids_raw_output___9_0': np.array([1]),
    }

converter = tf.lite.TFLiteConverter.from_saved_model('saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
tflite_quant_model = converter.convert()

with open('saved_model/int8_model.tflite', 'wb') as w:
  w.write(tflite_quant_model)

Thanks for the help!
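
Since the goal here was latency profiling, a minimal sketch along the following lines can time the quantized model with the TFLite interpreter (assuming the saved_model/int8_model.tflite path from the code above; dummy int8 inputs are enough for timing):

import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='saved_model/int8_model.tflite')
interpreter.allocate_tensors()

# Feed each input a zero tensor of the expected shape and dtype (int8 after full-integer quantization).
for detail in interpreter.get_input_details():
    interpreter.set_tensor(detail['index'], np.zeros(detail['shape'], dtype=detail['dtype']))

# Warm up once, then average a few invocations.
interpreter.invoke()
runs = 10
start = time.perf_counter()
for _ in range(runs):
    interpreter.invoke()
print(f"Average latency: {(time.perf_counter() - start) / runs * 1000:.2f} ms")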

MrRace commented 1 year ago

@InputBlackBoxOutput From your representative_dataset:

def representative_dataset():
    yield {
      'input_ids_0': np.array([1 for i in range(256)]),
      'input_mask_0': np.array([1 for i in range(256)]),
      'segment_ids_0': np.array([1 for i in range(256)]),
      'unique_ids_raw_output___9_0': np.array([1]),
    }

Your calibration data is very simple; is that OK?

InputBlackBoxOutput commented 1 year ago

Hi @MrRace, I wanted to convert the model for profiling purposes only, so quantization accuracy was not taken into account during conversion. You will have to modify the code to build a proper representative dataset; a sketch is shown below.
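
For reference, a proper representative dataset would feed real tokenized question/context pairs instead of the all-ones arrays above. A rough sketch, assuming the Hugging Face transformers BertTokenizer (with a hypothetical 'bert-base-uncased' checkpoint) and the signature key names from the saved_model_cli output earlier in the thread ('input_ids_0', 'input_mask_0', 'segment_ids_0', 'unique_ids_raw_output___9_0'):

import numpy as np
from transformers import BertTokenizer

# Any BERT tokenizer matching the model's vocabulary works here.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# A handful of real question/context pairs used only for calibration.
samples = [
    ("What does onnx2tf convert?",
     "onnx2tf converts ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC)."),
    # ... add a few dozen representative pairs here
]

def representative_dataset():
    for unique_id, (question, context) in enumerate(samples):
        # Tokenize to the fixed sequence length expected by the signature (1, 256).
        enc = tokenizer(question, context, max_length=256,
                        padding='max_length', truncation=True)
        yield {
            'input_ids_0': np.array(enc['input_ids'], dtype=np.int64),
            'input_mask_0': np.array(enc['attention_mask'], dtype=np.int64),
            'segment_ids_0': np.array(enc['token_type_ids'], dtype=np.int64),
            'unique_ids_raw_output___9_0': np.array([unique_id], dtype=np.int64),
        }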