Closed InputBlackBoxOutput closed 1 year ago
Please guide me on how to work around the Auto Calibration Check. I do not need the model to be accurate. I only need the model for hardware profiling for latency.
If you do not need to perform INT8 quantization with this tool alone, the following method is the easiest.
The -osd option will output a saved_model.pb in the saved_model folder with the full size required for quantization. That is, a default signature named serving_default is embedded in the .pb.
onnx2tf -i bertsquad-12.onnx -b 1 -osd
saved_model_cli show --dir saved_model/ --all
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:
signature_def['__saved_model_init_op']:
The given SavedModel SignatureDef contains the following input(s):
The given SavedModel SignatureDef contains the following output(s):
outputs['__saved_model_init_op'] tensor_info:
dtype: DT_INVALID
shape: unknown_rank
name: NoOp
Method name is:
signature_def['serving_default']:
The given SavedModel SignatureDef contains the following input(s):
inputs['input_ids'] tensor_info:
dtype: DT_INT64
shape: (1, 256)
name: serving_default_input_ids:0
inputs['input_mask'] tensor_info:
dtype: DT_INT64
shape: (1, 256)
name: serving_default_input_mask:0
inputs['segment_ids'] tensor_info:
dtype: DT_INT64
shape: (1, 256)
name: serving_default_segment_ids:0
inputs['unique_ids_raw_output___9'] tensor_info:
dtype: DT_INT64
shape: (1)
name: serving_default_unique_ids_raw_output___9:0
The given SavedModel SignatureDef contains the following output(s):
outputs['unique_ids_0'] tensor_info:
dtype: DT_INT64
shape: (1)
name: PartitionedCall:0
outputs['unstack_0'] tensor_info:
dtype: DT_FLOAT
shape: (1, 256)
name: PartitionedCall:1
outputs['unstack_1'] tensor_info:
dtype: DT_FLOAT
shape: (1, 256)
name: PartitionedCall:2
Method name is: tensorflow/serving/predict
The MetaGraph with tag set ['serve'] contains the following ops: {'RestoreV2', 'Tanh', 'Sub', 'FloorMod', 'Sqrt', 'Cast', 'Const', 'MergeV2Checkpoints', 'NoOp', 'GatherV2', 'Reshape', 'Select', 'Pack', 'ExpandDims', 'BatchMatMulV2', 'SaveV2', 'MatMul', 'Pow', 'ShardedFilename', 'StringJoin', 'Less', 'PartitionedCall', 'Softmax', 'Placeholder', 'Split', 'StaticRegexFullMatch', 'Mean', 'Squeeze', 'StridedSlice', 'OneHot', 'ConcatV2', 'Transpose', 'Identity', 'Reciprocal', 'StatefulPartitionedCall', 'AddV2', 'Mul', 'Fill'}
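The keys yielded by a representative dataset have to match the input names of the serving_default signature, and those names can differ between conversions (compare input_ids above with input_ids_0 later in this thread). A minimal sketch of pulling the names, dtypes and shapes out of the CLI text, assuming the output format shown above; parse_signature_inputs is a hypothetical helper, not part of onnx2tf or TensorFlow:

```python
import re


def parse_signature_inputs(cli_text, signature="serving_default"):
    """Return {input_name: {"dtype": ..., "shape": ...}} for one signature.

    Assumes the plain-text layout printed by `saved_model_cli show --all`.
    """
    inputs = {}
    in_signature = False
    in_inputs = False
    current = None
    for raw in cli_text.splitlines():
        line = raw.strip()
        m = re.match(r"signature_def\['(.+)'\]:", line)
        if m:
            # A new signature block starts; track whether it is the one we want.
            in_signature = m.group(1) == signature
            in_inputs = False
            current = None
            continue
        if not in_signature:
            continue
        if "contains the following input(s)" in line:
            in_inputs = True
            continue
        if "contains the following output(s)" in line:
            in_inputs = False
            current = None
            continue
        if not in_inputs:
            continue
        m = re.match(r"inputs\['(.+)'\] tensor_info:", line)
        if m:
            current = m.group(1)
            inputs[current] = {}
        elif current and line.startswith("dtype:"):
            inputs[current]["dtype"] = line.split(":", 1)[1].strip()
        elif current and line.startswith("shape:"):
            # "(1, 256)" -> (1, 256); "(1)" -> (1,)
            inputs[current]["shape"] = tuple(int(d) for d in re.findall(r"-?\d+", line))
    return inputs


# Shortened excerpt of the output above, used as sample input.
sample = """
signature_def['serving_default']:
The given SavedModel SignatureDef contains the following input(s):
inputs['input_ids'] tensor_info:
dtype: DT_INT64
shape: (1, 256)
name: serving_default_input_ids:0
inputs['unique_ids_raw_output___9'] tensor_info:
dtype: DT_INT64
shape: (1)
name: serving_default_unique_ids_raw_output___9:0
The given SavedModel SignatureDef contains the following output(s):
outputs['unique_ids_0'] tensor_info:
dtype: DT_INT64
shape: (1)
name: PartitionedCall:0
"""

specs = parse_signature_inputs(sample)
for name, spec in specs.items():
    print(name, spec["dtype"], spec["shape"])
```

The resulting names are the ones to use as dictionary keys in representative_dataset.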
Next, simply follow the official tutorial to write and run a few lines of quantization source code. https://www.tensorflow.org/lite/performance/post_training_quantization
import tensorflow as tf

# NOTE: `dataset` is a placeholder; supply your own iterable of calibration samples.
def representative_dataset():
    for data in dataset:
        yield {
            "unique_ids_raw_output___9": data.unique_id,
            "segment_ids": data.segment_id,
            "input_mask": data.mask,
            "input_ids": data.input_id,
        }

converter = tf.lite.TFLiteConverter.from_saved_model('saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
tflite_quant_model = converter.convert()

with open('saved_model/int8_model.tflite', 'wb') as w:
    w.write(tflite_quant_model)
This will be far easier to understand than reading my messy source code. Note that the sample code above has not been tested. If an error occurs anywhere, please modify it yourself and try again.
Hi @PINTO0309, I got it working. Here's what I did:
import tensorflow as tf
import numpy as np

# Output of command: saved_model_cli show --dir saved_model/ --all
# The given SavedModel SignatureDef contains the following input(s):
#   inputs['input_ids_0'] tensor_info:
#       dtype: DT_INT64
#       shape: (1, 256)
#       name: serving_default_input_ids_0:0
#   inputs['input_mask_0'] tensor_info:
#       dtype: DT_INT64
#       shape: (1, 256)
#       name: serving_default_input_mask_0:0
#   inputs['segment_ids_0'] tensor_info:
#       dtype: DT_INT64
#       shape: (1, 256)
#       name: serving_default_segment_ids_0:0
#   inputs['unique_ids_raw_output___9_0'] tensor_info:
#       dtype: DT_INT64
#       shape: (1)
#       name: serving_default_unique_ids_raw_output___9_0:0

def representative_dataset():
    yield {
        'input_ids_0': np.array([1 for i in range(256)]),
        'input_mask_0': np.array([1 for i in range(256)]),
        'segment_ids_0': np.array([1 for i in range(256)]),
        'unique_ids_raw_output___9_0': np.array([1]),
    }

converter = tf.lite.TFLiteConverter.from_saved_model('saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
tflite_quant_model = converter.convert()

with open('saved_model/int8_model.tflite', 'wb') as w:
    w.write(tflite_quant_model)
Thanks for the help!
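Since the goal in this thread is latency profiling rather than accuracy, a small timing harness may be useful once the .tflite file exists. The following is a minimal sketch, not something taken from onnx2tf: measure_latency is a hypothetical helper, and the stand-in workload only exists so the snippet runs without TensorFlow installed. Numbers measured on a development machine will not match the target hardware.

```python
import statistics
import time


def measure_latency(run_once, warmup=5, runs=50):
    """Call `run_once()` repeatedly and return latency statistics in milliseconds."""
    # Warm-up calls are executed but not timed (caches, lazy allocation, etc.).
    for _ in range(warmup):
        run_once()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "min_ms": samples[0],
        "max_ms": samples[-1],
        "runs": runs,
    }


# Stand-in workload so the sketch runs on its own.
stats = measure_latency(lambda: sum(range(10_000)), warmup=2, runs=20)
print(stats)

# With TensorFlow available, the callable would typically wrap the interpreter:
#   interpreter = tf.lite.Interpreter(model_path="saved_model/int8_model.tflite")
#   interpreter.allocate_tensors()
#   stats = measure_latency(interpreter.invoke)
```

Reporting the median alongside the mean helps when a few runs are disturbed by other processes.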
@InputBlackBoxOutput From your representative_dataset:
def representative_dataset():
    yield {
        'input_ids_0': np.array([1 for i in range(256)]),
        'input_mask_0': np.array([1 for i in range(256)]),
        'segment_ids_0': np.array([1 for i in range(256)]),
        'unique_ids_raw_output___9_0': np.array([1]),
    }
Your calibration data is so simple, is that OK?
Hi @MrRace, I wanted to convert the model for profiling purposes only, so quantization accuracy was not taken into account during conversion. You will have to modify the code to build a correct representative dataset.
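As a rough illustration of what a less trivial generator could look like, here is a sketch that yields several samples with the shapes and dtype from the signature in this thread. It still uses random token IDs, so it is only a structural template: for meaningful calibration the arrays should come from real tokenized text. VOCAB_SIZE is an assumption (the BERT-base uncased vocabulary size), and make_representative_dataset is a hypothetical helper.

```python
import numpy as np

SEQ_LEN = 256        # sequence length from the signature shown in this thread
VOCAB_SIZE = 30522   # assumption: BERT-base uncased vocabulary size


def make_representative_dataset(num_samples=100, seed=0):
    """Build a generator function suitable for converter.representative_dataset."""
    rng = np.random.default_rng(seed)

    def representative_dataset():
        for i in range(num_samples):
            # Random "sentence" length; the remainder of the sequence is padding.
            length = int(rng.integers(8, SEQ_LEN + 1))
            input_ids = np.zeros((1, SEQ_LEN), dtype=np.int64)
            input_ids[0, :length] = rng.integers(1, VOCAB_SIZE, size=length)
            input_mask = np.zeros((1, SEQ_LEN), dtype=np.int64)
            input_mask[0, :length] = 1
            # Second half of the non-padded tokens marked as segment 1.
            segment_ids = np.zeros((1, SEQ_LEN), dtype=np.int64)
            segment_ids[0, length // 2:length] = 1
            yield {
                "input_ids_0": input_ids,
                "input_mask_0": input_mask,
                "segment_ids_0": segment_ids,
                "unique_ids_raw_output___9_0": np.array([i], dtype=np.int64),
            }

    return representative_dataset


samples = list(make_representative_dataset(num_samples=3)())
print({k: (v.shape, v.dtype) for k, v in samples[0].items()})
```

It would be assigned with converter.representative_dataset = make_representative_dataset(), using whatever key names saved_model_cli reports for the model at hand.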
Issue Type
Others
onnx2tf version number
1.7.25
onnx version number
1.13.1
tensorflow version number
2.12.0rc1
Download URL for ONNX
https://github.com/onnx/models/blob/main/text/machine_comprehension/bert-squad/model/bertsquad-12.onnx
Parameter Replacement JSON
Description
Hi,
I am trying to convert and INT8 quantize a BERT ONNX model. I am using the following command on my setup on Google Colab.
Output:
I believe the model has int64 as its input datatype, which causes onnx2tf to fail. Is there a workaround for this?
Thanks for creating such a fantastic tool!