PINTO0309 / onnx2tf

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.

FULL INTEGER QUANTIZED MODEL (INT8) always infers the same value #610

Closed CarlosNacher closed 5 months ago

CarlosNacher commented 5 months ago

Issue Type

Others

OS

Windows

onnx2tf version number

1.19.16

onnx version number

1.15.0

onnxruntime version number

1.17.1

onnxsim (onnx_simplifier) version number

0.4.33

tensorflow version number

2.15.0

Download URL for ONNX

https://we.tl/t-HJjXDINMND

Parameter Replacement JSON

None

Description

When I try full integer quantization, inference always returns the same value; that is, the quantized model loses all its accuracy and I don't know why. There has to be some reason, something I am not doing properly, because losing some accuracy would be understandable, but always returning the same value is not normal.

Do you have any idea what could be happening? :/ Thank you so much for your library!!

My code:

import numpy as np
import onnx2tf
import onnxruntime
import tensorflow as tf

################### LOAD ONNX MODEL
ort_session = onnxruntime.InferenceSession(ONNX_MODEL_PATH, providers=["CPUExecutionProvider"])

def reshape_to_onnx(arr):
    arr = np.moveaxis(arr, 2, 0) # HWC to CHW
    arr = np.expand_dims(arr, axis=0) # add batch axis (=1) -> BCHW
    return arr

# compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[0].name: reshape_to_onnx(pre_processed_input)}
ort_outs = ort_session.run(None, ort_inputs)

############## CREATE QUANTIZED MODEL
onnx2tf.convert(
    input_onnx_file_path=ONNX_MODEL_PATH,
    output_folder_path=TF_MODEL_PATH,
    overwrite_input_shape=overwrite_input_shape,
    output_signaturedefs=True,
    output_integer_quantized_tflite=True,  # INT QUANTIZATION
    quant_type="per_channel",  # per channel instead of per entire tensor, so more granularity
    input_output_quant_dtype="int8",
    custom_input_op_name_np_data_path=[[
        list(ort_inputs.keys())[0],   # input op name
        calib_save_path,              # .npy with calibration images (resized and divided by 255)
        [[[[0.485, 0.456, 0.406]]]],  # ImageNet mean, applied to the calibration data
        [[[[0.229, 0.224, 0.225]]]],  # ImageNet std, applied to the calibration data
    ]],
    check_onnx_tf_outputs_elementwise_close_full=True,  # check that each layer output of the .onnx model and the float32 TF model match
)
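
For reference, this is roughly how the calibration file at calib_save_path was produced (a sketch; the image list, the height/width order of the resize, and the NHWC layout are assumptions, not copied from my actual script):

# Sketch of the calibration data preparation (paths and shapes are assumptions).
import glob
import numpy as np
from PIL import Image

calib_images = sorted(glob.glob("calib_images/*.png"))  # hypothetical calibration set

samples = []
for path in calib_images:
    img = Image.open(path).convert("RGB").resize((608, 1024))   # PIL takes (W, H)
    samples.append(np.asarray(img, dtype=np.float32) / 255.0)   # resized + /255 only

calib_save_path = "calib_data.npy"
np.save(calib_save_path, np.stack(samples, axis=0))  # NHWC: (N, 1024, 608, 3)
# The ImageNet mean/std passed to onnx2tf.convert above are then (as I understand it)
# applied to this /255-scaled data during calibration.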

#################### INFERENCE
# Initialize the interpreter
interpreter = tf.lite.Interpreter(
    model_path=str(TFLITE_FILE), 
    # experimental_preserve_all_tensors=True
    )
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

test_image = pre_processed_input.copy()

# Check if the input type is quantized, then rescale input data to int8
if input_details['dtype'] == np.int8:
    input_scale, input_zero_point = input_details["quantization"]
    test_image = test_image / input_scale + input_zero_point
test_image = np.expand_dims(test_image, axis=0).astype(input_details["dtype"])

interpreter.set_tensor(input_details["index"], test_image)
interpreter.invoke()
tflite_inference = interpreter.get_tensor(output_details["index"])[0]

if output_details["dtype"] == np.int8:
    output_scale, output_zero_point = output_details['quantization']
    print("Output scale:", output_scale)
    print("Output zero point:", output_zero_point)
    print()
    tflite_inference = output_scale * (tflite_inference.astype(np.float32) - output_zero_point)

Data I am using for calibration and also for the inference test (pre_processed_input): https://we.tl/t-FhSDHicE4v. The preprocessing is resizing to 1024x608 and normalizing by dividing by 255.

The tflite_inference has the same repeated value (0.006) :/
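
In case it helps with debugging, this is roughly how I would try to locate where the values collapse, using the experimental_preserve_all_tensors option that is commented out above (I have not run this exact snippet):

# Sketch: re-run inference preserving all tensors and print per-tensor value ranges.
import tensorflow as tf

interpreter = tf.lite.Interpreter(
    model_path=str(TFLITE_FILE),
    experimental_preserve_all_tensors=True,  # keep intermediate tensors after invoke()
)
interpreter.allocate_tensors()
interpreter.set_tensor(interpreter.get_input_details()[0]["index"], test_image)
interpreter.invoke()

for detail in interpreter.get_tensor_details():
    try:
        t = interpreter.get_tensor(detail["index"])
        print(detail["name"], t.dtype, t.shape, t.min(), t.max())
    except ValueError:
        pass  # some tensors hold no data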

PINTO0309 commented 5 months ago

Your model contains a normalization layer, so you should not normalize the quantization calibration data. In addition, the input data should not be normalized during inference. That applies to all of the above. I already answered this in your last issue.

[image: model graph showing the Sub and Div normalization ops]
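
To put made-up numbers on this (an illustration of the principle only, not values taken from this model): the quantization parameters are fixed at conversion time, so if they were calibrated for unnormalized pixel values, an already-normalized input collapses into just one or two int8 bins and the output barely changes.

# Made-up numbers illustrating why feeding already-normalized data to a model whose
# input quantization was calibrated for raw 0..255 pixels gives a near-constant result.
import numpy as np

scale, zero_point = 1.0, -128                 # hypothetical params for a 0..255 input
raw = np.array([0.0, 64.0, 128.0, 255.0])     # unnormalized pixels
normed = raw / 255.0                          # the same pixels scaled to [0, 1]

q_raw = np.clip(np.round(raw / scale) + zero_point, -128, 127).astype(np.int8)
q_normed = np.clip(np.round(normed / scale) + zero_point, -128, 127).astype(np.int8)
print(q_raw)     # [-128  -64    0  127] -> uses the full int8 range
print(q_normed)  # [-128 -128 -127 -127] -> nearly constant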

Duplicate of https://github.com/PINTO0309/onnx2tf/issues/607 Duplicate of https://github.com/PINTO0309/onnx2tf/issues/611

CarlosNacher commented 5 months ago

Your model contains a normalization layer, so you should not normalize the quantization calibration data. In addition, the input data should not be normalized during inference. That applies to all of the above. I already answered this in your last issue.

[image: model graph showing the Sub and Div normalization ops]

I don't know which type of normalization the Sub/Div ops are, and I don't understand how my model could have a normalization layer, since I created the model from scratch and it had no norm layer. I created my model in PyTorch and always passed the input after preprocessing it (resize + normalization by /255 and ImageNet's mean and std), then I exported the model to ONNX and tried inference, again always preprocessing the input first, and likewise with the Keras model returned by onnx2tf.convert (preprocessing it). So why shouldn't I do it now with the TFLite model, and why do you say it has a norm layer?

I am pretty sure it needs that preprocessing because, besides, when I use the scale_factor and zero_point stored in interpreter.get_input_details()[0] to rescale my preprocessed data, it maps to the -127..127 range, whereas if the data is not preprocessed it takes strange values. So I think the preprocessing is okay, and then I don't know why the integer quantized model always returns the same value :/
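
For reference, this is the affine mapping I am assuming when I rescale the input and dequantize the output (made-up scale and zero point, not the ones from my model):

# Illustration of the affine int8 mapping with made-up parameters.
import numpy as np

scale, zero_point = 0.02, -5       # hypothetical quantization parameters
x = np.array([-1.0, 0.0, 1.0], dtype=np.float32)

q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
print(q)                           # [-55  -5  45]
x_back = scale * (q.astype(np.float32) - zero_point)
print(x_back)                      # [-1.  0.  1.] recovered up to rounding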

Thank you in advance.

Duplicate of #607 Duplicate of #611

Oh, does that mean that if my ONNX file is the same, I should keep responding in the same initial issue even if the main topic is not the same? I opened other issues to try to keep different topics independent, but if I should do it another way, please let me know so I can follow best practices, and accept my apologies if I have done it wrong.