PINTO0309 / onnx2tf

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.

Model gives inaccurate results post conversion to tflite #685

Closed · AD-lite24 closed this issue 2 months ago

AD-lite24 commented 2 months ago

Issue Type

Others

OS

Linux

onnx2tf version number

1.25.7

onnx version number

1.16.2

onnxruntime version number

1.18.1

onnxsim (onnx_simplifier) version number

0.4.36

tensorflow version number

2.17.0

Download URL for ONNX

https://huggingface.co/onnx-community/metric3d-vit-small/blob/main/onnx/model.onnx

Parameter Replacement JSON

N/A

Description

  1. To deploy a monodepth model on edge devices, as R&D work and problem exploration. The impact is massive, since nothing except TFLite seems to work with Snapdragon SoCs.

  2. The model outputs are not correct at all. I did a lot of inspection, and here are my findings.

Here are the input details for the tflite conversion

input_details = [{
    'name': 'pixel_values',
    'index': 0,
    'shape': np.array([1, 480, 640, 3], dtype=np.int32),
    'shape_signature': np.array([1, 480, 640, 3], dtype=np.int32),
    'dtype': <class 'numpy.float32'>,
    'quantization': (0.0, 0),
    'quantization_parameters': {
        'scales': np.array([], dtype=np.float32),
        'zero_points': np.array([], dtype=np.int32),
        'quantized_dimension': 0
    },
    'sparsity_parameters': {}
}]

and here are the output details


output_details = [
    {
        'name': 'Identity',
        'index': 1765,
        'shape': np.array([1, 476, 628], dtype=np.int32),
        'shape_signature': np.array([1, 476, 628], dtype=np.int32),
        'dtype': <class 'numpy.float32'>,
        'quantization': (0.0, 0),
        'quantization_parameters': {
            'scales': np.array([], dtype=np.float32),
            'zero_points': np.array([], dtype=np.int32),
            'quantized_dimension': 0
        },
        'sparsity_parameters': {}
    },
    {
        'name': 'Identity_1',
        'index': 1785,
        'shape': np.array([1, 3, 476, 628], dtype=np.int32),
        'shape_signature': np.array([1, 3, 476, 628], dtype=np.int32),
        'dtype': <class 'numpy.float32'>,
        'quantization': (0.0, 0),
        'quantization_parameters': {
            'scales': np.array([], dtype=np.float32),
            'zero_points': np.array([], dtype=np.int32),
            'quantized_dimension': 0
        },
        'sparsity_parameters': {}
    },
    {
        'name': 'Identity_2',
        'index': 1784,
        'shape': np.array([1, 476, 628], dtype=np.int32),
        'shape_signature': np.array([1, 476, 628], dtype=np.int32),
        'dtype': <class 'numpy.float32'>,
        'quantization': (0.0, 0),
        'quantization_parameters': {
            'scales': np.array([], dtype=np.float32),
            'zero_points': np.array([], dtype=np.int32),
            'quantized_dimension': 0
        },
        'sparsity_parameters': {}
    }
]
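
For reference, the dumps above come straight from the TFLite interpreter inspection calls; a minimal sketch, assuming the converted file is saved_model/metric3d-vit-small_float32.tflite:

import tensorflow as tf
from pprint import pprint

# The path is an assumption; point this at the tflite file produced by onnx2tf
interpreter = tf.lite.Interpreter(model_path="saved_model/metric3d-vit-small_float32.tflite")
interpreter.allocate_tensors()

pprint(interpreter.get_input_details())   # -> input_details shown above
pprint(interpreter.get_output_details())  # -> output_details shown above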

Upon inspection of the ONNX file, the ONNX version of the model has 3 outputs.

So in the tflite file, Identity, Identity_1, and Identity_2 each correspond to one of these. For predicted depth, it could be either Identity or Identity_2; I tried both of them, but neither gives accurate results at all.

Identity gives values in the range of [-5000, -1000], which does not seem accurate for either confidence or depth, while Identity_2 gives values in the range of [10, 50], which seems more reasonable but is still not accurate.

I am not sure if I was supposed to follow any pre- or post-processing steps different from the ONNX format. TFLite often has different steps, but I don't know exactly what they are.

This is an example of drawing inference from the ONNX file, which works absolutely fine. Is it the conversion process that broke it, or is there something additional I need to do to fix the results?

Please also find the reference to an old issue which helped me with the conversion process.

I also created a Colab notebook to make it easier to see the inferences from the ONNX file. For the same image, bus.jpg, the range of values with ONNX is [4.7, 24.7].
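
For reproducibility, here is a minimal sketch of how each tflite output can be dumped and inspected; the all-ones tensor is only a stand-in for the actual preprocessed image, and the tflite filename is an assumption:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="metric3d-vit-small_float32.tflite")
interpreter.allocate_tensors()

# Stand-in input; replace with the actually preprocessed image (NHWC, float32)
dummy = np.ones((1, 480, 640, 3), dtype=np.float32)
interpreter.set_tensor(interpreter.get_input_details()[0]["index"], dummy)
interpreter.invoke()

# Print name, shape, and value range of every output to see which Identity is which
for od in interpreter.get_output_details():
    out = interpreter.get_tensor(od["index"])
    print(od["name"], out.shape, "range:", (float(out.min()), float(out.max())))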

PINTO0309 commented 2 months ago

As I commented in the previous issue, if you find it troublesome to correct the Transpose, change the input resolution of the model to a fixed resolution. You should start by checking for precision errors yourself first by making sure to include the -cotof option when converting. onnx2tf is imperfect when it comes to converting dynamic tensors.
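
For example, a conversion sketch along those lines; this assumes the in-script API keywords mirror the -ois / -cotof long option names, and you should adjust the file path and input name to your model:

import onnx2tf

# Pin the dynamic input to a fixed resolution and run the full
# element-wise accuracy check against onnxruntime.
onnx2tf.convert(
    input_onnx_file_path="metric3d-vit-small.onnx",
    output_folder_path="saved_model",
    overwrite_input_shape=["pixel_values:1,3,480,640"],
    check_onnx_tf_outputs_elementwise_close_full=True,
)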

There are hundreds of issues in this repository with the same question, so it's a good idea to search for the issue first.

  1. https://github.com/PINTO0309/onnx2tf/issues?q=label%3A%22Dynamic+batch+%2F+Dynamic+shape%22+
  2. https://github.com/PINTO0309/onnx2tf/issues?q=label%3A%22Parameter+replacement%22+
  3. https://github.com/PINTO0309/onnx2tf?tab=readme-ov-file#parameter-replacement

It's mentally painful to be asked to answer the same thing over and over again. I might delete the Issues tab soon.

AD-lite24 commented 2 months ago

Really sorry for this. Referring back to the previous issue, the model input resolution was fixed to [1, 3, 480, 640], so I believed the dynamic input size was no longer a problem. With -cotof I can see a pretty bad divergence, with an abs error of 4340.93 in the final output. But from that step I thought the input tensors were no longer dynamic, and I am reshaping my images according to the fixed tensor input.

Regardless, I do not wish to take up any more of your time; I will try to figure out why the values are diverging.

PINTO0309 commented 2 months ago

Using the -cotof option will probably tell you which OPs the conversion is getting wrong.

Your model is a ViT model with a huge number of parameters, so the auto-correction by onnx2tf may be skipped.

https://github.com/PINTO0309/onnx2tf/blob/8a93cff08e3d1907ab90d1008d48595c24a16de5/onnx2tf/utils/common_functions.py#L3843-L3862

The dummy inference function is necessary to automatically correct model conversion errors, but it consumes a large amount of RAM for models with large structures.

https://github.com/PINTO0309/onnx2tf/tree/main?tab=readme-ov-file#3-accuracy-check

[screenshot]

onnx2tf -i metric3d-vit-small.onnx -cotof

[screenshot]

Your model appears to consume 130GB of RAM for auto-correction. MatMul has too many elements.

[screenshot]

onnx2tf seems to make a mysterious and fatal error in the constant calculations in this part. There may be a problem with the optimization of two consecutive Sub ops, i.e. y = (200 - x) - 200.
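
For reference, that pattern should simply fold to y = -x; a trivial numeric check:

import numpy as np

x = np.array([0.5, 1.0, 2.0], dtype=np.float32)    # arbitrary sample values
y = (np.float32(200.0) - x) - np.float32(200.0)    # the two consecutive Sub ops
print(np.allclose(y, -x))                          # True: the constants cancel out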

[screenshots]

It's probably a bug in the optimization process of arithmetic operations. Sorry for blaming you so much.

PINTO0309 commented 2 months ago

tflite: https://github.com/PINTO0309/onnx2tf/releases/download/1.25.9/metric3d-vit-small.zip
Fix: https://github.com/PINTO0309/onnx2tf/releases/tag/1.25.9

AD-lite24 commented 2 months ago

Ah, that is amazing. Brilliant as always. Results have improved significantly compared to the gibberish output from before. The values are still not accurate in real-world tests, though, and the converted model misses the high-frequency features.

Do you know how we can find out how the pre- and post-processing steps change upon conversion? Maybe the image needs normalisation as well, or the output needs some sort of scaling? The ONNX range of values was [4.7, 24.7] and the tflite range is [0.68631876, 11.6238165], and with metric depth such a disparity matters. This does seem at odds with the calculated error of 1e-4, but I am not sure. I will run some more experiments to figure out the exact disparity.

Anyway, this is certainly nothing you should worry about; you have helped me a lot. Maybe the model simply cannot be converted with very high precision. Again, thanks a lot for taking the time to fix these issues!

PINTO0309 commented 2 months ago

I can't say anything for sure, just guessing how you plan to use the model, but here are some common patterns of loss of accuracy that can occur after conversion to TensorFlow:

  1. If you eventually quantize to INT8/Float16 and use the model, accuracy is likely to degrade if the normalization process is performed at the beginning of the model. This is because the results vary greatly depending on the type of calibration performed during quantization; performing the normalization in external preprocessing instead does not necessarily result in a deterioration in accuracy.
  2. AveragePool has to be converted quite roughly because there is practically no OP in TensorFlow that performs the equivalent operation. Therefore, large errors may occur due to differences in padding processing and rounding at the edges of the image. There is a devastating specification difference between PyTorch/ONNX and TensorFlow in padding processing.
  3. Certain operations (OPs) have a significant divergence between TensorFlow's internal implementation and ONNX and PyTorch's internal implementation. This may involve minor effects such as the internal rounding of numbers being different between ONNX and TensorFlow, or it may be fatal, with a bug on the TensorFlow side having been left unfixed for a long time.
  4. Even if you convert the formula into a completely compatible pattern, there may be a large error that cannot be tolerated. This is also an issue with the internal implementation of ONNX.
  5. I always check that the error of the final output converges to around 1e-4, but you may want to check the errors for each OP that are checked by the -cotof option yourself. The reason why it is better to check the accuracy check results of -cotof again is that there may be problems with large errors being rounded or flattened in the OP processing of Softmax, Convolution, or Pooling. However, since the structure of the ViT model is very large, the amount of accuracy check results output by the -cotof option becomes enormous, and it is very difficult to visually check the error check results of all OPs.
  6. TensorFlow Lite's converters sometimes optimize models arbitrarily, most notably by arbitrarily replacing Div with Mul operations. This may seem like a simple operation, replacing division with multiplication of the reciprocal, but in reality, due to an issue with TensorFlow's internal rounding, errors often occur when the result of an OP that was originally expressed as a division is rewritten to be processed with Mul. You can see this by looking closely at the results of converting your ViT model with -cotof. We can see that errors occur in all Div OPs. Sqrt, which has a similar internal behavior, is prone to errors. (A short numeric illustration of this Div-to-Mul rounding effect is given after the code examples below.)
  7. TFLiteConverter does not guarantee the order of input and output OPs of models generated by Keras. Therefore, when a model with multiple input OPs or multiple output OPs is converted to tflite, the meaning of the input and output order in ONNX and the meaning of the input and output order in tflite may be randomly swapped. To deal with such a strange specification of TFLiteConverter, which may seem like a bug, onnx2tf implements an option called -coion, which writes an inferable signature into the model using input and output names. By using interpreter.get_signature_runner(), you can match input tensors and output tensors using the model's input and output names, so processing can be performed normally even if the input and output order is broken. https://github.com/PINTO0309/onnx2tf?tab=readme-ov-file#14-inference-with-dynamic-tensors-in-tflite

    • e.g.

      import numpy as np
      import tensorflow as tf
      from pprint import pprint
      
      interpreter = tf.lite.Interpreter(model_path="saved_model/osnet_x0_25_msmt17_float32.tflite")
      tf_lite_model = interpreter.get_signature_runner()
      inputs = {
        'images': np.ones([5,256,128,3], dtype=np.float32),
      }
      tf_lite_output = tf_lite_model(**inputs)
      print(f"[TFLite] Model Predictions shape: {tf_lite_output['output'].shape}")
      print(f"[TFLite] Model Predictions:")
      pprint(tf_lite_output)
    • Your ONNX and TFLite with -coion: https://github.com/PINTO0309/onnx2tf/releases/download/1.25.9/metric3d-vit-small-with-coion.zip
      [ONNX and TFLite screenshots]
      import numpy as np
      import tensorflow as tf
      from pprint import pprint
      
      interpreter = tf.lite.Interpreter(model_path="metric3d-vit-small_float32.tflite")
      tf_lite_model = interpreter.get_signature_runner()
      inputs = {
        'pixel_values': np.ones([1,480,640,3], dtype=np.float32),
      }
      tf_lite_output = tf_lite_model(**inputs)
      print(f"[TFLite] Model Predictions shape.1: {tf_lite_output['predicted_depth'].shape}")
      print(f"[TFLite] Model Predictions shape.2: {tf_lite_output['predicted_normal'].shape}")
      print(f"[TFLite] Model Predictions shape.3: {tf_lite_output['normal_confidence'].shape}")
      
      ###### Input/output order is irrelevant
      # print(f"[TFLite] Model Predictions shape.1: {tf_lite_output['predicted_normal'].shape}")
      # print(f"[TFLite] Model Predictions shape.2: {tf_lite_output['normal_confidence'].shape}")
      # print(f"[TFLite] Model Predictions shape.3: {tf_lite_output['predicted_depth'].shape}")
      
      print(f"[TFLite] Model Predictions:")
      pprint(tf_lite_output)
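
As a side note on point 6 above, the Div-to-Mul rounding effect is easy to reproduce in plain float32 arithmetic, independent of TensorFlow; a minimal sketch with an arbitrary divisor:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 100.0, size=100_000).astype(np.float32)
y = np.float32(49.0)  # arbitrary non-power-of-two divisor

div_result = x / y                      # Div: one rounding step
mul_result = x * (np.float32(1.0) / y)  # Mul by reciprocal: two rounding steps

mismatches = np.count_nonzero(div_result != mul_result)
print(f"elements where x/y != x*(1/y): {mismatches} of {x.size}")
print("max abs difference:", float(np.abs(div_result - mul_result).max()))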

Note that onnx2tf fixes all elements to 1 when performing accuracy checks; real images or test data are not used, because the type of input data is not known.

If the error check using the -cotof option with all test data set at 1 converges to an error of around 1e-4, the inference results of your test code should also be within an error of around 1e-4 for all elements.
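
A minimal way to double-check this on your side (assumptions: onnxruntime is installed, the -coion tflite from the zip above is used, the ONNX input is named pixel_values in NCHW, and a plain all-ones tensor is fed to both models):

import numpy as np
import onnxruntime as ort
import tensorflow as tf

ones_nchw = np.ones((1, 3, 480, 640), dtype=np.float32)

# ONNX reference outputs, keyed by output name
sess = ort.InferenceSession("metric3d-vit-small.onnx")
onnx_outs = dict(zip([o.name for o in sess.get_outputs()],
                     sess.run(None, {"pixel_values": ones_nchw})))

# TFLite outputs via the signature written by -coion
interpreter = tf.lite.Interpreter(model_path="metric3d-vit-small_float32.tflite")
tf_lite_model = interpreter.get_signature_runner()
tfl_outs = tf_lite_model(pixel_values=ones_nchw.transpose(0, 2, 3, 1))  # NHWC

# Compare element-wise where the shapes line up; a transpose may be needed otherwise
for name, onnx_val in onnx_outs.items():
    tfl_val = tfl_outs.get(name)
    if tfl_val is not None and tfl_val.shape == onnx_val.shape:
        print(name, "max abs error:", float(np.abs(onnx_val - tfl_val).max()))
    else:
        print(name, "shapes differ:", onnx_val.shape,
              None if tfl_val is None else tfl_val.shape)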

The onnx range of values were [4.7, 24.7] and the tflite range is [0.68631876, 11.6238165] and with metric depth such disparity matters.

Therefore, as you point out, the situation where the final output values of each model differ by more than a factor of 10 is clearly not a problem with the models themselves.

AD-lite24 commented 2 months ago

If you eventually quantize to INT8/Float16 and use the model, accuracy is likely to degrade if the normalization process is performed at the beginning of the model. This is because the results vary greatly depending on the type of calibration performed during quantization; performing the normalization in external preprocessing instead does not necessarily result in a deterioration in accuracy.

Makes sense. I am using float32 here, but yes, it still holds true.

Thanks for the detail; I get why conversion is so hard, especially for these larger models. So from what I can tell, it is just not that easy to convert this particular model accurately. And any arbitrary changes made by TFLite can't be predicted (though normalization is not a big factor).

Therefore, as you point out, the situation where the final output values of each model differ by more than a factor of 10 is clearly not a problem with the models themselves.

So the issue is TFLite? The values are close but not accurate enough, yet that can't be explained, since the -cotof test does give an error of less than 1e-4. Maybe some depth scaling? I will figure out the scale factor if there is one. Thanks a bunch!

PINTO0309 commented 2 months ago

I have to attend a conference for the next three days, so my investigation and definitive answer will be a little delayed.