As I commented in the previous issue, if you find it troublesome to correct the Transpose, change the input resolution of the model to a fixed resolution. You should start by checking for precision errors yourself, making sure to include the -cotof option when converting. onnx2tf is imperfect when it comes to converting dynamic tensors.
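For example, a conversion command along these lines (a sketch; it assumes the -ois / --overwrite_input_shape option and the input name pixel_values that appears later in this thread) fixes the input resolution and runs the accuracy check in one pass:

onnx2tf -i metric3d-vit-small.onnx -ois pixel_values:1,3,480,640 -cotof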
There are hundreds of issues in this repository with the same question, so it's a good idea to search the existing issues first. It's mentally painful to be asked to answer the same thing over and over again. I might delete the Issues tab soon.
Really sorry about this. Referring back to the previous issue, the model input resolution was fixed to [1, 3, 480, 640], so I believed the dynamic input size was no longer a problem. With -cotof I can see a pretty bad divergence, with an abs error of 4340.93 in the final output. But from that step onward I assumed the input tensors were no longer dynamic, and I am reshaping my images according to the fixed input tensor.
Regardless, I do not wish to take up any more of your time; I will try to figure out why the values are diverging.
Using the -cotof option will probably tell you which ops are being converted incorrectly.
Your model is a ViT model with a huge number of parameters, so the auto-correction by onnx2tf may be skipped. The dummy inference used to automatically correct model conversion errors consumes a large amount of RAM for models with large structures.
https://github.com/PINTO0309/onnx2tf/tree/main?tab=readme-ov-file#3-accuracy-check
onnx2tf -i metric3d-vit-small.onnx -cotof
Your model appears to consume 130GB of RAM for auto-correction. MatMul has too many elements.
onnx2tf seems to make a mysterious and fatal error in the constant calculations in this part. There may be a problem in the optimization of two consecutive Sub ops, i.e. y = (200 - x) - 200.
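As a quick illustration of what those folded constants should reduce to (a minimal NumPy sketch, not the actual onnx2tf optimization code):

import numpy as np

# The pattern being mis-optimized: two consecutive Sub ops, y = (200 - x) - 200
x = np.random.rand(1, 3, 480, 640).astype(np.float32)
y = (np.float32(200.0) - x) - np.float32(200.0)

# Algebraically this is just -x; a correct fold of the two Subs must preserve that,
# leaving only tiny float32 rounding error, nothing like the abs error of ~4340 reported above.
print(np.abs(y - (-x)).max())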
It's probably a bug in the optimization process of arithmetic operations. Sorry for blaming you so much. With that fixed, the error converges to around 1e-4.
Ah, that is amazing. Brilliant as always. Results have improved significantly compared to the gibberish output from before. The values are still not accurate in real-world tests, though, and the high-frequency features were missed by the converted file.
Do you know how we can find out how the pre- and post-processing steps change upon conversion? Maybe the image needs normalisation as well, or the output needs some sort of scaling? The ONNX range of values was [4.7, 24.7] while the TFLite range is [0.68631876, 11.6238165], and with metric depth such disparity matters. This does seem at odds with the calculated error of 1e-4, but I am not sure. I will run some more experiments to figure out the exact disparity.
Anyway, this is certainly nothing you should worry about; you have helped me a lot. Maybe the model simply cannot be converted with very high precision. Again, thanks a lot for taking the time and fixing these issues!
I can't say anything for sure, just guessing how you plan to use the model, but here are some common patterns of loss of accuracy that can occur after conversion to TensorFlow:

1. AveragePool has to be converted quite roughly, because there is practically no OP in TensorFlow that performs the equivalent operation. Large errors may therefore occur due to differences in padding processing and rounding at the edges of the image. There is a devastating specification difference between PyTorch/ONNX and TensorFlow in padding processing.

2. Even if the overall error converges to around 1e-4, you may want to check the errors for each OP reported by the -cotof option yourself. The reason it is worth re-checking the -cotof accuracy results is that large errors may be rounded or flattened by the processing of OPs such as Softmax, Convolution, or Pooling. However, since the structure of the ViT model is very large, the amount of accuracy-check output produced by the -cotof option becomes enormous, and it is very difficult to visually check the error results for all OPs.

3. Replacement of Div with Mul operations. This may seem like a simple substitution of division with multiplication by the reciprocal, but in reality, due to an issue with TensorFlow's internal rounding, errors often occur when an OP that was originally expressed as a division is rewritten to be processed as a Mul. You can see this by looking closely at the results of converting your ViT model with -cotof: errors occur in all Div OPs. Sqrt, which has a similar internal behavior, is also prone to errors. (A small float32 sketch after the example code below illustrates the effect.)

4. TFLiteConverter does not guarantee the order of input and output OPs of models generated by Keras. Therefore, when a model with multiple input OPs or multiple output OPs is converted to tflite, the input/output order in ONNX and the input/output order in tflite may be randomly swapped. To deal with this strange specification of TFLiteConverter, which may seem like a bug, onnx2tf implements an option called -coion, which writes an inferable signature into the model using the input and output names. By using interpreter.get_signature_runner(), you can match input tensors and output tensors by the model's input and output names, so processing can be performed normally even if the input and output order is shuffled.
https://github.com/PINTO0309/onnx2tf?tab=readme-ov-file#14-inference-with-dynamic-tensors-in-tflite
e.g.
import numpy as np
import tensorflow as tf
from pprint import pprint
interpreter = tf.lite.Interpreter(model_path="saved_model/osnet_x0_25_msmt17_float32.tflite")
tf_lite_model = interpreter.get_signature_runner()
inputs = {
    'images': np.ones([5, 256, 128, 3], dtype=np.float32),
}
tf_lite_output = tf_lite_model(**inputs)
print(f"[TFLite] Model Predictions shape: {tf_lite_output['output'].shape}")
print(f"[TFLite] Model Predictions:")
pprint(tf_lite_output)
Your ONNX and TFLite converted with -coion:
https://github.com/PINTO0309/onnx2tf/releases/download/1.25.9/metric3d-vit-small-with-coion.zip
(ONNX vs. TFLite model graph screenshots)
import numpy as np
import tensorflow as tf
from pprint import pprint
interpreter = tf.lite.Interpreter(model_path="metric3d-vit-small_float32.tflite")
tf_lite_model = interpreter.get_signature_runner()
inputs = {
    'pixel_values': np.ones([1, 480, 640, 3], dtype=np.float32),
}
tf_lite_output = tf_lite_model(**inputs)
print(f"[TFLite] Model Predictions shape.1: {tf_lite_output['predicted_depth'].shape}")
print(f"[TFLite] Model Predictions shape.2: {tf_lite_output['predicted_normal'].shape}")
print(f"[TFLite] Model Predictions shape.3: {tf_lite_output['normal_confidence'].shape}")
###### Input/output order is irrelevant
# print(f"[TFLite] Model Predictions shape.1: {tf_lite_output['predicted_normal'].shape}")
# print(f"[TFLite] Model Predictions shape.2: {tf_lite_output['normal_confidence'].shape}")
# print(f"[TFLite] Model Predictions shape.3: {tf_lite_output['predicted_depth'].shape}")
print(f"[TFLite] Model Predictions:")
pprint(tf_lite_output)
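On the Div to Mul point above, here is the small float32 sketch mentioned earlier (synthetic data, not taken from the actual model) showing how a division and a multiplication by a precomputed reciprocal can drift apart:

import numpy as np

x = (np.random.rand(100_000) * 100.0).astype(np.float32)
d = (np.random.rand(100_000) + 0.5).astype(np.float32)

div = x / d                       # the operation as expressed in ONNX
mul = x * (np.float32(1.0) / d)   # the same value computed via a reciprocal and a Mul

# Typically a small but nonzero difference, accumulated across many such OPs in a large graph
print(np.abs(div - mul).max())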
Note that onnx2tf fixes all input elements to 1 when performing accuracy checks; it does not use real images or test data, because it cannot know what kind of input data the model expects. If the error check using the -cotof option, with all test data set to 1, converges to an error of around 1e-4, then the inference results of your test code should also be within an error of around 1e-4 for all elements.
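If you want to reproduce that all-ones check outside of onnx2tf, a rough sketch would be as follows (the file names, the input name pixel_values, and the output name predicted_depth are assumptions based on the examples above and may need adjusting for your export):

import numpy as np
import onnxruntime as ort
import tensorflow as tf

# Same all-ones input that onnx2tf uses for its -cotof accuracy check
ones_nchw = np.ones([1, 3, 480, 640], dtype=np.float32)  # NCHW layout for the ONNX model
ones_nhwc = ones_nchw.transpose(0, 2, 3, 1)              # NHWC layout for the converted TFLite model

# ONNX Runtime side
sess = ort.InferenceSession("metric3d-vit-small.onnx", providers=["CPUExecutionProvider"])
onnx_depth = sess.run(["predicted_depth"], {"pixel_values": ones_nchw})[0]

# TFLite side, resolving inputs/outputs by name via the -coion signature
interpreter = tf.lite.Interpreter(model_path="metric3d-vit-small_float32.tflite")
runner = interpreter.get_signature_runner()
tflite_depth = runner(pixel_values=ones_nhwc)["predicted_depth"]

# Squeeze away layout-dependent singleton dims before comparing
print(np.abs(onnx_depth.squeeze() - tflite_depth.squeeze()).max())  # expect on the order of 1e-4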
The ONNX range of values was [4.7, 24.7] while the TFLite range is [0.68631876, 11.6238165], and with metric depth such disparity matters.
Therefore, as you point out, the situation where the final output values of each model differ by more than a factor of 10 is clearly not a problem with the models themselves.
If you eventually quantize the model to INT8/Float16, accuracy is likely to degrade if the normalization is embedded at the beginning of the model, because the results vary greatly depending on the type of calibration performed during quantization. Keeping the normalization in external preprocessing therefore does not necessarily hurt accuracy.
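As a purely illustrative sketch of keeping the normalization in preprocessing rather than inside the graph (the ImageNet-style mean/std values here are an assumption; check what the original Metric3D preprocessing actually uses):

import numpy as np

# Hypothetical ImageNet-style statistics; replace with the values the original pipeline uses
MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)

def preprocess(image_hwc_uint8: np.ndarray) -> np.ndarray:
    """image_hwc_uint8: H x W x 3 RGB uint8, already resized to 480 x 640."""
    x = image_hwc_uint8.astype(np.float32)
    x = (x - MEAN) / STD            # normalization stays in preprocessing, outside the converted graph
    return x[np.newaxis, ...]       # NHWC batch of 1 for the TFLite model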
Makes sense. I am using float32 here, but yes, it still holds true.
Thanks for the detail; I get why conversion is so hard, especially for these larger models. From what I can tell, it is just not that easy to convert this particular model accurately, and any arbitrary changes made by TFLite can't be predicted (though normalization is not a big factor).
Therefore, as you point out, the situation where the final output values of each model differ by more than a factor of 10 is clearly not a problem with the models themselves.
So the issue is TFLite? The values are close but not accurate enough, which can't be explained since the -cotof test does give an error of less than 1e-4. Maybe some depth scaling? I will figure out the scale factor if there is one. Thanks a bunch!
I have to attend a conference for the next three days, so my investigation and definitive answer will be a little delayed.
Issue Type: Others
OS: Linux
onnx2tf version number: 1.25.7
onnx version number: 1.16.2
onnxruntime version number: 1.18.1
onnxsim (onnx_simplifier) version number: 0.4.36
tensorflow version number: 2.17.0
Download URL for ONNX: https://huggingface.co/onnx-community/metric3d-vit-small/blob/main/onnx/model.onnx
Parameter Replacement JSON
Description
To deploy a monodepth model on edge devices. R&D work and problem exploration. Massive impact, since nothing except TFLite seems to work with Snapdragon SoCs.
The model outputs are not correct at all. I did a lot of inspection, and here are my findings.
Here are the input details for the tflite conversion
and here are the output details
Upon inspection of the onnx file, the onnx version has 3 outputs.
So in the tflite file, Identity, Identity_1, and Identity_2 each correspond to one of these. For predicted depth, it could be either Identity or Identity_2; I tried both, but neither gives accurate results at all.
Identity gives values in the range of [-5000, -1000], which does not seem right for either confidence or depth values, while Identity_2 gives values in the range of [10, 50], which seems more reasonable but is still not accurate.
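One way to at least match the outputs by shape is to dump the converted model's input/output details (a sketch; the .tflite file name is an assumption). Note that shapes alone may not disambiguate depth from confidence:

import tensorflow as tf
from pprint import pprint

interpreter = tf.lite.Interpreter(model_path="metric3d-vit-small_float32.tflite")
interpreter.allocate_tensors()

# Identity / Identity_1 / Identity_2 appear here with their shapes and dtypes,
# which can be compared against the three ONNX output shapes.
pprint(interpreter.get_input_details())
pprint(interpreter.get_output_details())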
I am not sure if I was supposed to follow any pre- or post-processing steps different from the ONNX version. TFLite often has different steps, but I don't know exactly what they are.
This is an example of running inference with the ONNX file, which works absolutely fine. Is it the conversion process that broke it, or is there something additional I need to do to fix the results?
Please also find the reference to an old issue that helped me with the conversion process.
I also created a Colab notebook to make it easier to see the inference results from the ONNX file. For the same image, bus.jpg, the range of values with ONNX is [4.7, 24.7].