onnx2tf has worked according to the same specification since it was first created: the tool does not successfully convert models other than those with Float32 inputs.
If you simply want to generate a Float16 tflite file, it will work just fine if you first generate a Float32 ONNX file. If you have a strong reason why you cannot generate a Float32 ONNX file, please explain that reason in detail before any further correspondence on this issue.
I know that your R&D is very interesting, but I am concerned that you may have made a mistake in your plan for generating the tflite file in the first place.
Ah, that makes sense. I will try it out on a float32 file then. While I would have preferred float16 given that it is for embedded devices, the tool does output float16 tflite files as you said.
Ok, hi, I am back. I switched to float32 and the conversion got significantly further. However, I ran into an issue with the ConvTranspose op.
2024-08-27 14:19:07.721322515 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'wa/depth_model/encoder/patch_embed/proj/Conv_output_0_nchwc' Status Message: Invalid input shape: {1,1}
This is actually an error raised before the conversion process itself starts; it is an exception thrown by onnxruntime. But the onnx file runs fine and gives the desired inference, and in netron the input and output shapes look fine too. So I simply cannot understand where the input shape {1,1} is coming from.
The input X to the conv node is meant to be of shape (N, 384, H/14, W/14), and it seems to be that, yet it appears to be receiving an input of shape {1,1}, which is super odd. I went through the source code again, as well as the parameter replacement docs you pointed me to, but I cannot identify the source of this {1,1} shaped tensor.
Here is the snippet of the log where the error occurs; the full verbose log file is attached.
INFO: 763 / 1763
INFO: onnx_op_type: ConvTranspose onnx_op_name: wa/depth_model/decoder/token2feature/read_1/sample/ConvTranspose
INFO: input_name.1: wa/depth_model/decoder/token2feature/read_1/Transpose_output_0 shape: ['batch_size', 384, 'floor(height/14)', 'floor(width/14)'] dtype: float32
INFO: input_name.2: meta_arch.depth_model.decoder.token2feature.read_1.sample.weight shape: [384, 192, 2, 2] dtype: float32
INFO: input_name.3: meta_arch.depth_model.decoder.token2feature.read_1.sample.bias shape: [192] dtype: float32
INFO: output_name.1: wa/depth_model/decoder/token2feature/read_1/sample/ConvTranspose_output_0 shape: ['batch_size', 192, 'ConvTranspose_761_o0__d2', 'ConvTranspose_761_o0__d3'] dtype: float32
2024-08-27 14:19:07.721322515 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'wa/depth_model/encoder/patch_embed/proj/Conv_output_0_nchwc' Status Message: Invalid input shape: {1,1}
ERROR: The trace log is below.
Traceback (most recent call last):
File "/home/robocon2/miniconda3/envs/onnx-tflite/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 312, in print_wrapper_func
result = func(*args, **kwargs)
File "/home/robocon2/miniconda3/envs/onnx-tflite/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 385, in inverted_operation_enable_disable_wrapper_func
result = func(*args, **kwargs)
File "/home/robocon2/miniconda3/envs/onnx-tflite/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 55, in get_replacement_parameter_wrapper_func
func(*args, **kwargs)
File "/home/robocon2/miniconda3/envs/onnx-tflite/lib/python3.10/site-packages/onnx2tf/ops/ConvTranspose.py", line 178, in make_node
dummy_onnx_inference(
File "/home/robocon2/miniconda3/envs/onnx-tflite/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 3864, in dummy_onnx_inference
outputs = onnx_session.run(None, input_datas)
File "/home/robocon2/miniconda3/envs/onnx-tflite/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Conv node. Name:'wa/depth_model/encoder/patch_embed/proj/Conv_output_0_nchwc' Status Message: Invalid input shape: {1,1}
Please share the Float32 model. As you surmise, it's an ONNX issue, but without the model files I have no idea.
Looking only at the partial image you shared, it is only natural that such an error would occur if the input resolution of the model is dynamic.
Right, my bad. Please find the file on Hugging Face: https://huggingface.co/onnx-community/metric3d-vit-small/blob/main/onnx/model.onnx
The input resolution is indeed dynamic; there is no fixed height or width it needs to be resized to in preprocessing, but I am not sure how this comes into play in the conversion process. I have a lot to learn from your codebase, which I am studying as we speak.
Tests
Customized ONNX
format="ONNX v10"
-> format="ONNX v9"
1,3,480,640
Results
TFLite Float16
Can you tell me how you customized the onnx file? The onnx modifier does not allow changes to the format version, and a static shape does not get reflected downstream.
Can you tell me how you customized the onnx file?
Please wait a moment. I am in the process of examining what is wrong with the design of this model.
I found the process very redundant by design.
NCHW
-> Transpose
-> NHWC
-> ReduceMean axis=-1
-> Transpose
-> NCHW
At all points in a block with this pattern, onnx2tf loses the correct channel position. In this model, the ReduceMean should have been processed with axis=1, but instead the model goes to the trouble of a Transpose and transposes the channels twice.
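To illustrate why this pattern is redundant, here is a small numpy sketch of my own (not code from the model or from onnx2tf; it assumes the ReduceMean keeps its reduced dimension):
import numpy as np

x = np.random.rand(1, 384, 16, 16).astype(np.float32)   # NCHW input

# The redundant pattern found in the model:
#   NCHW -> Transpose -> NHWC -> ReduceMean(axis=-1) -> Transpose -> NCHW
nhwc = x.transpose(0, 2, 3, 1)                 # NCHW -> NHWC
reduced = nhwc.mean(axis=-1, keepdims=True)    # mean over the channel axis
roundabout = reduced.transpose(0, 3, 1, 2)     # back to NCHW

# The single equivalent op: ReduceMean with axis=1 directly on the NCHW tensor
direct = x.mean(axis=1, keepdims=True)

print(np.allclose(roundabout, direct))         # True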
If changing the input resolution of the model to a fixed resolution and optimizing the overall model structure is not a problem, the easiest way to generate tflite is described below. I don't know how you generated your onnx file, but I don't recommend ir_version=10.
onnx2json \
--input_onnx_file_path metric3d-vit-small.onnx \
--output_json_path metric3d-vit-small.json \
--json_indent 2
nano metric3d-vit-small.json
{
"irVersion": "10",
"producerName": "pytorch",
"producerVersion": "2.0.1",
"domain": "",
"graph": {
"node": [
{
"irVersion": "9",
"producerName": "pytorch",
"producerVersion": "2.0.1",
"domain": "",
"graph": {
"node": [
json2onnx \
--input_json_path metric3d-vit-small.json \
--output_onnx_file_path metric3d-vit-small.onnx
onnxsim metric3d-vit-small.onnx metric3d-vit-small.onnx --overwrite-input-shape "pixel_values:1,3,480,640"
onnx2tf -i metric3d-vit-small.onnx
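As a side note, the same irVersion edit can also be made directly with the onnx Python API instead of round-tripping through JSON. This is only an untested sketch of the idea; the onnx2json/json2onnx route above is the one verified here:
import onnx

# Untested sketch: rewrite ir_version in place instead of editing the JSON dump.
model = onnx.load("metric3d-vit-small.onnx")
print(model.ir_version)    # 10
model.ir_version = 9
onnx.save(model, "metric3d-vit-small.onnx")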
Wow, this is brilliant! Thanks so much for the help. I still have to test it with the tflite API and validate the model results, but you have amazing skills.
Just curious though, why does it not support variable sizes? In every node the input sizes are written relative to the original height and width, so it should be simple enough to plug those values in, right? Of course it might very well be very complicated, since I am not aware of the entire scope of the project, but if you could tell me why, I could learn something more from you again.
You are diligent. It is now 2:00 AM in Japan, so I will explain tomorrow morning. Good night.
I will explain step by step.
The reasons why converting this model is so difficult are simply as follows.
1. From its appearance alone, it is impossible to tell whether an input of shape [a, 3, b, c] is image data, voice data, or some other sensor data.
2. onnx2tf cannot correctly transpose the input to [N, H, W, 3] when the three undefined ranks of [N, 3, H, W] are not known to actually be H and W, as in this ViT model.
3. When [a, 3, b, c] is input, onnx2tf uses dummy np.ones([1, 3, 1, 1], dtype=np.float32) sample data only once to infer the correct channel order as much as possible internally, to compute provisional rank sizes for each OP in the middle of the model, and then performs a correction operation to the correct Transpose order using those provisional rank sizes.
4. b and c effectively mean H and W. However, since there is no such hint information in the model definition, onnx2tf has no choice but to perform the inference only once using the dummy sizes b=1 and c=1.
5. ir_version=10 has a problem that interferes with the optimization operation of ONNX, so the operation of optimizing the model with onnx2tf fails, resulting in a model structure that is more redundant than necessary.
6. Since b and c, i.e., H and W, are undefined, we had no choice but to perform shape estimation on impossible dummy data assuming b=1 and c=1, resulting in an impossibly small size of the input tensor to ConvTranspose.
7. Because the correct order of the channels was not known, the Transpose correction went wrong and the input tensor just before ConvTranspose was [N, H, C, W]. This mistake can be made in various ways depending on the complexity of the model.
8. A fairly complex conversion operation is implemented to accommodate automatic conversion of any model that assumes input data other than images. You are the only one who knows that the input data for this model is an image.
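To make point 6 concrete, here is a tiny PyTorch sketch of my own (the Conv2d below only stands in for the ViT patch-embed convolution; it is not the model's actual code or onnx2tf's):
import torch

# Stand-in for the ViT patch-embed convolution (kernel 14, stride 14).
patch_embed = torch.nn.Conv2d(3, 384, kernel_size=14, stride=14)

out = patch_embed(torch.ones(1, 3, 224, 224))   # fine: real-sized input
print(out.shape)                                 # torch.Size([1, 384, 16, 16])

try:
    patch_embed(torch.ones(1, 3, 1, 1))          # the dummy b=1, c=1 probe
except RuntimeError as e:
    print(e)                                     # the kernel cannot be larger than the input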
Thanks for the detailed explanation. I get what you are saying. However, in each node:
{
"name": "/depth_model/encoder/blocks.0/blocks.0.0/norm1/Sub_output_0",
"type": {
"tensorType": {
"elemType": 1,
"shape": {
"dim": [
{
"dimParam": "batch_size"
},
{
"dimParam": "floor(height/14)*floor(width/14) + 5"
},
{
"dimValue": "384"
}
]
}
}
}
},
^ This is from the json file
The dim param is always an arithmetic expression of the original variables height and width. The way I understand your comment, onnx2tf builds a numpy array from it. Since the values of height and width are not known, we perhaps cannot create the appropriate array, but then how does onnx manage to have its model graph with shapes defined by variables that will only be defined at runtime?
Referring to point 6, if onnx2tf is provided the information that the input variables height and width are dynamic, and every node downstream is always relative to these variables, which are fixed at runtime, could there not be a workaround to define the nodes in tflite with a similar variable in the entry node?
Since b and c, i.e., H and W, are undefined, we had no choice but to perform shape estimation on impossible dummy data assuming b=1 and c=1, resulting in an impossibly small size of the input tensor to ConvTranspose.
Right, so in the conversion steps H and W are recognized as None and onnx2tf sets them to 1, which then causes the issue with the ConvTranspose. But if values are not provided, could we perhaps create a map between the variable names (as strings) and integer values, and expect the map to be filled in at runtime? Until then the strings would act as placeholders, and the ops on them would only be resolved at runtime. It might not work with how tflite works, but onnx supports it (though likely not in the way I have described), so I have a feeling it could be done with tflite as well.
Look closely at the names of the attributes. The way dimParam and dimValue are handled inside ONNX is completely different.
dimParam is treated as a simple string. That is, your JSON just contains a string that happens to look like an equation. The string "floor(height/14)*floor(width/14) + 5" makes little sense by itself; onnxruntime dynamically calculates the size during inference. Referring to the dimension named "floor(height/14)*floor(width/14) + 5" from onnx2tf will get None. In other words, no value. Your model is special in that it has a formula set in dimParam, but models generated by people other than you do not have a formula, just a meaningless string like abcdefgh. In addition, this string is usually an unintelligible name, such as a sequential number generated by PyTorch and ONNX according to certain internal rules.
dimValue is treated as an integer value of type INT64. That is, it is a constant.
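For illustration, a short sketch of my own (not part of onnx2tf) that walks the graph with the onnx Python API and shows how the two fields differ:
import onnx

model = onnx.load("metric3d-vit-small.onnx")
# Inspect the graph inputs plus the first few intermediate tensors.
for vi in list(model.graph.input) + list(model.graph.value_info)[:5]:
    dims = []
    for d in vi.type.tensor_type.shape.dim:
        if d.HasField("dim_value"):
            dims.append(d.dim_value)     # a concrete INT64 constant
        elif d.HasField("dim_param"):
            dims.append(d.dim_param)     # just a string label, no numeric value
        else:
            dims.append(None)            # dimension not annotated at all
    print(vi.name, dims)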
Right, so in the conversion steps H and W are recognized as None and onnx2tf sets them to 1, which then causes the issue with the ConvTranspose. But if values are not provided, could we perhaps create a map between the variable names (as strings) and integer values, and expect the map to be filled in at runtime? Until then the strings would act as placeholders, and the ops on them would only be resolved at runtime. It might not work with how tflite works, but onnx supports it (though likely not in the way I have described), so I have a feeling it could be done with tflite as well.
No. That's not possible. (I'm thinking I probably can't do it.) It is already done precisely, but the TFLite runtime is not flexible. To add a little more, the APIs for Keras, TensorFlow, and TensorFlowLite are not flexible. I did not say that ONNX with dynamic tensors cannot be converted to TFLite. In fact, I am able to convert models with some dynamic inputs other than yours.
The tutorial below explains how to generate a TFLite with a dynamic tensor and how to perform inference.
https://github.com/PINTO0309/onnx2tf?tab=readme-ov-file#14-inference-with-dynamic-tensors-in-tflite
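As a rough sketch of what that tutorial covers (my own simplified example, assuming a tflite file whose input dimensions were exported as -1; see the link for the authoritative steps):
import numpy as np
import tensorflow as tf

# Dimensions exported as -1 must be pinned to concrete sizes with
# resize_tensor_input() before tensors are allocated.
interpreter = tf.lite.Interpreter(model_path="model_float32.tflite")
inp = interpreter.get_input_details()[0]

interpreter.resize_tensor_input(inp["index"], [1, 480, 640, 3])
interpreter.allocate_tensors()

interpreter.set_tensor(inp["index"], np.zeros([1, 480, 640, 3], dtype=np.float32))
interpreter.invoke()
out = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
print(out.shape)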
However, TensorFlow is very annoying because it cannot define a flexible model structure like ONNX and has already been troubling me for more than two years.
TensorFlow, TFLite runtime and Keras completely ignore the name given to the OP. LOL.
For your reference, I'll post a JSON image of TFLite's model structure. It is not possible to name each element. All undefined elements can only be declared as -1.
Just to be clear, we have a function to customize the behavior of onnx2tf using the file param_replacement.json, so I know from the start that your model will support dynamic tensor transformation by disabling all Transpose. However, it is very annoying to have to specify custom behavior for every Transpose that should be disabled.
https://github.com/PINTO0309/onnx2tf?tab=readme-ov-file#parameter-replacement
Rather than spending the effort on such things, the following very useful tools can be used to generate tflite models in a more flexible, straightforward, and efficient manner.
https://github.com/AlexanderLutsenko/nobuco
Alternatively, some tools, such as the one below, preserve channel order by inserting a large number of Transposes immediately before and after most of the OPs. Although maintenance has already been abandoned, any model pattern can be converted. However, the inference speed of the generated tflite is 25% slower than that generated by onnx2tf.
https://github.com/onnx/onnx-tensorflow
Since onnx2tf is not a tool designed simply to convert models accurately, but rather to convert them while greatly optimizing the redundant model structure of the conversion source, the behavior of thoroughly removing or fusing unnecessary OPs is implemented. Therefore, defining a simple OP name mapping or mapping by element name is itself quite difficult.
Statically and accurately tying all the elements together during model transformation and dynamically estimating the shape during inference are completely different behaviors.
The system dynamically calculates the size during inference in onnxruntime. Referring to the column named "floor(height/14)*floor(width/14) + 5" from onnx2tf will get None. In other words, no value. Your model is special in that it has a formula set in dimParam, but the models generated by people other than you do not have a formula but just a meaningless string like abcdefgh.
Oh, so any fix could not be generalized to all models. But in my case (i.e., this particular model), if I write an adapter which could parse the string into an arithmetic operation and some data structure to store it, it should theoretically be possible?
Just to be clear, we have a function to customize the behavior of onnx2tf using the file param_replacement.json, so I know from the start that your model will support dynamic tensor transformation by disabling all Transpose. However, it is very annoying to have to specify custom behavior for every Transpose that should be disabled.
Ah so the transpose ops mess with the rank ordering of the dimensions. Turning them off (if they are redundant) or simplifying them (using onnxsim) resolves the issue. Got it
Rather than spending the effort on such things, the following very useful tools can be used to generate tflite models in a more flexible, straightforward, and efficient manner. https://github.com/AlexanderLutsenko/nobuco
I was not aware of this, I will try it out soon! The onnx-tensorflow module is not maintained though, and is not really compatible with anything, or its compatibility is poorly defined. I spent a few hours dealing with that before I landed upon your project (thank God for that). It is odd they stopped maintaining an official converter, but maybe the update to TensorFlow 2.x made them give up, I suppose, lol.
Oh, so any fix could not be generalized to all models. But in my case (i.e., this particular model), if I write an adapter which could parse the string into an arithmetic operation and some data structure to store it, it should theoretically be possible?
I think you're right, if you work super hard you can do it. :smile_cat:
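For what it's worth, a hypothetical adapter along those lines might start out like the sketch below (entirely an illustration of the idea, not something onnx2tf provides), resolving a dimParam formula once height and width are bound:
import math
import re

def resolve_dim_param(expr, bindings):
    """Hypothetical helper: evaluate a dimParam formula such as
    'floor(height/14)*floor(width/14) + 5' once height/width are known."""
    # Restrict the expression to a harmless character set before eval().
    if not re.fullmatch(r"[\w\s\.\+\-\*\/\(\)\,]+", expr):
        raise ValueError(f"unsupported dimParam expression: {expr}")
    scope = {"floor": math.floor, **bindings}
    return int(eval(expr, {"__builtins__": {}}, scope))

print(resolve_dim_param("floor(height/14)*floor(width/14) + 5",
                        {"height": 480, "width": 640}))   # 1535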
Issue Type
Documentation Feature Request
OS
Linux
onnx2tf version number
1.25.7
onnx version number
1.16.2
onnxruntime version number
1.18.1
onnxsim (onnx_simplifier) version number
0.4.36
tensorflow version number
2.17.0
Download URL for ONNX
https://huggingface.co/onnx-community/metric3d-vit-small/resolve/main/onnx/model_fp16.onnx
Parameter Replacement JSON
Description
I am trying to deploy a monocular depth estimation model to an autonomous drone flight controller using Snapdragon SoCs. It is both for research, as an academic curiosity to try to run a large model on the edge, and for product development. I have spent a long time trying to make these models run on Snapdragon SoCs, and TFLite is the only framework that interacts well with the Snapdragon neural engines, so it is crucial that I get this converted. Solving this problem will be the culmination of hours of study and research into deploying large models on the edge, and I can finally move on to the next stage of development.
When I run
onnx2tf -i model_fp16.onnx
it runs up to layer 69/1875, at which point it raises an exception: TypeError: Exception encountered when calling layer "tf.math.add_3" (type TFOpLambda).
Input 'y' of 'AddV2' Op has type float32 that does not match type float16 of argument 'x'.
Call arguments received by layer "tf.math.add_3" (type TFOpLambda): • x=tf.Tensor(shape=(), dtype=float16) • y=tf.Tensor(shape=(), dtype=float32) • name='wa/depth_model/encoder/Add'
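For reference, the same kind of dtype mismatch can be reproduced directly in TensorFlow with two constants (a minimal illustration, not the actual converted graph):
import tensorflow as tf

x = tf.constant(1.0, dtype=tf.float16)
y = tf.constant(2.0, dtype=tf.float32)

try:
    tf.math.add(x, y)   # AddV2 refuses to mix float16 and float32 operands
except (TypeError, tf.errors.InvalidArgumentError) as e:
    print(e)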