PINTO0309 / onnx2tf

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
MIT License

Wrong shape on a specific node in the generated TFLite FLOAT32 model #638

Closed PetiteFleurPF closed 3 months ago

PetiteFleurPF commented 3 months ago

Issue Type

Others

OS

Linux

onnx2tf version number

1.21.0

onnx version number

1.15.0

onnxruntime version number

1.17.1

onnxsim (onnx_simplifier) version number

0.4.33

tensorflow version number

2.16.1

Download URL for ONNX

model.onnx.zip

Parameter Replacement JSON

{}

Description

  1. Purpose - Product development: The impact of the resolution of this bug will increase the speed of the models.
  2. What: At node 282 in the converted TFLite FLOAT32 model (the node number observed during model conversion), a bug prevents predictions from being obtained at inference time. The problem boils down to a shape mismatch: 'Given shapes, [100, 4] and [99, 1], are not broadcastable. Node number 276 (ADD) failed to prepare.'

INFO: 283 / 301
INFO: onnx_op_type: Gather onnx_op_name: /Gather_8
INFO: input_name.1: /GatherND_1_output_0 shape: ['unk211', 4] dtype: float32
INFO: input_name.2: /TopK_output_1 shape: ['unk327'] dtype: int64
INFO: output_name.1: /Gather_8_output_0 shape: ['unk327', 4] dtype: float32
INFO: tf_op_type: gather_v2
INFO: input.1.params: name: tf.compat.v1.gather_nd_1/GatherNd:0 shape: (None, 4) dtype: <dtype: 'float32'>
INFO: input.2.indices: name: tf.operators__.add_30/AddV2:0 shape: (None,) dtype: <dtype: 'int64'>
INFO: input.3.axis: val: 0
INFO: output.1.output: name: tf.compat.v1.gather_3/GatherV2:0 shape: (None, 4) dtype: <dtype: 'float32'>

  3. How: I inspected the model to identify the precise step. I used onnx2tf's accuracy validation to see which outputs matched and which were unmatched, and found a slight accuracy difference in one of the two inputs. That difference disappears when using the -cotoa 1e-1 option. The other input has skipped values, which are removed by a simple change in the code to accept dimensions equivalent to 1. I also noticed that all the reshaping steps were unmatched before applying the option.
  4. Why: Because I really need to improve the inference time.
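For context, the 'not broadcastable' error above follows from the NumPy-style broadcasting rules that TFLite's ADD kernel applies: shapes are aligned from the trailing dimension, and each pair of dimensions must be equal or one of them must be 1. A minimal sketch of that compatibility check (illustration only, not part of onnx2tf):

```python
def broadcastable(a, b):
    """Return True if shapes a and b are broadcast-compatible
    under NumPy-style rules: align from the trailing dimension;
    each dim pair must be equal or contain a 1."""
    for x, y in zip(reversed(a), reversed(b)):
        if x != y and x != 1 and y != 1:
            return False
    return True

print(broadcastable([100, 4], [100, 1]))  # True: 4 vs 1 broadcasts, 100 vs 100 matches
print(broadcastable([100, 4], [99, 1]))   # False: 100 vs 99, and neither is 1
```

So the ADD at node 276 fails precisely because the leading dimensions (100 vs 99) disagree and neither is 1, which is why the issue reads as one input having "skipped values".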
PetiteFleurPF commented 3 months ago

image (current graph of the FLOAT32 model which generates the error) image (node 276 - FLOAT32 - Add - at the bottom of the second picture: the error makes sense, because we can see that we are trying to add two tensors of different shapes, (1x4) and (1x1)) image (a larger-scale view of the equivalent part of the ONNX model)

PetiteFleurPF commented 3 months ago

Log of ONNX conversion image

PetiteFleurPF commented 3 months ago

Log of validation image (the first line is the Gather component, which can be helpful to localize the bug)

PINTO0309 commented 3 months ago

I will not investigate unless you share the ONNX files with me.

It's usually because of the garbage-like post-processing.

PetiteFleurPF commented 3 months ago

@PINTO0309 Hi PINTO0309 - I updated my description with a link to download my model :)

PINTO0309 commented 3 months ago

Post-processing is too redundant. There are no shapes that can be estimated by the tool after NonZero. Models in which the number of output elements varies with the content of the input data cannot be accurately transformed by onnx2tf. This is not a bug.

image

image

Cut out the model above the red line.

image

image
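The point about NonZero can be made concrete: the number of elements NonZero returns depends on the *values* of the input, not just its shape, so no converter can assign a fixed static shape to anything downstream of it. A pure-Python stand-in (illustration only):

```python
def nonzero_indices(values):
    """Pure-Python stand-in for ONNX NonZero on a 1-D tensor:
    returns the indices of all non-zero elements."""
    return [i for i, v in enumerate(values) if v != 0]

# Two inputs with the *same* shape (5,) yield different output lengths,
# so the output shape is unknowable at conversion time:
print(len(nonzero_indices([0.9, 0.0, 0.7, 0.0, 0.3])))  # 3
print(len(nonzero_indices([0.0, 0.0, 0.7, 0.0, 0.0])))  # 1
```

This is why cutting the model above the red line (before the data-dependent post-processing) makes it convertible.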

PetiteFleurPF commented 3 months ago

In theory I can't modify my model, so if I understand correctly, the solution is not to modify your code but my model? Thank you in advance for your time.

PINTO0309 commented 3 months ago

if I understand correctly the solution is not to modify your code but my model?

Yes. Setting aside whether it can be converted or not, the model is too redundant and inefficient. In particular, all models exported without modification from Detectron / Detectron2 and the like will generate this kind of redundant post-processing.

The post-processing of your model can be replaced by the process shown below.

image

PetiteFleurPF commented 3 months ago

OK many thanks :)

PetiteFleurPF commented 3 months ago

Question - what tool do you recommend to do it, please? ONNX GraphSurgeon?

PINTO0309 commented 3 months ago

https://github.com/PINTO0309/PINTO_model_zoo/tree/main/307_YOLOv7
https://github.com/PINTO0309/PINTO_model_zoo/tree/main/322_YOLOv7_Head
https://github.com/PINTO0309/PINTO_model_zoo/tree/main/326_YOLOPv2
https://github.com/PINTO0309/PINTO_model_zoo/tree/main/334_DAMO-YOLO
https://github.com/PINTO0309/PINTO_model_zoo/tree/main/337_FreeYOLO
https://github.com/PINTO0309/PINTO_model_zoo/tree/main/356_EdgeYOLO
https://github.com/PINTO0309/PINTO_model_zoo/tree/main/447_YOLOX-Wholebody-with-Wheelchair

PetiteFleurPF commented 3 months ago

What I meant was a tool that lets me modify my model to change, replace or delete parts of the post-processing ^^ Sorry for the misunderstanding.