PINTO0309 / onnx2tf

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.

Input and Output Name Order Swapping with -coion option #650

Closed ysohma closed 2 months ago

ysohma commented 2 months ago

Issue Type

Others

OS

Linux

onnx2tf version number

1.22.4

onnx version number

1.15.0

onnxruntime version number

1.17.1

onnxsim (onnx_simplifier) version number

0.4.33

tensorflow version number

2.16.1

Download URL for ONNX

N/A. Provided as an attachment.

Parameter Replacement JSON

N/A.

Description

  1. Purpose: Research
  2. What & 3. How & 4. Why

When using the -coion option, the order of input and output names may sometimes swap.

Reproduction Steps

Prepare a simple model with two inputs (input0, input1) with different shapes for identification.

[Image: viz_onnx_model]
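
For reference, a minimal sketch of how a two-input model like this could be built with onnx.helper. The ops and NCHW shapes below are assumptions chosen to match the shapes that appear later in the debug log, not necessarily the attached model:

```python
import onnx
from onnx import TensorProto, helper

# Two inputs with deliberately different shapes so they can be told apart
# after conversion.
input0 = helper.make_tensor_value_info("input0", TensorProto.FLOAT, [1, 3, 64, 64])
input1 = helper.make_tensor_value_info("input1", TensorProto.FLOAT, [1, 3, 128, 128])
output0 = helper.make_tensor_value_info("output0", TensorProto.FLOAT, [1, 3, 64, 64])
output1 = helper.make_tensor_value_info("output1", TensorProto.FLOAT, [1, 3, 128, 128])

nodes = [
    helper.make_node("Mul", ["input0", "input0"], ["output0"]),
    helper.make_node("Mul", ["input1", "input1"], ["output1"]),
]
graph = helper.make_graph(nodes, "two_input_sample", [input0, input1], [output0, output1])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
onnx.checker.check_model(model)
onnx.save(model, "sample.onnx")
```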

Convert the model using the following command:

onnx2tf -i sample.onnx -o outputs -coion -oiqt

Observe the converted models, both the floating-point model (float32) and the fully integer-quantized model (full_integer_quant):

outputs/sample_float32.tflite

[Image: viz_float32_model]

outputs/sample_full_integer_quant.tflite

[Image: viz_full_integer_quant_model]

In the floating-point model, input0 and input1 are correctly reflected, but in the fully integer-quantized model, input0 and input1 are swapped.

This indicates that when using the -coion option with models that have multiple inputs and outputs, the fully integer-quantized model may have discrepancies in input and output names compared to the original ONNX model.

Analysis

To identify the cause of this issue, debug code was inserted into the onnx2tf source around here.

It was found that the order of inputs/outputs in the flatbuffer does not match the order in the original ONNX model.

[Image: debug_log]
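
The comparison above can be reproduced with a short script. This is a minimal sketch, assuming TensorFlow's bundled flatbuffer schema bindings (tensorflow.lite.python.schema_py_generated); the file path is illustrative:

```python
# Read a .tflite flatbuffer and print each subgraph input/output location
# together with its tensor name and shape, for comparison with the ONNX order.
from tensorflow.lite.python import schema_py_generated as schema_fb

with open("outputs/sample_full_integer_quant.tflite", "rb") as f:
    model = schema_fb.Model.GetRootAsModel(f.read(), 0)

subgraph = model.Subgraphs(0)
for label, locations in (("input", subgraph.InputsAsNumpy()),
                         ("output", subgraph.OutputsAsNumpy())):
    for loc in locations:
        tensor = subgraph.Tensors(int(loc))
        print(label, loc, tensor.Name().decode("utf-8"), tensor.ShapeAsNumpy())
```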

In the conversion of the floating-point model, the TFLiteConverter is created directly from the concrete_function, whereas for the fully integer-quantized model it is created from a SavedModel with SignatureDefs. This difference in construction causes the order mismatch.
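
To make the difference concrete, here is a rough sketch of the two construction paths using the public tf.lite.TFLiteConverter API; `model`, `representative_dataset`, and the SavedModel path are placeholders, not onnx2tf's actual internals:

```python
import tensorflow as tf

# Float32 path: the converter is built directly from a concrete function,
# so the traced argument order is carried through.
concrete_func = model.__call__.get_concrete_function()
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func], model)
tflite_float32 = converter.convert()

# Full-integer-quantization path: the converter is built from a SavedModel,
# and the SignatureDefs drive the input/output bookkeeping.
converter = tf.lite.TFLiteConverter.from_saved_model("outputs/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset  # user-supplied generator
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_int8 = converter.convert()
```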

  5. Resources: resources.zip
PINTO0309 commented 2 months ago

I was aware of that possible problem when I implemented this feature. Whether you use concrete_function or not, the result is the same.

Frankly, I have not been able to think of a way to make the name order consistent with ONNX ever since I first created -coion. TensorFlow changes the order of inputs and outputs on its own. Therefore, there is virtually no information tying the order of ONNX input/output names to the order of TFLite input/output names, and there is no way to reconcile the problem even when it is recognized to occur. Note that TensorFlow automatically generates garbage input/output names that I don't understand during conversion to TFLite, so any names/orders I specify while generating the Keras model are ignored.

From this TensorFlow behavior, I would guess that there is probably a bug in the FlatBuffer rewriting operation after INT8 quantization that ignores the input/output order. As evidence, the phenomenon is not reproduced for Float32, where TensorFlow does not need to rewrite the FlatBuffer. This concerns TensorFlow's internal processing, not -coion.

```
flat_subgraphs.inputs
array([0, 1], dtype=int32)

flat_subgraphs.outputs
array([2, 3], dtype=int32)

# That means:
#     location: 0 -> [1, 64, 64, 3] --- TensorFlow ignores the input order and misplaces shapes
#     location: 1 -> [1, 128, 128, 3] --- TensorFlow ignores the input order and misplaces shapes
#     location: 2 -> [1, 128, 128, 3]
#     location: 3 -> [1, 64, 64, 3]
#
# Regardless of whether it is `concrete_func` or `saved_model`,
# the model input/output order is shuffled when processing begins in TensorFlow.

onnx_input_names
['input0', 'input1']

onnx_output_names
['output0', 'output1']
```

To put the matter in proper perspective, the output order is not broken; it is the input order that is broken.

Since I have no control over the internal workings of TensorFlow myself, I am forced to include the following tutorial in the README. If you don't like the order of the names, just rewrite the names after generating the INT8 FlatBuffer.

https://github.com/PINTO0309/onnx2tf?tab=readme-ov-file#5-rewriting-of-tflite-inputoutput-op-names-and-signature_defs
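
Until the names are rewritten, one workaround is to match tensors by shape rather than by name. A minimal sketch using the standard tf.lite.Interpreter API (file path taken from the repro above):

```python
import tensorflow as tf

# After INT8 quantization the input names may be swapped, so identify each
# input/output tensor by its shape instead of trusting its name.
interpreter = tf.lite.Interpreter(
    model_path="outputs/sample_full_integer_quant.tflite")
for detail in interpreter.get_input_details():
    print("input", detail["index"], detail["name"], detail["shape"])
for detail in interpreter.get_output_details():
    print("output", detail["index"], detail["name"], detail["shape"])
```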

ysohma commented 2 months ago

I understand that this issue stems from TensorFlow itself rather than onnx2tf. For my purposes, modifying the output INT8 model afterwards is acceptable.

Thanks!

PINTO0309 commented 2 months ago

I will release v1.22.5 in a few more tens of minutes.

github-actions[bot] commented 2 months ago

If there is no activity within the next two days, this issue will be closed automatically.