PINTO0309 / onnx2tf

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
MIT License

Incorrect arguments in tf.nn.convolution #642

Closed m-lafont closed 3 months ago

m-lafont commented 3 months ago

Issue Type

Others

OS

Linux

onnx2tf version number

1.20.0

onnx version number

1.15.0

onnxruntime version number

1.17.0

onnxsim (onnx_simplifier) version number

0.4.33

tensorflow version number

2.13.1

Download URL for ONNX

https://github.com/facebookresearch/detectron2

Parameter Replacement JSON

{
  "format_version": 1,
  "operations": [
    {
      "op_name": "Sub_85",
      "param_target": "outputs",
      "param_name": "onnx::Div_309",
      "post_process_transpose_perm": [1,2,0]
    }
  ]
}

Description

  1. Purpose: Research. I converted a Detectron2 model from PyTorch to ONNX following the official documentation. I am now trying to convert this model to TensorFlow and then to TensorFlow Lite. I modified the model slightly to rename the input and to work around the NonMaxSuppression issue.

  2. When I run "onnx2tf -i model6_updated2nms_name.onnx", I get the following error:

Model conversion started ============================================================
INFO: input_op_name: input_1 shape: [3, 800, 1067] dtype: float32
WARNING: The optimization process for shape estimation is skipped because it contains OPs that cannot be inferred by the standard onnxruntime.
WARNING: object of type 'NoneType' has no len()

INFO: 2 / 731
INFO: onnx_op_type: Sub onnx_op_name: Sub_85
INFO:  input_name.1: input_1 shape: [3, 800, 1067] dtype: float32
INFO:  input_name.2: onnx::Sub_308 shape: [3, 1, 1] dtype: float32
INFO:  output_name.1: onnx____Div_309 shape: [3, 800, 1067] dtype: float32
INFO: tf_op_type: subtract
INFO:  input.1.x: name: input_1 shape: (3, 1067, 800) dtype: <dtype: 'float32'> 
INFO:  input.2.y: shape: (3, 1, 1) dtype: float32 
INFO:  output.1.output: name: tf.math.subtract/Sub:0 shape: (3, 1067, 800) dtype: <dtype: 'float32'> 

INFO: 3 / 731
INFO: onnx_op_type: Pad onnx_op_name: Pad_142
INFO:  input_name.1: onnx____Div_309 shape: [3, 800, 1067] dtype: float32
INFO:  input_name.2: onnx::Cast_375 shape: [6] dtype: int64
INFO:  input_name.3: onnx::Range_2834 shape: [] dtype: float32
INFO:  output_name.1: onnx____Unsqueeze_378 shape: [3, 800, 1088] dtype: float32
INFO: tf_op_type: Pad
INFO:  input.1.x: name: tf.math.subtract/Sub:0 shape: (3, 1067, 800) dtype: <dtype: 'float32'> 
INFO:  input.2.paddings: shape: (3, 2) dtype: <dtype: 'int32'> 
INFO:  input.3.constant_value: shape: () dtype: float32 
INFO:  input.4.mode: val: constant 
INFO:  input.5.tensor_rank: val: 3 
INFO:  output.1.output: name: tf.compat.v1.pad/Pad_142:0 shape: (3, 1088, 800) dtype: <dtype: 'float32'> 

INFO: 4 / 731
INFO: onnx_op_type: Unsqueeze onnx_op_name: Unsqueeze_143
INFO:  input_name.1: onnx____Unsqueeze_378 shape: [3, 800, 1088] dtype: float32
INFO:  output_name.1: onnx____Conv_379 shape: [1, 3, 800, 1088] dtype: float32
INFO: tf_op_type: reshape
INFO:  input.1.tensor: name: tf.compat.v1.pad/Pad_142:0 shape: (3, 1088, 800) dtype: <dtype: 'float32'> 
INFO:  input.2.shape: val: [1, 3, 1088, 800] 
INFO:  output.1.output: name: tf.reshape/Reshape:0 shape: (1, 3, 1088, 800) dtype: <dtype: 'float32'> 

INFO: 5 / 731
INFO: onnx_op_type: Conv onnx_op_name: Conv_144
INFO:  input_name.1: onnx____Conv_379 shape: [1, 3, 800, 1088] dtype: float32
INFO:  input_name.2: onnx::Conv_2667 shape: [64, 3, 7, 7] dtype: float32
INFO:  input_name.3: onnx::Conv_2668 shape: [64] dtype: float32
INFO:  output_name.1: onnx____Relu_2666 shape: [1, 64, 400, 544] dtype: float32
ERROR: The trace log is below.
Traceback (most recent call last):
  File "/home/maxime/.pyenv/versions/3.8.10/envs/envonnx2tf/lib/python3.8/site-packages/onnx2tf/utils/common_functions.py", line 310, in print_wrapper_func
    result = func(*args, **kwargs)
  File "/home/maxime/.pyenv/versions/3.8.10/envs/envonnx2tf/lib/python3.8/site-packages/onnx2tf/utils/common_functions.py", line 383, in inverted_operation_enable_disable_wrapper_func
    result = func(*args, **kwargs)
  File "/home/maxime/.pyenv/versions/3.8.10/envs/envonnx2tf/lib/python3.8/site-packages/onnx2tf/ops/Conv.py", line 453, in make_node
    conv_bias(
  File "/home/maxime/.pyenv/versions/3.8.10/envs/envonnx2tf/lib/python3.8/site-packages/onnx2tf/ops/Conv.py", line 302, in conv_bias
    tf.nn.convolution(
  File "/home/maxime/.pyenv/versions/3.8.10/envs/envonnx2tf/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/maxime/.pyenv/versions/3.8.10/envs/envonnx2tf/lib/python3.8/site-packages/keras/src/layers/core/tf_op_layer.py", line 119, in handle
    return TFOpLambda(op)(*args, **kwargs)
  File "/home/maxime/.pyenv/versions/3.8.10/envs/envonnx2tf/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
ValueError: Exception encountered when calling layer "tf.nn.convolution" (type TFOpLambda).

Depth of input (800) is not a multiple of input depth of filter (3) for '{{node tf.nn.convolution/convolution}} = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], explicit_paddings=[], padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true](Placeholder, tf.nn.convolution/convolution/filter)' with input shapes: [1,9,1094,800], [7,7,3,64].

Call arguments received by layer "tf.nn.convolution" (type TFOpLambda):
  • input=tf.Tensor(shape=(1, 9, 1094, 800), dtype=float32)
  • filters=tf.Tensor(shape=(7, 7, 3, 64), dtype=float32)
  • strides=['2', '2']
  • padding='VALID'
  • data_format=None
  • dilations=['1', '1']
  • name=None

ERROR: input_onnx_file_path: model6_updated2nms_name.onnx
ERROR: onnx_op_name: Conv_144
ERROR: Read this and deal with it. https://github.com/PINTO0309/onnx2tf#parameter-replacement
ERROR: Alternatively, if the input OP has a dynamic dimension, use the -b or -ois option to rewrite it to a static shape and try again.
ERROR: If the input OP of ONNX before conversion is NHWC or an irregular channel arrangement other than NCHW, use the -kt or -kat option.
ERROR: Also, for models that include NonMaxSuppression in the post-processing, try the -onwdt option.
  3. The model runs well in ONNX. I opened the model in Netron to inspect 'Conv_144'.
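The shape mismatch reported for Conv_144 can be reproduced with plain array arithmetic. The sketch below replays the shapes printed in the log using NumPy only. The transposition rule for the rank-3 input (axis 0 kept, the last two axes swapped) is an assumption inferred from the logged shapes, not read from the onnx2tf source, and where the padding lands is not shown in the log, so only the resulting shapes are meaningful here.

```python
import numpy as np

# ONNX input as logged: [3, 800, 1067], i.e. CHW with no batch axis.
onnx_input = np.zeros((3, 800, 1067), dtype=np.float32)

# Assumption inferred from the log: the rank-3 tensor is handled like
# (N, C, W) -> (N, W, C), which gives the logged TF shape (3, 1067, 800).
tf_input = onnx_input.transpose(0, 2, 1)
print(tf_input.shape)  # (3, 1067, 800)

# Pad_142 grows 1067 -> 1088 (21 elements) on what is now axis 1.
padded = np.pad(tf_input, ((0, 0), (0, 21), (0, 0)))

# Unsqueeze_143 is emitted as a reshape that only prepends an axis.
unsqueezed = padded.reshape((1,) + padded.shape)
print(unsqueezed.shape)  # (1, 3, 1088, 800)

# Conv_144 pads 3 on each side of axes 1 and 2 (7x7 kernel) and then
# reads the LAST axis as channels, because the op runs in NHWC.
conv_in = np.pad(unsqueezed, ((0, 0), (3, 3), (3, 3), (0, 0)))
print(conv_in.shape)  # (1, 9, 1094, 800)

filters = np.zeros((7, 7, 3, 64), dtype=np.float32)  # HWIO, as logged
in_depth = conv_in.shape[-1]
filter_depth = filters.shape[2]
print(in_depth, filter_depth, in_depth % filter_depth)  # 800 3 2

# For contrast, a batched NCHW tensor moved to NHWC ends with depth 3.
batched_nhwc = np.zeros((1, 3, 800, 1088), dtype=np.float32).transpose(0, 2, 3, 1)
print(batched_nhwc.shape)  # (1, 800, 1088, 3)
```

The final contrast is the layout the filter expects: a last axis of size 3, which a batched [1, 3, H, W] input would produce after the usual NCHW to NHWC move.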

Screenshot from 2024-05-30 16-34-42

I also tried supplying a .json file for parameter replacement, but without success. Here is the JSON file:

param_replacement.json
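As a quick sanity check on a permutation such as the `[1,2,0]` used for Sub_85, the sketch below applies it to the TensorFlow-side output shape that the log reports for that op, (3, 1067, 800). It assumes `post_process_transpose_perm` follows the same semantics as `numpy.transpose`. The helper name `permuted_shape` is hypothetical and not part of onnx2tf.

```python
from typing import Sequence, Tuple


def permuted_shape(shape: Sequence[int], perm: Sequence[int]) -> Tuple[int, ...]:
    """Return the shape obtained by transposing `shape` with `perm`."""
    if sorted(perm) != list(range(len(shape))):
        raise ValueError(f"{list(perm)} is not a permutation of rank {len(shape)}")
    # Output axis i takes its size from input axis perm[i].
    return tuple(shape[axis] for axis in perm)


# TF-side output of Sub_85 as printed in the conversion log.
sub_85_tf_output = (3, 1067, 800)

print(permuted_shape(sub_85_tf_output, [1, 2, 0]))  # (1067, 800, 3)
print(permuted_shape(sub_85_tf_output, [2, 1, 0]))  # (800, 1067, 3)
```

Under that assumption, `[1,2,0]` yields a width-first (1067, 800, 3) layout rather than height, width, channels. Whether either layout lets the later Pad and Unsqueeze nodes convert cleanly is not something this sketch can confirm.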

The problem doesn't seem difficult, but I can't seem to solve it :smiley:.

Thanks in advance!

PINTO0309 commented 3 months ago

Please provide a link to the onnx file.

In the first place, Detectron2's pre-processing is too intrusive. Models whose input has no batch dimension are garbage.

m-lafont commented 3 months ago

Thanks for your response! I'll try removing the first layers. Here is the ONNX file:

https://we.tl/t-yBhRolLDjJ

PINTO0309 commented 3 months ago

We have known for years that Detectron2's model is so broken that even onnxruntime cannot run inference on it properly, before onnx2tf even comes into the picture. Therefore, stop using Detectron2.

onnxsim model6_updated2nms_name.onnx model6_updated2nms_name.onnx
Your model contains "Tile" ops or/and "ConstantOfShape" ops. Folding these ops can make the simplified model much larger. If it is not expected, please specify "--no-large-tensor" (which will lose some optimization chances)
Simplifying...
Traceback (most recent call last):
  File "/home/xxxx/.local/bin/onnxsim", line 8, in <module>
    sys.exit(main())
  File "/home/xxxx/.local/lib/python3.10/site-packages/onnxsim/onnx_simplifier.py", line 481, in main
    model_opt, check_ok = simplify(
  File "/home/xxxx/.local/lib/python3.10/site-packages/onnxsim/onnx_simplifier.py", line 199, in simplify
    model_opt_bytes = C.simplify(
onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] (op_type:Transpose, node name: Transpose_631): [ShapeInferenceError] Inferred shape and existing shape differ in rank: (5) vs (0)


sit4onnx -if model6_updated2nms_name.onnx -oep cpu

INFO: file: model6_updated2nms_name.onnx
INFO: providers: ['CPUExecutionProvider']
INFO: input_name.1: input_1 shape: [3, 800, 1067] dtype: float32
Traceback (most recent call last):
  File "/home/xxxx/.local/bin/sit4onnx", line 8, in <module>
    sys.exit(main())
  File "/home/xxxx/.local/lib/python3.10/site-packages/sit4onnx/onnx_inference_test.py", line 506, in main
    final_results = inference(
  File "/home/xxxx/.local/lib/python3.10/site-packages/sit4onnx/onnx_inference_test.py", line 357, in inference
    results = onnx_session.run(
  File "/home/xxxx/.local/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running ReduceMax node. Name:'ReduceMax_1858' Status Message: 


onnx2tf -i model6_updated2nms_name.onnx -kat input_1
INFO: 501 / 731
INFO: onnx_op_type: Cast onnx_op_name: Cast_1660
INFO:  input_name.1: onnx____Cast_2158 shape: None dtype: float32
INFO:  output_name.1: onnx____RoiAlign_2161 shape: None dtype: int64
INFO: tf_op_type: cast
INFO:  input.1.x: name: tf.where_10/SelectV2:0 shape: (None, None) dtype: <dtype: 'float32'> 
INFO:  input.2.dtype: name: int64 
INFO:  output.1.output: name: tf.cast_21/Cast:0 shape: (None, None) dtype: <dtype: 'int64'> 

INFO: 502 / 731
INFO: onnx_op_type: RoiAlign onnx_op_name: RoiAlign_1576
INFO:  input_name.1: input.11 shape: [1, 256, 200, 272] dtype: float32
INFO:  input_name.2: onnx____RoiAlign_2076 shape: None dtype: float32
INFO:  input_name.3: onnx____RoiAlign_2074 shape: None dtype: int64
INFO:  output_name.1: onnx____Reshape_2077 shape: ['unk__374', 256, 7, 7] dtype: float32
ERROR: The trace log is below.
Traceback (most recent call last):
  File "/home/xxxx/.local/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 312, in print_wrapper_func
    result = func(*args, **kwargs)
  File "/home/xxxx/.local/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 385, in inverted_operation_enable_disable_wrapper_func
    result = func(*args, **kwargs)
  File "/home/xxxx/.local/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 55, in get_replacement_parameter_wrapper_func
    func(*args, **kwargs)
  File "/home/xxxx/.local/lib/python3.10/site-packages/onnx2tf/ops/RoiAlign.py", line 189, in make_node
    croped_tensor = crop_and_resize(
  File "/home/xxxx/.local/lib/python3.10/site-packages/onnx2tf/ops/RoiAlign.py", line 178, in crop_and_resize
    ret = tf.image.crop_and_resize(
  File "/home/xxxx/.local/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/xxxx/.local/lib/python3.10/site-packages/tf_keras/src/layers/core/tf_op_layer.py", line 119, in handle
    return TFOpLambda(op)(*args, **kwargs)
  File "/home/xxxx/.local/lib/python3.10/site-packages/tf_keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
ValueError: Exception encountered when calling layer "tf.image.crop_and_resize" (type TFOpLambda).

Shape must be rank 1 but is rank 2 for '{{node tf.image.crop_and_resize/CropAndResize}} = CropAndResize[T=DT_FLOAT, extrapolation_value=0, method="bilinear"](Placeholder, Placeholder_1, Placeholder_2, tf.image.crop_and_resize/CropAndResize/crop_size)' with input shapes: [1,200,272,256], [?,4], [?,?], [2].

Call arguments received by layer "tf.image.crop_and_resize" (type TFOpLambda):
  • image=tf.Tensor(shape=(1, 200, 272, 256), dtype=float32)
  • boxes=tf.Tensor(shape=(None, 4), dtype=float32)
  • box_indices=tf.Tensor(shape=(None, None), dtype=int32)
  • crop_size=('49', '49')
  • method=bilinear
  • extrapolation_value=0.0
  • name=None

ERROR: input_onnx_file_path: model6_updated2nms_name.onnx
ERROR: onnx_op_name: RoiAlign_1576
ERROR: Read this and deal with it. https://github.com/PINTO0309/onnx2tf#parameter-replacement
ERROR: Alternatively, if the input OP has a dynamic dimension, use the -b or -ois option to rewrite it to a static shape and try again.
ERROR: If the input OP of ONNX before conversion is NHWC or an irregular channel arrangement other than NCHW, use the -kt or -kat option.
ERROR: Also, for models that include NonMaxSuppression in the post-processing, try the -onwdt option.
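The RoiAlign failure above is a rank mismatch: `tf.image.crop_and_resize` expects `boxes` of shape [num_boxes, 4] and `box_indices` of shape [num_boxes], while the log shows `box_indices` arriving as (None, None). The sketch below mirrors that rank check in NumPy so it runs without TensorFlow. The function `check_crop_and_resize_ranks` and the 5-box sample data are hypothetical, and flattening the indices is shown only to illustrate the expected rank, not as a confirmed fix for this model.

```python
import numpy as np


def check_crop_and_resize_ranks(image, boxes, box_indices):
    """Raise ValueError if the argument ranks differ from what CropAndResize expects."""
    if image.ndim != 4:
        raise ValueError(f"image must be rank 4 but is rank {image.ndim}")
    if boxes.ndim != 2 or boxes.shape[1] != 4:
        raise ValueError(f"boxes must have shape [num_boxes, 4], got {boxes.shape}")
    if box_indices.ndim != 1:
        raise ValueError(f"box_indices must be rank 1 but is rank {box_indices.ndim}")
    if box_indices.shape[0] != boxes.shape[0]:
        raise ValueError("box_indices and boxes disagree on num_boxes")


image = np.zeros((1, 200, 272, 256), dtype=np.float32)  # NHWC, as logged
boxes = np.zeros((5, 4), dtype=np.float32)              # made-up: 5 boxes
indices_2d = np.zeros((5, 1), dtype=np.int32)           # made-up rank-2 indices

try:
    check_crop_and_resize_ranks(image, boxes, indices_2d)
except ValueError as err:
    print(err)  # box_indices must be rank 1 but is rank 2

# A rank-1 view of the same indices passes the check.
check_crop_and_resize_ranks(image, boxes, indices_2d.reshape(-1))
```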

https://github.com/google-ai-edge/ai-edge-torch

m-lafont commented 3 months ago

Thanks for your response! I tried AI Edge Torch, but Detectron2's input shape is not suited to the conversion. I'll try to simplify the input shape to [1,3,H,W], then retry with the AI Edge Torch package and also retry the conversion with onnx2tf. I'll let you know.

github-actions[bot] commented 3 months ago

If there is no activity within the next two days, this issue will be closed automatically.