Closed · ozayr closed this issue 3 weeks ago
Fix: https://github.com/PINTO0309/PINTO_model_zoo/pull/414
sit4onnx -if 18_nms_yolox_6300.onnx -oep cpu
INFO: file: 18_nms_yolox_6300.onnx
INFO: providers: ['CPUExecutionProvider']
INFO: input_name.1: predictions shape: [30, 6300, 17] dtype: float32
INFO: test_loop_count: 10
INFO: total elapsed time: 420.7580089569092 ms
INFO: avg elapsed time per pred: 42.07580089569092 ms
INFO: output_name.1: batchno_classid_score_x1y1x2y2 shape: [7200, 7] dtype: float32
I'm getting:
InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from yolox_s_wholebody12_0190_post_30x3x480x640.onnx failed:Protobuf parsing failed.
Your model body just hasn't been converted to 30 batches.
onnxsim yolox_s_wholebody12_Nx3xHxW.onnx yolox_s_wholebody12_30x3x480x640.onnx \
--overwrite-input-shape "input:30,3,480,640"
Correct, it works. Much appreciated.
The output comes out as Nx7.
Should it not be 30xNx7, i.e. a set of detections relating to each image?
Any idea why the batch runs significantly slower than running a single image? I just assumed it would be much faster.
> Should it not be 30xNx7, i.e. a set of detections relating to each image?
No. All batch processing results are included.
output: batchno_classid_score_x1y1x2y2 float32[N,7]
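Since the model flattens all batch results into one [N,7] tensor, the per-image detections can be recovered by grouping on the first column (batchno). A minimal sketch in plain Python; the detection values below are made up purely for illustration:

```python
# Group a flattened [N, 7] detection tensor back into per-image lists.
# Each row is [batchno, classid, score, x1, y1, x2, y2].
def split_by_batch(detections, batch_size):
    per_image = [[] for _ in range(batch_size)]
    for row in detections:
        batchno = int(row[0])
        per_image[batchno].append(row)
    return per_image

# Dummy output rows for a batch of 3 images (values are illustrative only).
dets = [
    [0, 0, 0.91, 10, 10, 50, 80],
    [0, 7, 0.33, 12, 14, 30, 40],
    [2, 0, 0.88, 100, 60, 180, 220],
]
grouped = split_by_batch(dets, batch_size=3)
print([len(g) for g in grouped])  # -> [2, 0, 1]
```

Images with no detections simply contribute no rows, which is why a fixed 30xNx7 shape is not used.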
> Any idea why the batch runs significantly slower than running a single image?
Seriously, read the README. If you don't like the slow processing speed, use EfficientNMS-TRT. To begin with, there are too many boxes among the output targets.
https://github.com/PINTO0309/PINTO_model_zoo/blob/main/449_YOLOX-WholeBody12/README.md#3-test
Post-Process
Because I add my own post-processing, which can be inferred by TensorRT, CUDA, and CPU, to the end of the model, the benchmarked inference speed is the end-to-end processing speed including all pre-processing and post-processing. EfficientNMS in TensorRT is very slow and should be offloaded to the CPU.
| param | value | note |
|---|---|---|
| max_output_boxes_per_class | 20 | Maximum number of outputs per class of one type. 20 indicates that the maximum number of people detected is 20, the maximum number of heads detected is 20, and the maximum number of hands detected is 20. The larger the number, the more people can be detected, but the inference speed slows down slightly due to the larger overhead of NMS processing by the CPU. In addition, as the number of elements in the final output tensor increases, the amount of information transferred between hardware increases, resulting in higher transfer costs on the hardware circuit. Therefore, it is desirable to set this value to the minimum necessary. |
| iou_threshold | 0.40 | A value indicating the percentage of occlusion allowed between multiple bounding boxes of the same class. With 0.40, a box is excluded from the detection results if, for example, two bounding boxes overlap in more than 41% of their area. The larger the value, the more occlusion is tolerated, but over-detection may increase. |
| score_threshold | 0.25 | Bounding box confidence threshold. Specify in the range of 0.00 to 1.00. The larger the value, the stricter the filtering and the lower the NMS processing load, but in exchange, everything except bounding boxes with high confidence values is excluded from detection. This parameter has a very large percentage impact on NMS overhead. |
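How the three parameters interact can be sketched with a minimal single-class NMS in plain Python. This is a didactic sketch, not the model's actual NMS implementation, and all box/score values are illustrative:

```python
# Minimal single-class NMS sketch. Boxes are [x1, y1, x2, y2].

def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, score_threshold=0.25, iou_threshold=0.40,
        max_output_boxes_per_class=20):
    # score_threshold: dropping low-confidence boxes first shrinks the
    # candidate set, which is why it dominates NMS overhead.
    cand = [(s, b) for s, b in zip(scores, boxes) if s >= score_threshold]
    cand.sort(key=lambda sb: sb[0], reverse=True)
    kept = []
    for s, b in cand:
        if len(kept) >= max_output_boxes_per_class:
            break  # caps the output tensor size (and transfer cost)
        # iou_threshold: discard boxes overlapping a kept box too much.
        if all(iou(b, kb) <= iou_threshold for _, kb in kept):
            kept.append((s, b))
    return kept

boxes = [[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.1]
# Box 2 is suppressed by IoU (0.81 > 0.40), box 3 by score (0.1 < 0.25).
print(len(nms(boxes, scores)))  # -> 1
```

With a batch of 30 images, this O(kept x candidates) loop runs over 30 images' worth of candidate boxes on the CPU, which is consistent with the batch run being slower than a single image.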
Issue Type
Support
OS
Mac OS
OS architecture
aarch64
Programming Language
Other
Framework
ONNX
Model name and Weights/Checkpoints URL
https://github.com/PINTO0309/PINTO_model_zoo/tree/main/449_YOLOX-WholeBody12
Description
Hi, thank you for your work.
When using the post-process gen tools, how do I increase the batch size parameter to create a model for batch processing?
I see the model is generated and the log shows the shapes correctly, but the output shape is not changed;
when running the model and checking the input shapes it still shows [1,3,480,640] and not [30,3,480,640].
Relevant Log Output
URL or source code for simple inference testing code
The script, which I changed as bash was complaining.