Closed: Peize-Liu closed this issue 1 year ago.
Here is the ERROR INFO:
[W:onnxruntime:, graph.cc:108 MergeShapeInfo] Error merging shape info for output. 'reference_output_disparity' source:{1,240,320,1} target:{4,240,320,1}. Falling back to lenient merge.
model: /home/khalil/workspace/onnx-modifier/modified_onnx/modified_model_float32.onnx
input name: input, shape: [4, 2, 240, 320]
output name: reference_output_disparity, shape: [1, 240, 320, 1]
@Peize-Liu Thanks for reporting. This issue is reproduced and I am looking into it.
@Peize-Liu This error info is invoked when the shape value saved in the ONNX metadata and the shape value seen at runtime are inconsistent. In this case, the output shape recorded in the ONNX metadata is [4, 240, 320, 1], but the shape inferred at runtime is [1, 240, 320, 1].
In your model, there is a Slice op just before the model output. Its starts value is [0, 0, 0, 0] and its ends value is [1, 240, 320, 1]. It seems that the model will only output the inference result of the 1st batch, regardless of the input batch size. Is that the expected behavior?
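In NumPy terms, a Slice with those starts/ends values is equivalent to the indexing below (tensor sizes taken from the thread), which is why only the first batch survives:

```python
import numpy as np

# A batch-4 output tensor in NHWC layout, like the model's disparity map.
batch_out = np.random.rand(4, 240, 320, 1).astype(np.float32)

# Slice with starts=[0, 0, 0, 0] and ends=[1, 240, 320, 1] is equivalent to:
sliced = batch_out[0:1, 0:240, 0:320, 0:1]

print(sliced.shape)  # (1, 240, 320, 1) -- only the first batch is kept
```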
@ZhangGe6 Actually not. The original net is a stereo depth estimation net, which takes two [1, 3, 240, 320] images and then outputs a depth image of dim [1, 240, 320, 1]. Therefore, I think after changing the net into batch mode, i.e. [4, 3, 240, 320] for the input, the output should be [4, 240, 320, 1].
@Peize-Liu Please remember to edit the ends value of the last Slice op from [1, 240, 320, 1] to [4, 240, 320, 1] after changing the batch size to 4. Then the model can do inference without any errors or warnings. I think it is a design issue of the original model.
Thanks a lot! I will give it a try.
@ZhangGe6
Sorry for bothering you. I modified CREStereoNet via onnx-modifier from batch size 1 to 4. However, the modified model seems to be wrong at the first Concat layer. I pushed the models to this CREStereo Models. I'd appreciate it if you have time to point out where the problem is.
the modified CREStereoNet seems to be wrong at the first Concat layer.
@Peize-Liu Got it. I'll look into it.
BTW, does the "hitnet" with batch size 4 work correctly?
Yes, It works, thank you very much for your advice
@Peize-Liu Hi, I figured it out. It is a bug in the code and has been fixed. Please update to the latest code and have a try. Thanks for reporting!
This is a brief explanation of the bug: the previous change batch size function was implemented by replacing the batch size metadata of all the nodes with the same value, so it could not work correctly when a transformation on the batch dim is involved, which is exactly what the first Concat node in CREStereoNet does. In the latest code, the change batch size function is implemented using shape inference rather than the previous hard-coded way, so the issue is expected to be fixed. Feel free to start more discussions if any problem still exists.
Thank you very much for your efforts. I feel that there is still an issue with the output dim. Should I fix this locally, or should it be done with onnx-modifier? The expected output dim should be [4, 2, 240, 320], I guess. I have tested the modified model with TensorRT; it can be executed properly, but the output dim is not as expected.
@Peize-Liu Similar to "hitnet", there are also ops that are configured for batch size 1 exclusively. For example, after changing the batch size to 4, we need to edit the split value of op init_Split_115 from 1, 1 to 4, 4.
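The Split semantics can be sketched in NumPy (tensor sizes are assumed for illustration): with batch 4, the tensor being split along the batch axis has 8 rows, so a split of [1, 1], whose values must sum to the axis size in ONNX, no longer fits, while [4, 4] recovers the two batch-4 halves.

```python
import numpy as np

# Two batch-4 feature maps stacked along the batch dim (spatial size assumed).
stacked = np.random.rand(8, 240, 320).astype(np.float32)

# split = [1, 1] only covers the first two rows and is rejected by ONNX Split,
# since the split values must sum to the size of the axis (8 here).
# split = [4, 4] recovers the two batch-4 halves:
left, right = np.split(stacked, [4], axis=0)

print(left.shape, right.shape)  # (4, 240, 320) (4, 240, 320)
```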
However, to make the ONNX model compatible with batch size 4, there may still be a long way to go, as there are other ops configured for batch size 1 and the model is very complex. In this case, it would be more efficient if you could export an ONNX model for batch size 4 directly, rather than exporting an ONNX model for batch size 1 and then editing it.
Exactly. Thank you very much for your work and this project. It really saves time for researchers who are not familiar with the machine learning area. Thank you again for your great job; I have learnt a lot from this issue.
Dear team,
I want to express my deep appreciation for your outstanding work. I recently made modifications to an ONNX model obtained from the ONNX ZOO hitnet repository, which can be found at this link. The original input shape of the model is [1, 2, 320, 240], but I have made changes to enable batch processing with a shape of [4, 2, 320, 240]. I successfully applied these modifications using the onnx-modifier tool.
However, I have encountered an issue with the output when calling the ONNX model using the ONNX Runtime API. Although the output displayed in onnx-modifier appears to be correct, the output remains unaltered when calling the model through the ONNX Runtime API.
onnx_model.zip