daquexian / onnx-simplifier

Simplify your onnx model
Apache License 2.0

[BUG] ONNX simplifier crashes on MLPerf model #231

Open gyenesvi opened 2 years ago

gyenesvi commented 2 years ago

Describe the bug

I am trying to run ONNX simplifier on a model from MLPerf (retinanet-resnext50 800x800). However, the simplifier crashes with a segmentation fault, which looks like a null pointer dereference somewhere.

Model

Here is the model available from the MLPerf GitHub page (https://github.com/mlcommons/inference/tree/master/vision/classification_and_detection):

https://zenodo.org/record/6617879/files/resnext50_32x4d_fpn.onnx

gyenesvi commented 2 years ago

It seems that the error is somewhere in shape inference, perhaps because this graph has dynamic shapes after the NonMaxSuppression operation. Oddly, if I just load the graph and run shape inference on its own, there is no crash. So shape inference may only be crashing after some of the transformation steps.

gyenesvi commented 2 years ago

Any feedback on this @daquexian? The simplifier actually crashes on other MLPerf detection models as well. I tried with older versions of ONNX simplifier too, but got the same result.

daquexian commented 2 years ago

Sorry, I have a deadline on Oct. 28th; after that I can try to fix the issue 😉

gyenesvi commented 2 years ago

Sure, thank you!

gyenesvi commented 1 year ago

@daquexian, any progress on this? Is a new version in progress, as I read in some comments? I have 0.4.10 now and it still crashes.

daquexian commented 1 year ago

@gyenesvi Sorry for the late response. I tested the model with the latest onnxsim and it succeeded. Could you please try once again? Thanks!

gyenesvi commented 1 year ago

@daquexian Thanks for the response and the update. When I started testing, it was still crashing on my side. Even weirder, it was now crashing on simpler models that used to work! So I started experimenting with Python versions and onnx versions, and it turns out that it works with the latest version of onnx (1.13) but crashes with onnx 1.12, the previous version, which I had installed. On the one hand it's great that it is finally working, but this is not good behavior: it should definitely not crash with older versions of the onnx package. What could cause that? To start debugging, I guess you could just test it with onnx 1.12 to see if it happens for you as well?
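Since the crash depends on which onnx version is installed, it helps to record the exact package versions with every report. Below is a minimal sketch that does this with only the standard library. The helper name `installed_versions` is made up for illustration, and the package list assumes the PyPI distribution names `onnx`, `onnxsim` and `onnxruntime`.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


# Assumed distribution names; adjust the list for your environment.
for name, version in installed_versions(["onnx", "onnxsim", "onnxruntime"]).items():
    print(f"{name:<12} {version or 'not installed'}")
```

Running this in each environment (for example one with onnx 1.12 and one with onnx 1.13) gives a small table that can be pasted next to the crash output.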

skewer commented 1 year ago

I've got a similar problem here with the same model file. I'm using macOS 13.2.1, and the package versions are:

onnxruntime       1.14.1
onnxsim           0.4.19
onnx              1.12.0/1.13.0/1.13.1

If I use onnx 1.13.0/1.13.1, the error message is:

Simplifying...
Traceback (most recent call last):
  File "/Users/*/miniconda3/envs/onnxshape/bin/onnxsim", line 8, in <module>
    sys.exit(main())
  File "/Users/*/miniconda3/envs/onnxshape/lib/python3.9/site-packages/onnxsim/onnx_simplifier.py", line 434, in main
    model_opt, check_ok = simplify(
  File "/Users/*/miniconda3/envs/onnxshape/lib/python3.9/site-packages/onnxsim/onnx_simplifier.py", line 186, in simplify
    model_opt_bytes = C.simplify(
RuntimeError: /Users/runner/work/onnx-simplifier/onnx-simplifier/third_party/onnx-optimizer/third_party/onnx/onnx/common/ir.h:527: input: Assertion `inputs_.size() == 1` failed.

If I use onnx 1.12.0, the error message is:

Simplifying...
[1]    31348 segmentation fault  onnxsim /Users/*/Downloads/resnext50_32x4d_fpn.onnx sim_fpn.onnx
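Because a segmentation fault kills the whole interpreter, one way to compare the two failure modes above is to run the simplifier in a child process and inspect how it ended. The following is only a sketch under that assumption. `run_isolated` is a hypothetical helper, not part of onnxsim, and the signal detection relies on POSIX behavior, where a child killed by a signal reports a negative return code.

```python
import signal
import subprocess
import sys


def run_isolated(cmd):
    """Run cmd in a child process and report how it ended.

    Returns (returncode, signal_name). signal_name is e.g. "SIGSEGV" when the
    child was killed by a signal (POSIX only), otherwise None.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    signal_name = None
    if proc.returncode < 0:
        try:
            signal_name = signal.Signals(-proc.returncode).name
        except ValueError:
            # Signal number without a named constant on this platform.
            signal_name = f"signal {-proc.returncode}"
    return proc.returncode, signal_name


# Example call (file names are placeholders), mirroring the CLI form used above:
#   code, sig = run_isolated(["onnxsim", "resnext50_32x4d_fpn.onnx", "sim_fpn.onnx"])
```

With this, a segmentation fault shows up as a `"SIGSEGV"` result, while a failure raised as a Python exception (such as the `RuntimeError` above) shows up as an ordinary non-zero exit code with no signal name.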