Open jmitrevs opened 2 years ago
Thanks for flagging this, Jovan. I had a quick look at the testcase, and actually it looks like the problem is not from inside finn-base but rather onnx.shape_inference.infer_shapes, which we use under the hood to do shape inference for non-custom ops. I was able to reproduce the same problem in a way that sidesteps finn-base completely:
In [1]: from onnx.shape_inference import infer_shapes
In [2]: import onnx
In [3]: ret0=onnx.load("MLP.onnx")
In [4]: ret1=infer_shapes(ret0)
In [5]: onnx.save(ret1, "mlp-with-shapes.onnx")
...and examining mlp-with-shapes.onnx in Netron I can confirm that the shapes are missing. The good news is that by upgrading to onnx==1.11.0 I was able to get the right shape inference behavior, so this must be a bug that has been fixed in recent versions.
I'll re-run the test suite with onnx==1.11.0 and, if it doesn't break anything, I'll push a fix for this to the finn-base and qonnx repos.
It doesn't seem to solve the problem on my Mac. I updated onnx to 1.11.0 but still see the error:
(fastml) mac-137349:Downloads jmitrevs$ qonnx-cleanup MLP.onnx
(fastml) mac-137349:Downloads jmitrevs$ qonnx-exec MLP_clean.onnx
Traceback (most recent call last):
File "/Users/jmitrevs/fastml/bin/qonnx-exec", line 33, in <module>
sys.exit(load_entry_point('qonnx', 'console_scripts', 'qonnx-exec')())
File "/Users/jmitrevs/work/qonnx/src/qonnx/util/exec_qonnx.py", line 43, in main
clize.run(exec_qonnx)
File "/Users/jmitrevs/fastml/lib/python3.9/site-packages/sigtools/modifiers.py", line 158, in __call__
return self.func(*args, **kwargs)
File "/Users/jmitrevs/fastml/lib/python3.9/site-packages/clize/runner.py", line 363, in run
ret = cli(*args)
File "/Users/jmitrevs/fastml/lib/python3.9/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/Users/jmitrevs/work/qonnx/src/qonnx/util/exec_qonnx.py", line 35, in exec_qonnx
odict = execute_onnx(model, idict)
File "/Users/jmitrevs/work/finn-base/src/finn/core/onnx_exec.py", line 147, in execute_onnx
raise Exception("Found unspecified tensor shapes, try infer_shapes")
Exception: Found unspecified tensor shapes, try infer_shapes
(fastml) mac-137349:Downloads jmitrevs$ pip list | grep onnx
onnx 1.11.0
onnxconverter-common 1.8.1
onnxruntime 1.11.1
qonnx 0.0.post1.dev104+gc86147e.d20220531 /Users/jmitrevs/work/qonnx/src
tf2onnx 1.10.0 /Users/jmitrevs/work/tensorflow-onnx
I had only used Netron to check that the shapes appeared for the intermediate tensors, but if I use qonnx-exec I actually see the same problem. The root cause seems to be as follows: even though the weight and bias tensors for the Gemm nodes have initializers, no ValueInfo is generated for these tensors during shape inference. Since we rely on ValueInfo to get shape information, the "Found unspecified tensor shapes" exception is thrown during execution.
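To check whether a given model is affected, something like the following could list the initializers that have no ValueInfo. This is only a sketch: the function name is made up for illustration, and it relies solely on the GraphProto fields initializer, input, output and value_info. The demo uses a stand-in object rather than a real model so that it runs without onnx installed.

```python
from types import SimpleNamespace


def initializers_missing_value_info(graph):
    """Return the names of initializers that have no ValueInfo entry.

    `graph` is expected to look like an onnx GraphProto (for example
    `model.graph`): only the `initializer`, `input`, `output` and
    `value_info` fields are read, and each entry needs a `.name`.
    """
    described = set()
    for field in (graph.input, graph.output, graph.value_info):
        described.update(entry.name for entry in field)
    return [init.name for init in graph.initializer if init.name not in described]


# Stand-in graph (NOT a real model): one Gemm weight without ValueInfo,
# one bias that does have a ValueInfo entry.
demo_graph = SimpleNamespace(
    initializer=[SimpleNamespace(name="Gemm_0_param0"), SimpleNamespace(name="Gemm_0_param1")],
    input=[SimpleNamespace(name="x")],
    output=[SimpleNamespace(name="y")],
    value_info=[SimpleNamespace(name="Gemm_0_param1")],
)
missing = initializers_missing_value_info(demo_graph)

# With a real model (requires onnx), the call would look like:
#   import onnx
#   from onnx.shape_inference import infer_shapes
#   model = infer_shapes(onnx.load("MLP.onnx"))
#   print(initializers_missing_value_info(model.graph))
```

A non-empty result after infer_shapes would point to the situation described above.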
It looks like this issue has been around for a while and is related to initializers not being listed as inputs:
https://github.com/onnx/onnx/issues/4102
https://github.com/onnx/onnx/issues/2874
...but the following merged PR was supposed to fix this for 1.11.0 and later:
https://github.com/onnx/onnx/pull/2901
I'm not entirely sure why the fix hasn't kicked in here. I'll have a closer look.
I haven't been able to find out why ONNX PR #2901 does not solve this issue, so I added a workaround in ModelWrapper that fixes this while loading the model.
Since finn-base is scheduled to be sunset, I did this directly in a new qonnx branch:
https://github.com/fastmachinelearning/qonnx/tree/feature/finn_base_migration
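For anyone who cannot switch branches, a workaround along these lines might look roughly like the sketch below. To be clear, this is not the actual change in the branch above, just an illustration of the idea: for every initializer without a ValueInfo, append one built from the initializer's own data_type and dims. The function name and the make_value_info parameter are made up for this example; when that parameter is left out, the real onnx.helper.make_tensor_value_info is used. The demo uses stand-in objects and a made-up shape so it runs without onnx.

```python
from types import SimpleNamespace


def add_missing_value_info(graph, make_value_info=None):
    """Append a ValueInfo for each initializer that lacks one; return the names added."""
    if make_value_info is None:
        # Real onnx helper, imported lazily so the sketch can be tried without onnx.
        from onnx.helper import make_tensor_value_info as make_value_info
    described = {entry.name for entry in graph.value_info}
    described.update(entry.name for entry in graph.input)
    described.update(entry.name for entry in graph.output)
    added = []
    for init in graph.initializer:
        if init.name in described:
            continue
        # Element type and shape are taken directly from the initializer tensor.
        graph.value_info.append(make_value_info(init.name, init.data_type, list(init.dims)))
        added.append(init.name)
    return added


def fake_value_info(name, elem_type, shape):
    # Stand-in for onnx.helper.make_tensor_value_info, used only in this demo.
    return SimpleNamespace(name=name, elem_type=elem_type, shape=shape)


# Stand-in graph (NOT a real model); data_type 1 is TensorProto.FLOAT,
# and the dims are invented for illustration.
graph = SimpleNamespace(
    initializer=[SimpleNamespace(name="Gemm_0_param0", data_type=1, dims=[64, 16])],
    input=[SimpleNamespace(name="x")],
    output=[SimpleNamespace(name="y")],
    value_info=[],
)
added = add_missing_value_info(graph, make_value_info=fake_value_info)
```

With a real model this would be called as add_missing_value_info(model.graph) after onnx.load and before execution.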
@jmitrevs could you give this a try and see if it resolves the issue for you? I was able to use qonnx-cleanup and qonnx-exec without errors on the MLP.onnx you shared.
I believe it fixed the problem. I am now running into another problem, but I think it's unrelated. (I will double-check this afternoon.)
I can confirm that my script now works (after fixing an unrelated bug).
@maltanar What's the fix for this issue when using the latest finn-base dev branch? (I tried building a Docker image with onnx>=1.11.0, but it didn't fix the issue.)
If I create an onnx file with this sample script and input.txt:
(the produced ONNX file is available at: https://drive.google.com/file/d/1wt6ub3cChvPD-XM4-7keuTy5dC5wdVZk/view?usp=sharing)
it seems that infer_shapes in the cleanup step fails. The problem is that model.get_tensor_shape('Gemm_0_param0') returns []. I do not understand this behavior.