Xilinx / finn

Dataflow compiler for QNN inference on FPGAs
https://xilinx.github.io/finn
BSD 3-Clause "New" or "Revised" License

cleanup function and ConvertQONNXtoFINN transformation issue #892

Closed: williamliao28 closed this issue 8 months ago

williamliao28 commented 10 months ago

name: Bug report
about: Something isn't working as expected
title: 'cleanup function and ConvertQONNXtoFINN transformation issue'
labels: bug
assignees: ''



Quick summary

FINN cannot successfully convert a QONNX model that contains a transpose before a concatenation layer.

Details

Any suggestions are welcome. I have tried the following experiments to investigate this issue.

  1. I export a model with a transpose before a concatenation layer to QONNX format using Brevitas. Then I try to convert the model in FINN by calling cleanup() and running ConvertQONNXtoFINN(). However, I get the following error: onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] (op_type: Cast, node name: Quant_4): [ShapeInferenceError] Inferred shape and existing shape differ in dimension 1: (64) vs (28). The full traceback is included in the screenshot posted below.
[Screenshot: traceback of the ShapeInferenceError, 2023-09-10]
  2. I have looked into the source code of the cleanup() function (in qonnx/util/cleanup.py) and the ConvertQONNXtoFINN() transformation (in finn/transformation/qonnx/convert_qonnx_to_finn.py) and found that both call the FoldTransposeIntoQuantInit() transformation (line 40 in cleanup.py and line 81 in convert_qonnx_to_finn.py). This transformation attempts to fuse a Transpose node into the preceding Quant or BipolarQuant node and runs shape inference before returning the transformed model. It seems to assume that the transpose is 2D or does not involve the channel dimension: during shape inference, Quant or BipolarQuant nodes corresponding to quantized activation layers are expected to have in_channels == out_channels. My test model has a transpose involving the channel dimension, so it fails. A minimal sketch of this pattern follows this list.
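To make the failing pattern concrete, the sketch below shows the kind of model structure in question: a quantized convolution and activation, followed by a transpose that moves the channel dimension, feeding a concatenation. This is an illustration only; the layer sizes, bit widths and the export_qonnx call are assumptions, and the actual model is in simple_test.py linked below.

```python
# Illustrative sketch only (assumes a recent Brevitas that provides export_qonnx);
# the real model is defined in simple_test.py.
import torch
import torch.nn as nn
from brevitas.export import export_qonnx
from brevitas.nn import QuantConv2d, QuantIdentity


class TransposeConcatModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = QuantConv2d(3, 64, kernel_size=3, padding=1)
        self.act = QuantIdentity()  # activation quantizer -> Quant node in QONNX

    def forward(self, x):
        x = self.act(self.conv(x))       # shape (N, 64, 28, 28)
        t = x.permute(0, 3, 2, 1)        # transpose involving the channel dimension
        return torch.cat([t, t], dim=1)  # concatenation after the transpose


model = TransposeConcatModel().eval()
export_qonnx(model, input_t=torch.randn(1, 3, 28, 28),
             export_path="transpose_concat_test.onnx")
```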

Steps to Reproduce

I include example Python code in the links below, which may be useful for reproducing the error. The file simple_test.py contains the PyTorch module defining the model with Brevitas quantization. The file simple_test_qonnx2finn.py is the code I run inside the FINN docker container, and it produces the error message described above. The file simple_test_w1a1_cifar10_2epoch.onnx is a pretrained model which can be used directly as the input to simple_test_qonnx2finn.py.

simple_test.py

simple_test_qonnx2finn.py

simple_test_w1a1_cifar10_2epoch.onnx

  1. Clone the FINN repository
  2. Checkout the dev branch
  3. Start the docker container with the command: bash ./run_docker.sh
  4. Put simple_test_w1a1_cifar10_2epoch.onnx under the directory build/quant_model/simple_test_cifar10/lr0.02
  5. Run python simple_test_qonnx2finn.py (a sketch of its core conversion calls is shown below)
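For reference, here is a minimal sketch of the conversion calls that trigger the error, assuming the qonnx/finn modules named above; the actual simple_test_qonnx2finn.py may differ in details such as file paths.

```python
# Sketch of the conversion flow (paths are illustrative assumptions).
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.util.cleanup import cleanup
from finn.transformation.qonnx.convert_qonnx_to_finn import ConvertQONNXtoFINN

model_path = "build/quant_model/simple_test_cifar10/lr0.02/simple_test_w1a1_cifar10_2epoch.onnx"
cleaned_path = "simple_test_w1a1_cifar10_2epoch_clean.onnx"

# qonnx cleanup; this already applies FoldTransposeIntoQuantInit
cleanup(model_path, out_file=cleaned_path)

# ConvertQONNXtoFINN applies FoldTransposeIntoQuantInit again and runs shape
# inference, which is where the ShapeInferenceError above is raised
model = ModelWrapper(cleaned_path)
model = model.transform(ConvertQONNXtoFINN())
model.save("simple_test_finn.onnx")
```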

Expected behavior

The code simple_test_qonnx2finn.py should run without any errors and produce the converted model.

Actual behavior

Running python simple_test_qonnx2finn.py generates the error described above.

williamliao28 commented 10 months ago

@auphelia Could you help on this issue, please?

iksnagreb commented 9 months ago

This seems to be related to, or even the same as, one of the issues I am currently investigating: #878 (the third one in the list). However, I am not really making any progress on this particular one besides tracking it down to FoldTransposeIntoQuantInit as well. I have been stepping through the code and staring at graphs for some time now and am still not sure whether I am violating assumptions (likely), whether QONNX/FINN makes some wrong assumptions, or whether this is indeed a bug somewhere (maybe even in ONNX, not QONNX or FINN).

At least in my case, I probably just do not want to apply the transformation to the transpose at all (as it is part of the attention pattern). But for this, conditions for telling the various uses of the transpose apart will be needed...

Did you make any progress by now, @williamliao28? Otherwise, some input from you FINN people would be appreciated @auphelia @maltanar.

iksnagreb commented 9 months ago

For me it seems like all current occurrences of FoldTransposeIntoQuantInit deal with Transpose nodes that are inserted by other transformations, such as GemmToMatMul or the ChannelsLast/ChannelsFirst conversions. Is this true? In our cases, however, the transpose seems to be inherently part of the model.

fpjentzsch commented 8 months ago

Closing this as the issue was most likely fixed in https://github.com/fastmachinelearning/qonnx/pull/78.