Xilinx / finn

Dataflow compiler for QNN inference on FPGAs
https://xilinx.github.io/finn
BSD 3-Clause "New" or "Revised" License
723 stars 230 forks source link

ConCat layer Issue #954

Closed pkdeep closed 6 months ago

pkdeep commented 8 months ago

Hi @auphelia @williamliao28, I am facing this issue where Concat layer is not getting converted to HLS layer. I tried the transformation InferConCatLayer(), but still it dint infer streaming Concat layer. Can you help with this. I am attacing original and converted .onnx files simple_test_hls3 onnx

simple_test_hls3 onnx (2) new_parallel_512x8_w2a2 onnx Here is the original post https://github.com/Xilinx/finn/issues/853#issuecomment-1640254839

iksnagreb commented 8 months ago

There are scalar multiplications of floating-point type preceding the inputs to the Concat operator, turning the inputs into floating-point as well. Currently, the StreamingConcat does not support floating-point inputs, see here: https://github.com/Xilinx/finn/blob/dev/src/finn/transformation/fpgadataflow/convert_to_hls_layers.py#L1659

Unfortunately, this seems not to be easy to streamline, as the scale factors on each input branch are different. Streamlining the Mul nodes past the Concat layer only works out if they are all equal... It could work using some group-wise scale-factors afterwards, but I am not sure whether there is support for this.

Allowing the StreamingConcat operation to handle floating-point inputs is also not really desirable, as it would leave floating-points in the middle of your model graph (with finn you ideally want to have a continuous chain of integer-only operations). While it should be technically possible, this would require some refactoring of the corresponding HLS code generation.

pkdeep commented 8 months ago

Hi @iksnagreb . Thanks for the reply. I could get the Mul factors to be equal. Can this be processed to have a streaming replacement?

I am attaching both the pre-dataflow creation (after HLS) model image and after dataflow creation image. Any insight would be of great help to convert this into a fully streamlined model.

hls_layers_nw onnx dataflow_parent_nw onnx

iksnagreb commented 8 months ago

Hm, conceptually, you now would need something like the MoveIdenticalOpPastJoinOp transformation (this is in finn.transformation.streamline.reorder), but it seems to be made for two-input join-nodes, while you have an "arbitrary" many inputs join-node. I am not sure whether there already is a suitable streamlining transformation somewhere in finn, but adapting the existing MoveIdenticalOpPastJoinOp or introducing a new, similar one seems not too complicated.

auphelia commented 6 months ago

Closing this issue, thanks @iksnagreb for the suggestions. @pkdeep, if you have tried out to extend MoveIdenticalOpPastJoinOp and would like to contribute to the finn compiler, we're always welcoming contributions from the community 🙂 Please refer to the contribution guidelines in that case.

pkdeep commented 6 days ago

Hi, Sorry for the delayed response. I was able to move the uniform Multiplication past the Concat layer, although not by "MoveIdenticalOpPastJoinOp" but through some other technique. I removed the mul node, and inserted a new mul node after Concat and the put the exact same value of Param as the deleted Mul node. This helped me in able to convert the model in end-to-en streaming model (except the last Mul and Add node). Thanks To get uniform Mul node, I inserted "Quantidentity" in each of 4 branches. It gave me same multiplication factor in all 4 branches, but I am not clear why it happened? Can you help in understanding the reason?