Introduce support for generic elementwise binary operations

iksnagreb commented 2 months ago

This includes a set of HWCustomOp and HLSBackend operator templates which can be specialized in just a few lines of code to implement arbitrary elementwise binary operations, like Add, Mul, Sub, Div, And, Equal, etc., supporting multidirectional broadcasting. Concrete implementations for most of these operators according to standard ONNX is already sketched out. Still missing are specializations for accumulator and weight bit-width minimization and some tricky to implement operators. Also still missing is floating-point support due to HLS-backend limitations, though these seem to be just minor defects regarding flatten and Slice.

Adds unit tests in Python, C++ and RTL simulation for these new operators, though these are probably not exhaustive enough to validate all edge cases.

Proposes a new scheme for registering and importing custom operators into their corresponding module namespace, i.e., the 'custom_op' dictionary used to lookup operators by ONNX domain.

Implementation Progress

[x] Generic HWCustomOp and HLSBackend "templates" providing infrastructure and HLS code generation for arbitrary elementwise binary operations
[x] ElementwiseAdd
[x] ElementwiseSub
[x] ElementwiseMul
[x] ElementwiseDiv (Not covered by unit test due to difficulties avoiding zeros in test inputs...)
[ ] ElementwiseMod
[x] ElementwiseAnd
[x] ElementwiseOr
[x] ElementwiseXor
[x] ElementwiseEqual
[x] ElementwiseLess
[x] ElementwiseLessEqual
[x] ElementwiseGreater
[x] ElementwiseGreaterEqual
[x] ElementwiseBitwiseAnd
[x] ElementwiseBitwiseOr
[x] ElementwiseBitwiseXor
[ ] ElementwiseBitShift
[ ] ElementwisePow
[x] Unit tests for these new operators
[x] Integration test using QuantEltwiseAdd from Brevitas
[x] Infer... transformations in convert_to_hw_layers.py for these newly supported operators
[x] Accumulator and weight bit-width minimization for these new operators: At least the accumulator minimization needs to be specialized for each concrete instance of operator and probably cannot be handled by the template: Arithmetic and logical operators behave too different in this regard. Add, Sub, Mul etc. are probably already to different for a shared implementation.
[ ] Maybe add floating-point support, but not sure how this would integrate with the rest of FINN

fpjentzsch commented 2 months ago

Looks useful!

you define the op with the format (Identifier, Python, C++, RTL), do you have a use for the "RTL" definition or is this just a preparation for the future addition of an RTL backend?

I wonder how this functionality overlaps with the following existing FINN CustomOPs and if it makes any or all of them obsolete:

AddStreams (back-end = AddStreams_Batch from hlslib)
ChannelwiseOp (back-end = Thresholding_Batch from hlslib)
StreamingEltwise (back-end = StreamingEltwise from hlslib)
(AddStreamsLayer_Batch from hlslib, which doesn't seem to have a corresponding op in FINN)

iksnagreb commented 2 months ago

you define the op with the format (Identifier, Python, C++, RTL), do you have a use for the "RTL" definition or is this just a preparation for the future addition of an RTL backend?

Yes, it is just intended to make this future-proof, or at least to sketch out a potential future implementation of a generic RTL backend as well. But I am not even sure right now, whether I want to keep this format, as for example, for implementing the Mod and BitShift operator this simple string template is not sufficient, as the implementation depends on node attributes. Maybe I will fully transition this to the already present *_op property methods as these could contain some logic handling different node attributes. I am also not sure whether these backend-specific definitions are at the right place/level of the hierarchy here, on the other hand I want to minimize the number of customization points required to add new specializations of the operator template...

I wonder how this functionality overlaps with the following existing FINN CustomOPs and if it makes any or all of them obsolete:

Hm, it depends, I probably know more once I have sketched out the Infer... method for the new ones. Probably the old ones have much more strict assumptions and sometimes this could actually be intended to map to some special cases and get something more efficient. I intend to reach (almost) full ONNX compliance with the new operators, which could indeed render the AddStreams and StreamingEltwise obsolete - that is actually how I ended up implementing this in the first place: Initially I just wanted to add a new operator to handle element-wise operation (actually addition) of a constant tensor and a stream but though "why not generalize this?"... Regarding the ChannelwiseOp, I am not really sure what this one exactly does, but as it has the Thresholding_Batch as backend, it seems to do something slightly more elaborate than simple element-wise arithmetic or logical operations, so it will probably not be obsolete?

iksnagreb commented 2 months ago

This now relies on cherry-picked commits from https://github.com/Xilinx/finn/pull/901 and https://github.com/Xilinx/finn/pull/1030.

Xilinx / finn

Introduce support for generic elementwise binary operations #1040

Implementation Progress