Xilinx / finn

Dataflow compiler for QNN inference on FPGAs
https://xilinx.github.io/finn
BSD 3-Clause "New" or "Revised" License

Bugfix RTL Thresholding #1072

Closed auphelia closed 1 month ago

auphelia commented 1 month ago

This PR updates the test files for RTL Thresholding and the code generation for narrow quantization. Narrow quantization means that the resulting values are symmetric around zero. For example, an INT2 quantization that would usually map the inputs to [-2, -1, 0, 1] will, in the narrow case, omit the lowest value and map the inputs to [-1, 0, 1] instead. This mapping requires only 2 threshold values instead of 3. Since the RTL Thresholding expects 3 thresholds, we prepend a dummy threshold (input_datatype.min()) to the threshold array and decrease the bias by one. Every input meets the dummy threshold, so it always adds one to the threshold count, and the lowered bias cancels that out to give the same behaviour.
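
To illustrate why the padding preserves the output, here is a minimal NumPy sketch. It is not the FINN implementation: the helper names `multithreshold` and `pad_narrow_thresholds`, the INT8 input range, and the threshold values -10 and 10 are all hypothetical, and it assumes the thresholding output is the number of thresholds an input meets or exceeds (`x >= t`) plus the bias.

```python
import numpy as np

def multithreshold(x, thresholds, bias):
    """Count the thresholds each input meets or exceeds, then add the bias.

    Assumed semantics for this sketch: out = sum(x >= t) + bias.
    """
    x = np.asarray(x)
    thresholds = np.asarray(thresholds)
    return (x[:, None] >= thresholds[None, :]).sum(axis=1) + bias

def pad_narrow_thresholds(thresholds, bias, input_min, n_expected):
    """Hypothetical helper: prepend dummy thresholds at the input minimum.

    Each dummy threshold is met by every input, so it adds exactly one to
    the count; the bias is lowered by the same amount to compensate.
    """
    n_missing = n_expected - len(thresholds)
    padded = np.concatenate([np.full(n_missing, input_min), thresholds])
    return padded, bias - n_missing

# Assumed INT8 input, so input_datatype.min() corresponds to -128 here.
INPUT_MIN = -128
x = np.arange(-128, 128)

# Narrow INT2 output [-1, 0, 1]: two thresholds and a bias of -1.
narrow_thresholds = np.array([-10, 10])  # illustrative values only
narrow_bias = -1
reference = multithreshold(x, narrow_thresholds, narrow_bias)

# Expand to the 3 thresholds the RTL Thresholding expects.
padded_thresholds, padded_bias = pad_narrow_thresholds(
    narrow_thresholds, narrow_bias, INPUT_MIN, n_expected=3
)
padded = multithreshold(x, padded_thresholds, padded_bias)

# The padded configuration gives the same result for every possible input.
assert np.array_equal(reference, padded)
```

Under these assumptions the padded thresholds are [-128, -10, 10] with a bias of -2, and the output stays within [-1, 0, 1] across the whole input range.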