This PR is a clean-up of PR #907, with the Thresholding-related changes excluded.
Note: this PR depends on
- Support for packed MV(A)Us: PR #794.
- DWC RTL variant: PR #925.
- [RTL SWG] Support SIMD < C in window-parallel mode: PR #922.
Adds support for utilizing multi-packed DSP58s for the HLS-based VVAU layer. For weights and activations between 4 and 8 bits wide (activations of up to 9 bits are supported on DSP58), the custom layer packs 2, 3, or 4 elements onto the input datapath of the DSP to achieve multiple MACs per cycle per DSP58; the packing factor depends on the targeted board and quantization.
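As a rough sketch of how this packing factor might vary with quantization (the thresholds and the fallback below are assumptions for illustration only; the authoritative rule lives in the RTL and depends on the targeted board):

```python
def macs_per_dsp(weight_width: int, act_width: int, dsp_variant: str = "DSP58") -> int:
    """Hypothetical packing factor per DSP, for illustration only.

    These thresholds are assumptions, not FINN's actual packing model.
    """
    if dsp_variant == "DSP58":
        if weight_width <= 4 and act_width <= 4:
            return 4  # assume four narrow elements share the input datapath
        if weight_width <= 8 and act_width <= 9:
            return 3  # the 8-bit x 9-bit case suggested by mvu_vvu_8sx9_dsp58
    return 2  # assumed fallback for other devices / wider operands

print(macs_per_dsp(8, 9))  # -> 3 under the assumed thresholds above
```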
Functionalities to be added for the VVU
[ ] rtllib: RTL implementation for the DSP58-based VVU
[ ] See mvu_vvu_8sx9_dsp58
[x] Custom-op for the new RTL component: see vectorvectoractivation_rtl.py
[x] Code generation
[x] IP-stitching
[ ] Resource estimations
[ ] Cycle estimations
[x] Transformation to instantiate the newly created custom-op: see specialize_to_rtl_layers.py (a usage sketch follows this list). Note: this is part of PR #794.
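A minimal usage sketch for the transformation above. The class name InferRTLVectorVectorActivation and the file names are assumptions for illustration; consult specialize_to_rtl_layers.py in PR #794 for the actual API.

```python
from qonnx.core.modelwrapper import ModelWrapper
from finn.transformation.fpgadataflow.specialize_to_rtl_layers import (
    InferRTLVectorVectorActivation,  # hypothetical class name
)

# Load a dataflow model containing HLS VVAU layers, then swap in the RTL
# variant wherever the DSP58 packing constraints are met.
model = ModelWrapper("vvau_hls.onnx")  # placeholder file name
model = model.transform(InferRTLVectorVectorActivation())
model.save("vvau_rtl.onnx")
```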
Tests
[ ] Test for the VVU custom-op & transformation: test_fpgadataflow_vvau_rtl (a reference-computation sketch follows)
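For context, the core computation such a test has to verify is one independent dot product per channel (the VVAU implements the MACs of a depthwise convolution). A minimal numpy reference, with shapes and data chosen purely for illustration:

```python
import numpy as np

def vvau_reference(acts: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Per-channel dot product; acts and weights have shape (channels, kernel)."""
    return np.sum(acts * weights, axis=1)

# Example: 8 channels, 3x3 kernel flattened to 9 elements.
rng = np.random.default_rng(0)
acts = rng.integers(-128, 127, size=(8, 9))    # 8-bit signed activations
weights = rng.integers(-8, 7, size=(8, 9))     # 4-bit signed weights
golden = vvau_reference(acts, weights)          # expected rtlsim output
```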
Outstanding bugs & features
[x] Implement stream re-arrangement in the RTL-SWG, followed by a regular (i.e. already supported) StreamingDataWidthConverter_Batch layer; this removes the need to introduce a new layer type (resolved in PR #922, see the sketch below).
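A minimal sketch of the resulting flow, using FINN's existing DWC insertion pass (the file name is a placeholder; the SWG-side stream re-arrangement itself lives in the RTL from PR #922):

```python
from qonnx.core.modelwrapper import ModelWrapper
from finn.transformation.fpgadataflow.insert_dwc import InsertDWC

model = ModelWrapper("vvau_rtl.onnx")  # placeholder file name
# Any remaining stream-width mismatch around the SWG is bridged by
# automatically inserting already-supported
# StreamingDataWidthConverter_Batch layers.
model = model.transform(InsertDWC())
```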