This PR is a clean-up of PR #907, with the Thresholding-related changes excluded.
Note: this PR depends on
- Support for packed MV(A)Us: PR #794.
- DWC RTL variant: PR #925.
- [RTL SWG] Support SIMD < C in window-parallel mode: PR #922.
Adds support for utilizing multi-packed DSP58s for the HLS-based VVAU layer. For weights and activations between 4 and 8 bits wide (activations of up to 9 bits are supported on DSP58), the custom layer packs 2, 3, or 4 elements onto the input datapath of the DSP to achieve multiple MACs per cycle per DSP58; the packing factor depends on the targeted board and quantization.
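As a rough sketch of how this packing factor might vary with quantization (the thresholds and the fallback below are assumptions for illustration only; the authoritative rule lives in the RTL and depends on the targeted board):

```python
def macs_per_dsp(weight_width: int, act_width: int, dsp_variant: str = "DSP58") -> int:
    """Hypothetical packing factor per DSP, for illustration only.

    These thresholds are assumptions, not FINN's actual packing model.
    """
    if dsp_variant == "DSP58":
        if weight_width <= 4 and act_width <= 4:
            return 4  # assume four narrow elements share the input datapath
        if weight_width <= 8 and act_width <= 9:
            return 3  # the 8-bit x 9-bit case suggested by mvu_vvu_8sx9_dsp58
    return 2  # assumed fallback for other devices / wider operands

print(macs_per_dsp(8, 9))  # -> 3 under the assumed thresholds above
```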
Functionalities to be added for the VVU
[ ] rtllib: RTL implementation for the DSP58-based VVU
[ ] See mvu_vvu_8sx9_dsp58
[x] Custom-op for the new RTL component: see vectorvectoractivation_rtl.py
[x] Code generation
[x] IP-stitching
[ ] Resource estimations
[ ] Cycle estimations
[x] Transformation to instantiate the newly created custom-op: see specialize_to_rtl_layers.py (a usage sketch follows this list). Note: this is part of PR #794.
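A minimal usage sketch for the transformation above. The class name InferRTLVectorVectorActivation and the file names are assumptions for illustration; consult specialize_to_rtl_layers.py in PR #794 for the actual API.

```python
from qonnx.core.modelwrapper import ModelWrapper
from finn.transformation.fpgadataflow.specialize_to_rtl_layers import (
    InferRTLVectorVectorActivation,  # hypothetical class name
)

# Load a dataflow model containing HLS VVAU layers, then swap in the RTL
# variant wherever the DSP58 packing constraints are met.
model = ModelWrapper("vvau_hls.onnx")  # placeholder file name
model = model.transform(InferRTLVectorVectorActivation())
model.save("vvau_rtl.onnx")
```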
Tests
[ ] Test for the VVU custom-op & transformation: test_fpgadataflow_vvau_rtl (a reference-computation sketch follows)
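For context, the core computation such a test has to verify is one independent dot product per channel (the VVAU implements the MACs of a depthwise convolution). A minimal numpy reference, with shapes and data chosen purely for illustration:

```python
import numpy as np

def vvau_reference(acts: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Per-channel dot product; acts and weights have shape (channels, kernel)."""
    return np.sum(acts * weights, axis=1)

# Example: 8 channels, 3x3 kernel flattened to 9 elements.
rng = np.random.default_rng(0)
acts = rng.integers(-128, 127, size=(8, 9))    # 8-bit signed activations
weights = rng.integers(-8, 7, size=(8, 9))     # 4-bit signed weights
golden = vvau_reference(acts, weights)          # expected rtlsim output
```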
Outstanding bugs & features
[x] Implement stream re-arrangement in the RTL-SWG, followed by a regular (i.e. already supported) StreamingDataWidthConverter_Batch layer; this removes the need to introduce a new layer type (resolved in PR #922, see the sketch below).
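A minimal sketch of the resulting flow, using FINN's existing DWC insertion pass (the file name is a placeholder; the SWG-side stream re-arrangement itself lives in the RTL from PR #922):

```python
from qonnx.core.modelwrapper import ModelWrapper
from finn.transformation.fpgadataflow.insert_dwc import InsertDWC

model = ModelWrapper("vvau_rtl.onnx")  # placeholder file name
# Any remaining stream-width mismatch around the SWG is bridged by
# automatically inserting already-supported
# StreamingDataWidthConverter_Batch layers.
model = model.transform(InsertDWC())
```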