## Overview

This PR implements hardware normalisation for `BatchNorm` and `LayerNorm` (which, with the appropriate choice of parameters, namely `NUM_NORMALIZATION_ZONES`, can also act as an `InstanceNorm` or `GroupNorm`).
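The idea of normalisation zones can be shown with a small floating-point reference model. This is a sketch under our own assumptions, not the PR's implementation. It assumes the input is laid out as `[channels][length]`, that `NUM_NORMALIZATION_ZONES` splits the channels into equal, contiguous groups, and it omits the affine scale and shift. The function name `zone_norm` is hypothetical. With one zone, every element shares one mean and variance (LayerNorm over the whole input). With one zone per channel, each channel is normalised on its own (InstanceNorm). Anything in between behaves like GroupNorm.

```python
import math
from typing import List


def zone_norm(x: List[List[float]], num_zones: int,
              eps: float = 1e-5) -> List[List[float]]:
    """Normalise a [channels][length] input whose channels are split into
    `num_zones` equal, contiguous zones (hypothetical model, no affine)."""
    channels = len(x)
    if num_zones <= 0 or channels % num_zones != 0:
        raise ValueError("channels must be divisible by a positive num_zones")
    per_zone = channels // num_zones
    out: List[List[float]] = []
    for z in range(num_zones):
        zone = x[z * per_zone:(z + 1) * per_zone]
        values = [v for row in zone for v in row]
        mean = sum(values) / len(values)
        # Biased variance (divide by N), as used by PyTorch's normalisation layers.
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * inv_std for v in row] for row in zone])
    return out
```

For a two-channel input, `zone_norm(x, 1)` normalises all four values together, while `zone_norm(x, 2)` normalises each channel separately.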
## Main Features
- Added a quantized software version of `LayerNorm` called `LayerNormInteger`.
- Added a quantized software version of `BatchNorm1d` called `BatchNorm1dInteger`.
- Changed the quantization passes to support converting `BatchNorm1d` and `LayerNorm` to `BatchNorm1dInteger` and `LayerNormInteger`, respectively.
- Added the hardware modules `fixed_batch_norm1d` and `fixed_layer_norm`.
- Added test benches for these hardware modules.
- Added a `sqrt` hardware module that uses CORDIC to compute square roots efficiently at high clock frequencies. It is used in the `fixed_layer_norm` hardware module.
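To illustrate what a quantized LayerNorm computes, here is a minimal fixed-point sketch in plain Python. It assumes signed fixed-point values with a total bit width `width` and `frac_width` fractional bits, round-to-nearest with saturation, and it re-quantizes every intermediate (mean, variance, inverse standard deviation, output). Which intermediates `LayerNormInteger` actually quantizes, and how, is not stated here, so `quantize` and `layer_norm_integer` are hypothetical helpers rather than the PR's API.

```python
import math
from typing import List


def quantize(value: float, width: int, frac_width: int) -> float:
    """Round `value` to the nearest multiple of 2**-frac_width and saturate it
    to the range of a signed `width`-bit fixed-point number.
    Note: Python's round() breaks ties towards the nearest even integer."""
    scale = 1 << frac_width
    q = round(value * scale)
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return max(lo, min(hi, q)) / scale


def layer_norm_integer(x: List[float], width: int = 16, frac_width: int = 8,
                       eps: float = 1e-5) -> List[float]:
    """LayerNorm over a 1-D vector (no affine parameters) with every
    intermediate re-quantized to fixed point (hypothetical model)."""
    xq = [quantize(v, width, frac_width) for v in x]
    mean = quantize(sum(xq) / len(xq), width, frac_width)
    # Biased variance (divide by N), as in PyTorch's LayerNorm.
    var = quantize(sum((v - mean) ** 2 for v in xq) / len(xq), width, frac_width)
    inv_std = quantize(1.0 / math.sqrt(var + eps), width, frac_width)
    return [quantize((v - mean) * inv_std, width, frac_width) for v in xq]
```

For example, with the defaults, each output of `layer_norm_integer([1.0, 2.0, 3.0, 4.0])` lies within one LSB (2^-8) of the floating-point LayerNorm result.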
## Test benches
The test benches for `LayerNorm` and `BatchNorm` both use their quantized and non-quantized software versions as reference models to check functional equivalence. For `LayerNorm`, a test passes if the hardware outputs are within a set tolerance of the software results, rather than matching them exactly.

The results from the hardware `LayerNorm` are usually within 1 fractional bit of the results produced by the functionally equivalent software models. Try it for yourself!
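A tolerance check of this kind can be sketched as follows. It assumes the hardware outputs arrive as raw unsigned bus values holding two's-complement numbers with `frac_width` fractional bits, and it counts a value as matching if it is within `tol_lsb` least-significant bits (1 LSB = 2^-frac_width) of the software reference. The helpers `to_signed` and `find_mismatches` are hypothetical and are not the interface of `StreamingMonitorRange` or of the PR's test benches.

```python
from typing import List, Sequence


def to_signed(raw: int, width: int) -> int:
    """Interpret an unsigned `width`-bit bus value as two's complement."""
    return raw - (1 << width) if raw & (1 << (width - 1)) else raw


def find_mismatches(hw_raw: Sequence[int], ref: Sequence[float], width: int,
                    frac_width: int, tol_lsb: int = 1) -> List[int]:
    """Return the indices where the hardware value is more than `tol_lsb`
    LSBs away from the software reference value."""
    if len(hw_raw) != len(ref):
        raise ValueError("hardware and reference streams differ in length")
    lsb = 2.0 ** -frac_width
    bad = []
    for idx, (raw, expected) in enumerate(zip(hw_raw, ref)):
        value = to_signed(raw, width) * lsb
        if abs(value - expected) > tol_lsb * lsb:
            bad.append(idx)
    return bad
```

With `width=8` and `frac_width=4`, a raw output of `17` (1.0625) passes against a reference of `1.0`, while `18` (1.125) is reported as a mismatch.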
## Auxiliary Features
- Added a `convert_parallelism` hardware module for converting between different levels of parallelism. This is useful when combining multiple signals that have different parallelism.
- Added `StreamingMonitorRange`, a test bench monitor that accepts values within an expected range rather than only exact expected values.
- Added a `join_n` hardware module for joining n ready/valid handshakes together. It is a generalisation of `join2`.
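At the behavioural level, converting parallelism amounts to re-chunking a stream: incoming beats of `P_in` elements are buffered and emitted as beats of `P_out` elements, preserving element order. The generator below is a hypothetical software model of that behaviour only. It ignores ready/valid timing and assumes the total number of elements is a multiple of the output parallelism; the real `convert_parallelism` module's parameters and constraints may differ.

```python
from typing import Iterable, Iterator, List


def convert_parallelism(beats: Iterable[List[int]],
                        out_parallelism: int) -> Iterator[List[int]]:
    """Re-chunk a stream of beats into beats of `out_parallelism` elements
    (hypothetical behavioural model, no handshake timing)."""
    if out_parallelism <= 0:
        raise ValueError("out_parallelism must be positive")
    buffer: List[int] = []
    for beat in beats:
        buffer.extend(beat)
        # Emit as many complete output beats as the buffer currently holds.
        while len(buffer) >= out_parallelism:
            yield buffer[:out_parallelism]
            del buffer[:out_parallelism]
    if buffer:
        raise ValueError("stream length is not a multiple of out_parallelism")
```

For example, two beats of four elements become four beats of two elements, and three beats of two elements become two beats of three.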
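A common way to join n ready/valid handshakes, sketched here as an assumption about how `join_n` generalises `join2`, is purely combinational: the output is valid only when every input is valid, and input i is ready only when the output is ready and all other inputs are valid. A transfer then happens on all inputs in the same cycle or on none, so no input is consumed early. The function below models one cycle of that logic; its name and signature are hypothetical.

```python
from typing import List, Tuple


def join_n(in_valid: List[bool], out_ready: bool) -> Tuple[List[bool], bool]:
    """One cycle of combinational join logic (hypothetical model).
    Returns (in_ready per input, out_valid)."""
    out_valid = all(in_valid)
    in_ready = [
        # Input i may be consumed only if the output is ready and every
        # *other* input is also presenting valid data this cycle.
        out_ready and all(v for j, v in enumerate(in_valid) if j != i)
        for i in range(len(in_valid))
    ]
    return in_ready, out_valid
```

Because `in_valid[i] and in_ready[i]` reduces to `out_valid and out_ready` for every i, each input fires exactly when the joined output fires.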