Open · thesps opened this issue 3 years ago
@thesps Random question, but can't we have another hls implementation for beta, gamma, mean, and variance? Or would it be too hard?
Hi team, what is the hls4ml hardware implementation for Batchnormalization layer, and do you perform conv-batchnormalization fusion?
Hi @zhuangh. We implement BatchNormalization as a linear transformation `y = m*x + c`. `m` and `c` are derived from the 4 BatchNorm parameters here. The actual hardware implementation is here.
do you perform conv-batchnormalization fusion?
We do (and also Dense-BatchNormalization fusion). But at the moment, we don't fuse when the layers use quantizers; we only merge for post-training quantization. That said, if we had the QKeras implementation of QBatchNormalization with quantized `scale` and `bias`, I think we could fuse QDense/QConv with QBatchNormalization and combine the quantizers too.
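The Dense + BatchNormalization fusion mentioned above amounts to folding `y = m*(W @ x + b) + c` into new weights `m*W` and a new bias `m*b + c`. A NumPy sketch of that algebra (my own illustration, not the hls4ml optimizer pass):

```python
import numpy as np

def fuse_dense_bn(W, b, m, c):
    """Fold a per-output-channel BatchNorm (y = m*x + c) into the
    preceding Dense layer (x = W @ inp + b)."""
    W_fused = m[:, None] * W        # scale each output row of W
    b_fused = m * b + c             # scale the bias, then shift
    return W_fused, b_fused

rng = np.random.default_rng(1)
W, b = rng.normal(size=(4, 3)), rng.normal(size=4)
m, c = rng.normal(size=4), rng.normal(size=4)
inp = rng.normal(size=3)

W_f, b_f = fuse_dense_bn(W, b, m, c)
# Fused single layer matches Dense followed by BatchNorm.
assert np.allclose(W_f @ inp + b_f, m * (W @ inp + b) + c)
```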
@Duchstf I guess we could, but I think this would be going backwards. And anyway, even in that case we would have to make some guesses about precisions, I think.
Minor clarification: we don't fuse Conv2D and BN at the moment, but we will add that functionality very soon
Now that QConv2D+BN are fused, I think I am seeing some related issues.
See this Gist: https://gist.github.com/jmduarte/5c9cf2607a25633246e6fc85cbca89de
If I instantiate a simple model with random weights between 0 and 2 and use a QActivation layer afterward with <16,4>:
```
Layer (type)                 Output Shape              Param #
=================================================================
q_conv2d (QConv2D)           (None, 3, 3, 4)           112
_________________________________________________________________
batch_normalization (BatchNo (None, 3, 3, 4)           16
_________________________________________________________________
q_activation (QActivation)   (None, 3, 3, 4)           0
```
The first few QKeras outputs look like this:

```
127.99219 6.9140625 127.99219 17.179688
```

while the first few hls4ml outputs look like this:

```
60.08984375 6.83203125 -120.6953125 17.00390625
```
(clearly, the 1st and 3rd of the 4 outputs shown saturate in QKeras but wrap around in hls4ml).
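For the third output, the two numbers are consistent with a single overflow: the pre-quantization value is about 135.30, which a saturating quantizer clamps near its maximum while AP_WRAP wraps negative (135.3046875 - 256 = -120.6953125). A NumPy sketch of the two overflow modes for an `ap_fixed<16,8>`-style format (my own illustration, not hls4ml code; QKeras's quantizer clamps to a slightly different maximum, 127.99219, but the behavior is the same):

```python
import numpy as np

def to_fixed(x, width=16, int_bits=8, saturate=False):
    """Quantize x to signed fixed point with `width` total bits and
    `int_bits` integer bits (sign included), truncating the fraction.
    Overflow either wraps (AP_WRAP) or saturates (AP_SAT)."""
    frac_bits = width - int_bits
    scale = 1 << frac_bits                    # 2**frac_bits
    v = int(np.floor(x * scale))              # fixed-point integer value
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if saturate:
        v = max(lo, min(hi, v))               # AP_SAT: clamp to range
    else:
        v = ((v - lo) % (1 << width)) + lo    # AP_WRAP: modular wrap-around
    return v / scale

x = 135.3046875                               # just above the ap_fixed<16,8> range
print(to_fixed(x, saturate=True))             # 127.99609375 (clamped to the format's max)
print(to_fixed(x, saturate=False))            # -120.6953125 (wraps, as in the hls4ml output)
```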
I think there are a few issues, all related to the different treatment of these layers between QKeras and hls4ml, which results in large accuracy differences. I'm wondering what you recommend as the "best practices" for QConv2D + (Q?)BatchNorm. Issues:
1. Adding a `quantized_bits(wrap_around=True)` functionality (which would be great!) would make QKeras's behavior consistent with the Vivado/hls4ml `ap_fixed` behavior.
2. hls4ml for some reason doesn't seem to compile the model if I use `ap_fixed<16,8,AP_RND,AP_SAT>` for all my layers, so I can't make the hls4ml behavior consistent with the default QKeras behavior (but maybe this can be fixed?)

For us (QKeras side), QConv2DBatchNorm is more numerically stable than QConv2D + QBatchNorm, and easy to verify against the hardware implementation.
Since QKeras and hls4ml deal with saturation differently by default (hls4ml wraps around, while QKeras saturates), large errors accumulate very quickly.
Curious: is there any other reason for hls4ml using AP_WRAP by default besides the lower hardware cost? AP_SAT should be more continuous and lead to better model performance, right?
> Now that QConv2D+BN are fused, I think I am seeing some related issues.
Note that our optimization pass doesn't fuse Q\<Layer> + BN, so you should still see both layers in the model. @zhuangh recommends QConv2DBatchnorm (i.e., doing the fusion already in QKeras). Now that we have that layer supported in hls4ml (PR #298), you can use it instead to achieve the layer fusion.
With the fusion of layers in hls4ml, there's no way to use separate precisions for QConv2D and BatchNorm (which QKeras does by default, namely QConv2D can be highly quantized, while BatchNorm is usually left as floating point). After the fusion, the BatchNorm effectively uses the (usually highly reduced) precision of the QConv2D layer, which makes saturation more likely.
On this, did you do something to force the layer fusion? As implemented, the optimizer pass that does the layer fusion would skip that case because of the quantizers in QConv2D layer (here). And that's for the reason you mentioned, that the data type for the QConv2D weights and biases is not necessarily the right one for the fused QConv2DxBN weights and biases.
I think the right thing is always to use saturation for layer outputs, especially at small bitwidths. Regarding this:

> hls4ml for some reason doesn't seem to compile the model if I use `ap_fixed<16,8,AP_RND,AP_SAT>` for all my layers, so I can't make the hls4ml behavior consistent with the default QKeras behavior (but maybe this can be fixed?)
Is it related to #305? (With solution in #309.)
And more generally, apart from Activation layers, the output type of layers is not defined for us by QKeras, so there may still be a need to do some profiling & tracing to set the data types for layer outputs. While saturation mode helps, it's still going to be necessary to use enough bits to cover the full range of outputs to get good agreement with QKeras.
OK my mistake, QConv2D is not being merged with BatchNorm!
I will try to use the dedicated QConv2DBatchNorm layer.
I actually found that I can use `ap_fixed<16,8,AP_RND,AP_SAT>` for the model just fine! My mistake was mixing up the order of the last two arguments (the quantization/rounding mode comes before the overflow mode).
No, it doesn't have to do with #305 and #309 (at least I merged #309 in my development branch for testing).
So I will update here once I look into these leads.
Speaking of #309, if it works for you, a review would be nice :wink:
> For us (qkeras side), QConv2DBatchNorm is more numerically stable than QConv2D + QBatchNorm, and easy to verify against the hardware implementation.
>
> Since QKeras and hls4ml deal with saturation differently by default (hls4ml wraps around, while QKeras saturates), large errors accumulate very quickly.
>
> Curious: is there any other reason for hls4ml using AP_WRAP by default besides the lower hardware cost? AP_SAT should be more continuous and lead to better model performance, right?
The QKeras QBatchNormalization [1] layer is not totally compatible with hls4ml, since it separately quantizes the `beta`, `gamma`, `mean`, and `variance`, while in hls4ml we combine these terms into a `scale` and `bias` (which are post-training quantized) [2]. Some combinations of the 4 parameter quantizations cannot really be converted into a meaningful quantization of `scale` or `bias`, particularly when mixing `po2` and non-`po2` quantizers. If we can implement a version of `QBatchNormalization` for training, with `scale_quantizer` and `bias_quantizer` rather than `beta_quantizer`, `gamma_quantizer`, etc., it would be easier to propagate that info into hls4ml.

I propose adding the layer implementation to hls4ml (as a utility? as part of a new 'training' module?) rather than QKeras, since the need arises from our specific implementation of BatchNorm rather than being something more widely useful.

[1] https://github.com/google/qkeras/blob/master/qkeras/qnormalization.py
[2] https://github.com/hls-fpga-machine-learning/hls4ml/blob/master/hls4ml/model/hls_layers.py#L745-L746
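To illustrate the incompatibility with a NumPy sketch (my own illustration, not QKeras/hls4ml code; `po2_quantize` is a hypothetical, rough stand-in for a QKeras po2 quantizer): even if `gamma` is quantized to a power of two, the combined `scale = gamma / sqrt(variance + eps)` is generally not a power of two, so the per-parameter quantizers don't translate into a meaningful quantizer for `scale`:

```python
import numpy as np

def po2_quantize(x):
    """Round |x| to the nearest power of two, keeping the sign
    (a hypothetical stand-in for a QKeras po2 quantizer)."""
    return np.sign(x) * 2.0 ** np.round(np.log2(np.abs(x)))

gamma = po2_quantize(np.array([1.9, 0.6]))   # [2.0, 0.5], exact powers of two
variance = np.array([3.0, 5.0])              # float (non-po2) moving variance
scale = gamma / np.sqrt(variance + 1e-3)     # the combined hls4ml-style scale

# The combined scale is no longer a power of two, so quantizing
# beta/gamma/mean/variance separately says nothing exact about scale.
print(scale)
print(np.log2(np.abs(scale)) % 1 == 0)       # both entries False
```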