fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

Feature Request: hls4ml-specific QBatchNormalization layer #225

Open thesps opened 3 years ago

thesps commented 3 years ago

The QKeras QBatchNormalization [1] layer is not fully compatible with hls4ml: it quantizes beta, gamma, mean, and variance separately, while in hls4ml we combine these terms into a scale and a bias (which are post-training quantized) [2]. Some combinations of the four parameter quantizers cannot really be converted into a meaningful quantization of the scale or bias, particularly when mixing po2 and non-po2 quantizers. If we implement a version of QBatchNormalization for training with a scale_quantizer and a bias_quantizer, rather than beta_quantizer, gamma_quantizer, etc., it would be easier to propagate that information into hls4ml.

I propose adding the layer implementation to hls4ml (as a utility? as part of a new 'training' module?) rather than to QKeras, since the need arises from our specific implementation of BatchNorm rather than being something more widely useful.

[1] https://github.com/google/qkeras/blob/master/qkeras/qnormalization.py
[2] https://github.com/hls-fpga-machine-learning/hls4ml/blob/master/hls4ml/model/hls_layers.py#L745-L746
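
To make the combination concrete, here is a minimal NumPy sketch of the scale/bias folding referenced in [2] (not the actual hls4ml code; the parameter values are illustrative):

```python
import numpy as np

def fold_batchnorm(gamma, beta, mean, variance, epsilon=1e-3):
    """Combine the four BatchNorm parameters into the scale and bias
    that hls4ml applies as y = scale * x + bias."""
    scale = gamma / np.sqrt(variance + epsilon)
    bias = beta - mean * scale
    return scale, bias

# Even if gamma is quantized to powers of two, scale = gamma / sqrt(variance + eps)
# is in general not a power of two, so the four per-parameter quantizers do not
# translate into a meaningful quantization of scale and bias.
gamma = np.array([0.5, 2.0])       # po2-quantized gamma (illustrative)
beta = np.array([0.1, -0.2])
mean = np.array([0.05, 0.3])
variance = np.array([0.9, 1.7])
print(fold_batchnorm(gamma, beta, mean, variance))
```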

Duchstf commented 3 years ago

@thesps Random question, but can't we have another hls implementation for beta, gamma, mean, and variance? Or would it be too hard?

zhuangh commented 3 years ago

Hi team, what is the hls4ml hardware implementation of the BatchNormalization layer, and do you perform Conv-BatchNormalization fusion?

thesps commented 3 years ago

Hi @zhuangh. We implement BatchNormalization as a linear transformation y = mx + c, where m and c are derived from the 4 BatchNorm parameters here. The actual hardware implementation is here.

do you perform conv-batchnormalization fusion?

We do (and also Dense-BatchNormalization fusion). At the moment, though, we don't fuse when the layers use quantizers; we only merge for post-training quantization. That said, if we had the QKeras implementation of QBatchNormalization with quantized scale and bias, I think we could fuse QDense/QConv with QBatchNormalization and combine the quantizers too.
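
For intuition, a hedged sketch of what the Dense + BatchNorm fusion amounts to numerically (not the actual hls4ml optimizer pass; names and shapes are illustrative):

```python
import numpy as np

def fuse_dense_bn(W, b, scale, bias):
    """Fold a BatchNorm y = scale * x + bias into a preceding Dense layer
    x = a @ W + b, giving a single layer y = a @ (W * scale) + (b * scale + bias).
    W has shape (n_in, n_out); b, scale, and bias have shape (n_out,)."""
    W_fused = W * scale            # broadcasts over the output dimension
    b_fused = b * scale + bias
    return W_fused, b_fused
```

Once quantizers are involved, the fused weights and biases need their own data types, which is why the pass skips QDense/QConv layers for now.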

@Duchstf I guess we could, but I think that would be going backwards. And even in that case, we would still have to make some guesses about the precisions.

vloncar commented 3 years ago

Minor clarification: we don't fuse Conv2D and BN at the moment, but we will add that functionality very soon

jmduarte commented 3 years ago

Now that QConv2D+BN are fused, I think I am seeing some related issues.

See this Gist: https://gist.github.com/jmduarte/5c9cf2607a25633246e6fc85cbca89de

If I instantiate a simple model with random weights between 0 and 2 and use a QActivation layer afterward with <16,4>:

Layer (type)                 Output Shape              Param #   
=================================================================
q_conv2d (QConv2D)           (None, 3, 3, 4)           112       
_________________________________________________________________
batch_normalization (BatchNo (None, 3, 3, 4)           16        
_________________________________________________________________
q_activation (QActivation)   (None, 3, 3, 4)           0         
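
For context, a rough reconstruction of a model with this summary (hedged: the input shape, quantizer settings, and initializer are assumptions chosen to reproduce the shapes and parameter counts; the authoritative code is in the linked Gist):

```python
import tensorflow as tf
from qkeras import QConv2D, QActivation, quantized_bits

# Random weights between 0 and 2, as described above.
init = tf.keras.initializers.RandomUniform(minval=0.0, maxval=2.0)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(5, 5, 3)),   # (5, 5, 3) input -> (3, 3, 4) output, 112 params
    QConv2D(4, (3, 3),
            kernel_quantizer=quantized_bits(16, 6),   # illustrative quantizers
            bias_quantizer=quantized_bits(16, 6),
            kernel_initializer=init,
            name='q_conv2d'),
    tf.keras.layers.BatchNormalization(name='batch_normalization'),
    QActivation('quantized_bits(16, 4)', name='q_activation'),  # the <16,4> activation
])
model.summary()
```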

The first few QKeras outputs look like this

127.99219     6.9140625 127.99219    17.179688

while the first few hls4ml outputs look like this

60.08984375    6.83203125 -120.6953125    17.00390625   

(clearly, the 1st and 3rd of these 4 outputs saturate in QKeras but wrap around in hls4ml).
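
To make the wrap-vs-saturate difference concrete, here is a small NumPy sketch emulating a signed fixed-point type with 8 integer bits (the format and the input value are illustrative assumptions, not taken from the Gist):

```python
import numpy as np

def to_fixed(x, total_bits=16, int_bits=8, overflow='wrap'):
    """Emulate a signed fixed-point type like ap_fixed<total_bits, int_bits>:
    resolution 2^-(total_bits - int_bits), range [-2^(int_bits-1), 2^(int_bits-1))."""
    frac_bits = total_bits - int_bits
    step = 2.0 ** -frac_bits
    lo = -2.0 ** (int_bits - 1)
    hi = -lo - step
    q = np.floor(x / step)                       # truncate, like the default AP_TRN
    if overflow == 'sat':                        # clamp, like AP_SAT (and QKeras)
        q = np.clip(q, lo / step, hi / step)
    else:                                        # two's-complement wrap, like AP_WRAP
        q = (q - lo / step) % (2 ** total_bits) + lo / step
    return q * step

x = 135.3                               # a value just above the +128 boundary
print(to_fixed(x, overflow='sat'))      # ~127.996: saturates near the top of the range
print(to_fixed(x, overflow='wrap'))     # ~-120.70: wraps to a large negative value
```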

I think there are a few issues, all stemming from the different treatment of these layers in QKeras and hls4ml, which results in large accuracy differences. I'm wondering what you recommend as the "best practices" for QConv2D+(Q?)BatchNorm. Issues:

zhuangh commented 3 years ago

For us (on the QKeras side), QConv2DBatchnorm is more numerically stable than QConv2D + QBatchNormalization, and it is easier to verify against a hardware implementation.

Since QKeras and hls4ml deal with saturation differently by default (hls4ml wraps around, while QKeras saturates), large errors accumulate very quickly.

Curious: is there any other reason for hls4ml using AP_WRAP by default besides the cheapest hardware cost? AP_SAT should be more continuous and lead to better model performance, right?

thesps commented 3 years ago

Now that QConv2D+BN are fused, I think I am seeing some related issues.

Note that our optimization pass doesn't fuse Q<Layer> + BN, so you should still see both layers in the model. @zhuangh recommends QConv2DBatchnorm (i.e., doing the fusion already in QKeras). Now that we have that layer supported in hls4ml (PR #298), you can use it instead to achieve the layer fusion.
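
For reference, a hedged sketch of swapping in the fused layer (assuming QConv2DBatchnorm is imported from QKeras' qconv2d_batchnorm module; the quantizer settings are illustrative):

```python
from qkeras import quantized_bits
from qkeras.qconv2d_batchnorm import QConv2DBatchnorm

# A single fused layer replaces the QConv2D + BatchNormalization pair,
# so training and the converted firmware see the same folded weights.
fused = QConv2DBatchnorm(
    4, (3, 3),
    kernel_quantizer=quantized_bits(16, 6),   # illustrative quantizers
    bias_quantizer=quantized_bits(16, 6),
    name='q_conv2d_batchnorm')
```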

With the fusion of layers in hls4ml, there's no way to use separate precisions for QConv2D and BatchNorm (which QKeras does by default: QConv2D can be highly quantized, while BatchNorm is usually left as floating point). After the fusion, the BatchNorm effectively uses the (usually much reduced) precision of the QConv2D layer, which makes saturation more likely.

On this, did you do something to force the layer fusion? As implemented, the optimizer pass that does the layer fusion would skip that case because of the quantizers in the QConv2D layer (here), and that's for the reason you mentioned: the data type for the QConv2D weights and biases is not necessarily the right one for the fused QConv2D + BN weights and biases.

I think the right thing is always to use saturation for layer outputs especially at small bitwidths. Regarding this:

hls4ml for some reason doesn't seem to compile the model if I use ap_fixed<16,8,AP_RND,AP_SAT> for all my layers so I can't make the hls4ml behavior consistent with the default QKeras behavior (but maybe this can be fixed?)

Is it related to #305? (With a solution in #309.)

More generally, apart from Activation layers, the output type of layers is not defined for us by QKeras, so there may still be a need to do some profiling and tracing to set the data types for layer outputs. While saturation mode helps, it's still necessary to use enough bits to cover the full range of outputs to get good agreement with QKeras.
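
As a hedged illustration of the knobs mentioned here, one way to request rounding and saturation when converting (assuming `model` is the QKeras model and using the hls4ml Python API; keys and defaults may differ between versions):

```python
import hls4ml

# Model-level config: the default precision applies to every layer's weights,
# biases, and outputs unless overridden per layer (granularity='name').
config = hls4ml.utils.config_from_keras_model(model, granularity='model')
config['Model']['Precision'] = 'ap_fixed<16,8,AP_RND,AP_SAT>'

hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, output_dir='hls4ml_prj')
hls_model.compile()

# hls_model.predict(...) can then be compared layer by layer with QKeras,
# e.g. via the profiling/tracing utilities, to choose per-layer output types.
```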

jmduarte commented 3 years ago

OK my mistake, QConv2D is not being merged with BatchNorm!

I will try to use the dedicated QConv2DBatchNorm layer.

I actually found that I can use ap_fixed<16,8,AP_RND,AP_SAT> for the model just fine! My mistake was mixing up the order of the last two arguments (the rounding mode comes before the overflow mode)!

No, it doesn't have to do with #305 or #309 (at least, I merged #309 into my development branch for testing).

So I will update here once I look into these leads.

vloncar commented 3 years ago

Speaking of #309, if it works for you, a review would be nice :wink:
