fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

Add an optimizer to replace SeparableConv by Depthwise + Conv (pointwise) #1022

Open jmitrevs opened 2 weeks ago

jmitrevs commented 2 weeks ago

Description

While working on automatic type inference, reuse-factor setting, stream-buffer optimization, and an eventual oneAPI implementation using task sequences, it became clear that treating a separable convolution as two layers instead of one makes all of these easier: the two layers can then have different accumulator precisions, reuse factors, etc.

This optimizer converts a SeparableConv*D layer into a DepthwiseConv*D layer followed by a Conv*D layer that performs the pointwise convolution. (For backends that have an explicit pointwise implementation, a subsequent optimizer changes the Conv*D to a PointwiseConv*D.) Layer-wise configurations are also created for the new depthwise and pointwise convolutions so that type inference can be done on each layer individually. Hence, this optimizer should be run before automatic type inference. (The qonnx PR #979 adds several other optimizers that also need to run before type inference, so this will be a common pattern.)
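The split is valid because a separable convolution is, by definition, a depthwise convolution followed by a 1x1 (pointwise) convolution. The sketch below checks this numerically. It is a minimal pure-Python illustration, not hls4ml code, and it assumes a 1D convolution with "valid" padding, stride 1, channels-last data, and Keras-style weight layouts (depthwise kernel of shape (K, C, M) for depth multiplier M, pointwise kernel flattened to (C*M, F) with intermediate channel index c*M + m).

```python
# Minimal sketch (assumptions: 1D, "valid" padding, stride 1, channels-last,
# Keras-style layouts): dw[k][c][m] has shape (K, C, M); pw[j][f] has shape
# (C*M, F) where the intermediate channel index is j = c*M + m.

def depthwise_conv1d(x, dw):
    """Convolve each input channel with its own M filters -> C*M channels."""
    K, C, M = len(dw), len(dw[0]), len(dw[0][0])
    out = []
    for t in range(len(x) - K + 1):
        row = [0] * (C * M)
        for c in range(C):
            for m in range(M):
                row[c * M + m] = sum(x[t + k][c] * dw[k][c][m] for k in range(K))
        out.append(row)
    return out

def pointwise_conv1d(y, pw):
    """1x1 convolution: mix the C*M intermediate channels into F outputs."""
    F = len(pw[0])
    return [[sum(row[j] * pw[j][f] for j in range(len(row))) for f in range(F)]
            for row in y]

def separable_conv1d(x, dw, pw):
    """Fused reference: sum over taps, channels and multipliers in one pass."""
    K, C, M = len(dw), len(dw[0]), len(dw[0][0])
    F = len(pw[0])
    return [
        [sum(x[t + k][c] * dw[k][c][m] * pw[c * M + m][f]
             for k in range(K) for c in range(C) for m in range(M))
         for f in range(F)]
        for t in range(len(x) - K + 1)
    ]

# Example data: L=4 time steps, C=2 channels, K=2 taps, M=2, F=3 filters.
x = [[1, 2], [3, -1], [0, 4], [2, 2]]
dw = [[[1, 0], [2, -1]], [[-1, 3], [1, 1]]]
pw = [[1, 0, 2], [0, 1, -1], [3, 1, 0], [-2, 0, 1]]

fused = separable_conv1d(x, dw, pw)
split = pointwise_conv1d(depthwise_conv1d(x, dw), pw)
print(fused == split)
```

Integer inputs keep the comparison exact. With fixed-point data the split version can differ slightly, because the intermediate depthwise output is rounded to its own type before the pointwise sum; that intermediate type is one of the things the separate layer-wise configurations make tunable.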

In this PR I added parameters but did not remove any. In particular, the existing reuse factor and accumulator type are ambiguous and unused in the new implementation, which splits each of them into separate depthwise and pointwise reuse factors and accumulator types. However, if this optimizer is disabled, the old scheme should still work, provided the user takes care.
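The idea of splitting one layer's settings into two layer-wise configurations can be pictured with the hypothetical sketch below. The key names (reuse_factor, accum_t and their depthwise_/pointwise_ prefixed variants), the _depthwise/_pointwise layer-name suffixes, and the fallback to the shared value when no part-specific value is given are all assumptions for illustration; they are not hls4ml's actual configuration schema or the logic in this PR.

```python
# Hypothetical sketch: derive per-layer configs for the new depthwise and
# pointwise layers from a separable layer's config. Key names and the
# fallback rule are illustrative assumptions, not hls4ml's schema.

PARTS = ("depthwise", "pointwise")
PREFIXES = tuple(p + "_" for p in PARTS)
SPLIT_KEYS = ("reuse_factor", "accum_t")  # settings that become per-part

def split_separable_config(name, cfg):
    """Return {new_layer_name: config} for the two replacement layers."""
    result = {}
    for part in PARTS:
        # Copy settings that are neither split nor prefixed for one part.
        sub = {k: v for k, v in cfg.items()
               if k not in SPLIT_KEYS and not k.startswith(PREFIXES)}
        for key in SPLIT_KEYS:
            specific = f"{part}_{key}"
            # Prefer the part-specific value; otherwise fall back to the
            # ambiguous shared value (a choice made for this sketch only).
            if specific in cfg:
                sub[key] = cfg[specific]
            elif key in cfg:
                sub[key] = cfg[key]
        result[f"{name}_{part}"] = sub
    return result

cfg = {
    "strategy": "Latency",
    "reuse_factor": 4,
    "accum_t": "ap_fixed<18,8>",
    "depthwise_reuse_factor": 2,
    "pointwise_accum_t": "ap_fixed<24,10>",
}
split = split_separable_config("sep1", cfg)
print(split)
```

Here the depthwise layer gets its own reuse factor (2) but inherits the shared accumulator type, while the pointwise layer gets its own accumulator type but inherits the shared reuse factor (4).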

I believe this PR also adds support for depth multipliers other than 1, but that is untested. It was motivated by #1008.
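With a depth multiplier M, the depthwise layer produces C*M channels, so the input width of the following pointwise layer grows accordingly. The helper below is a hypothetical bookkeeping sketch, not part of hls4ml; it assumes Keras weight-shape conventions for a 1D separable convolution and ignores biases.

```python
# Hypothetical shape bookkeeping for splitting a SeparableConv1D with
# kernel size K, C input channels, depth multiplier M and F filters.
# Shapes follow the Keras convention (an assumption); biases are ignored.

def split_shapes(kernel_size, in_channels, depth_multiplier, filters):
    mid = in_channels * depth_multiplier  # channels between the two layers
    return {
        "depthwise_kernel": (kernel_size, in_channels, depth_multiplier),
        "depthwise_out_channels": mid,
        "pointwise_kernel": (1, mid, filters),
        "n_weights": kernel_size * mid + mid * filters,
    }

print(split_shapes(3, 4, 2, 8))
```

For K=3, C=4, M=2, F=8 this gives a depthwise kernel of shape (3, 4, 2), 8 intermediate channels, a pointwise kernel of shape (1, 8, 8), and 3*8 + 8*8 = 88 weights; with M=1 the count drops to 44.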

Type of change

Updated implementation that


Tests

This should not change the behavior of standard separable convolutions with depth_multiplier=1 that do not use automatic type inference, so the default tests should still pass. Automatic type inference will be tested in a follow-up PR that makes auto the default.


jmitrevs commented 1 week ago

I am somewhat torn about whether to remove the regular accum_t and reuse_factor, which are unused when the separable convolution is split (the default).