Xilinx / finn

Dataflow compiler for QNN inference on FPGAs
https://xilinx.github.io/finn
BSD 3-Clause "New" or "Revised" License

Does FINN support pruning? #792

Closed jurevreca12 closed 1 year ago

jurevreca12 commented 1 year ago

I have searched the documentation, the code and the issues, but I can't find any mention of support for pruned neural networks. So my question is, does FINN support pruning? If not, is it on the roadmap? If it does, is there an example somewhere?

Motivation

Pruning is a useful technique to further decrease the size of a neural network, and could therefore allow FINN to deploy networks using fewer FPGA resources.

Pruning is supported by hls4ml and can be applied quite easily with the TensorFlow Model Optimization Toolkit.

auphelia commented 1 year ago

Hi @jurevreca12,

On the training side, pruning is easy to do in Brevitas (e.g. with the PyTorch pruning toolkit). To take advantage of it in hardware, there are currently two options:
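
For the training side, here is a minimal sketch of what pruning a Brevitas layer with the standard PyTorch pruning utilities (`torch.nn.utils.prune`) might look like. The layer sizes, bit width, and 50% sparsity amount are illustrative assumptions, not values from the FINN or Brevitas documentation.

```python
import torch.nn.utils.prune as prune
from brevitas.nn import QuantLinear

# Quantized fully-connected layer (4-bit weights), as typically used
# in Brevitas models that are later exported to FINN.
layer = QuantLinear(in_features=64, out_features=32, bias=False, weight_bit_width=4)

# Unstructured magnitude pruning: zero out the 50% of weights with the
# smallest L1 magnitude. The zeros stay in place, so the layer shape
# (and the exported ONNX graph) is unchanged.
prune.l1_unstructured(layer, name="weight", amount=0.5)

# Fold the pruning mask into the weight tensor so the zeros are
# permanent and survive export to ONNX/QONNX.
prune.remove(layer, "weight")
```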

jurevreca12 commented 1 year ago

Hey @auphelia, thank you for this information. Just to make sure I've understood correctly: I need to set PE to the number of neurons in a layer, and SIMD to the number of inputs. Correct?

auphelia commented 1 year ago

In the case of the MVAU (MatrixVectorActivation), for example, this means PE = matrix height and SIMD = matrix width. Here is some additional information about folding: https://github.com/Xilinx/finn/blob/github-pages/docs/finn-sheduling-and-folding.pptx
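
As a rough sketch of what fully unrolling the MVAU layers could look like in a FINN build script, assuming the model has already been converted to hardware layers (the op_type name and the file names here are assumptions and may differ between FINN versions):

```python
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.custom_op.registry import getCustomOp

# Hypothetical intermediate model with hardware layers.
model = ModelWrapper("model_hw_layers.onnx")

for node in model.graph.node:
    if node.op_type == "MatrixVectorActivation":
        inst = getCustomOp(node)
        mw = inst.get_nodeattr("MW")  # matrix width  = number of inputs
        mh = inst.get_nodeattr("MH")  # matrix height = number of neurons
        # Fully unroll: SIMD = matrix width, PE = matrix height.
        inst.set_nodeattr("SIMD", mw)
        inst.set_nodeattr("PE", mh)

model.save("model_fully_unrolled.onnx")
```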