Xilinx / finn

Dataflow compiler for QNN inference on FPGAs
https://xilinx.github.io/finn
BSD 3-Clause "New" or "Revised" License

Caching support for long-running transformations #174

Open maltanar opened 4 years ago

maltanar commented 4 years ago

To speed up the compilation process for large models or large layers, it would make sense to have a caching mechanism for long-running transformations. The cached outputs would be stored persistently and reused, when appropriate, on subsequent calls to a transform. Cache generation and reuse should both be optional.

The idea would be to generate a hash over all input data relevant to a transform for a particular node (including node attributes, parameter tensors, quantization annotations...) and use that hash as the cache key for the output products, which live in a folder in a persistent location. Later on, if the same transformation is executed on a node with the same hash, the cached outputs can be reused by copying them from the persistent cache folder into a new folder.
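
As a rough illustration, the per-node cache key could be computed along these lines. This is only a sketch, not existing FINN code: `compute_node_cache_key` is a hypothetical helper, and it assumes ModelWrapper's `get_initializer` and `get_tensor_datatype` accessors for reaching parameter tensors and quantization annotations.

```python
import hashlib

def compute_node_cache_key(model, node):
    """Hash everything a per-node transform depends on: the node itself
    (op_type, attributes, I/O connectivity), its parameter tensors and
    the quantization annotations on its inputs."""
    h = hashlib.sha256()
    # the serialized NodeProto covers op_type, attributes and tensor names
    h.update(node.SerializeToString())
    for tensor_name in node.input:
        # parameter tensors: hash the raw initializer bytes, if any
        init = model.get_initializer(tensor_name)
        if init is not None:
            h.update(init.tobytes())
        # quantization annotations: hash the FINN datatype string
        h.update(str(model.get_tensor_datatype(tensor_name)).encode("utf-8"))
    return h.hexdigest()
```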

For NodeLocalTransform the hashing is relatively straightforward, since only the node itself is passed to the transformation. For other transforms it's hard to generalize; they are best considered case by case, prioritizing the ones that give the most execution-time benefit for the use cases we have.
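
The lookup-or-generate step itself could then be a small generic helper, independent of how the key was produced. Again just a sketch under assumed names and cache layout, not existing FINN code (`dirs_exist_ok` needs Python 3.8+):

```python
import os
import shutil

def run_with_cache(cache_root, transform_name, key, out_dir, generate_fn):
    """Copy cached output products for (transform_name, key) into out_dir
    if they exist; otherwise run generate_fn and populate the cache."""
    cache_dir = os.path.join(cache_root, transform_name, key)
    if os.path.isdir(cache_dir):
        # cache hit: reuse by copying from the persistent cache folder
        shutil.copytree(cache_dir, out_dir, dirs_exist_ok=True)
        return True
    # cache miss: run the long-running step into the fresh output folder...
    generate_fn(out_dir)
    # ...then persist the products for later reuse
    shutil.copytree(out_dir, cache_dir)
    return False
```

Copying cache → fresh folder (rather than pointing nodes at the cache directly) means a later transform can modify its output folder without corrupting the cached copy.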

@quetric @Tobi-Alonso do you have any suggestions for which transforms to start with to get the most benefit, or any other comments?

Tobi-Alonso commented 4 years ago

For my test case, using NUM_DEFAULT_WORKERS = 30 and StreamingFCLayer_Batch with "mem_mode" = "decoupled", these are the transforms that take the most time and that I think are well suited for cache support:

I would start with PrepareIP and HLSSynthIP (which needs cache support in PrepareIP).
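
On the "HLSSynthIP needs cache support in PrepareIP" point: since HLS synthesis consumes the code that PrepareIP generates, one option would be to derive HLSSynthIP's cache key from PrepareIP's output products. A hypothetical sketch (`hash_codegen_dir` is not existing FINN API):

```python
import hashlib
import os

def hash_codegen_dir(path):
    """Hash the contents of a code-generation folder (e.g. PrepareIP's
    output for one node) in a deterministic file order, so HLSSynthIP
    can use the result as its cache key."""
    h = hashlib.sha256()
    for root, _, files in sorted(os.walk(path)):
        for fname in sorted(files):
            h.update(fname.encode("utf-8"))
            with open(os.path.join(root, fname), "rb") as f:
                h.update(f.read())
    return h.hexdigest()
```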