fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

Remove unnecessary transposes related to conversion to channels_last format #976

Closed vloncar closed 5 months ago

vloncar commented 7 months ago

Description

The current channels_last converter inserts a transpose node after the "flatten" layer to ensure the element order is correct for the subsequent fully connected layer. This isn't strictly required and can be costly: for 2D convolutional networks, for example, it results in a transpose3d HLS function being used, which is very expensive.

Additionally, in cases where the input has only one channel, a transpose isn't required at all. Technically one can work around this with inputs_channel_last=True, but we've seen that users reasonably expect not to need this option when there is only a single channel.
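A quick NumPy check (illustrative only, not hls4ml code) shows both points: with multiple channels, flattening channels-last data yields a different element order than flattening channels-first data, which is why the converter inserts a transpose; with a single channel, the size-1 axis contributes nothing to the memory order, so the transpose is a no-op and can be removed:

```python
import numpy as np

# Multi-channel: flattened orders differ, so a transpose is genuinely needed.
x = np.arange(2 * 3 * 4).reshape(2, 3, 4)      # (H, W, C), channels-last
channels_first = np.transpose(x, (2, 0, 1))    # (C, H, W)
assert not np.array_equal(x.flatten(), channels_first.flatten())

# Single channel: the size-1 axis does not affect the flattened order,
# so the transpose changes nothing and is safe to drop.
y = np.arange(2 * 3).reshape(2, 3, 1)          # (H, W, 1)
assert np.array_equal(y.flatten(), np.transpose(y, (2, 0, 1)).flatten())
```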

This PR adds two more optimizers that run after the main channels_last optimizer and remove these transposes. This is more straightforward than adding special cases to the main optimizer to suppress insertion of the Transpose layers in the first place.
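As a hypothetical sketch of the idea (the class and function names here are illustrative, not the actual hls4ml optimizer API), a cleanup pass of this kind walks the graph after the main conversion and drops any Transpose node that directly follows a Flatten:

```python
from dataclasses import dataclass

@dataclass
class Node:
    op: str    # e.g. 'Conv2D', 'Flatten', 'Transpose', 'Dense'
    name: str

def remove_transpose_after_flatten(graph):
    """Return a new node list with each Transpose removed whenever the
    immediately preceding node is a Flatten (the transpose is redundant
    there, as the flattened order can be consumed directly)."""
    out = []
    for node in graph:
        if node.op == 'Transpose' and out and out[-1].op == 'Flatten':
            continue  # skip the redundant transpose
        out.append(node)
    return out

graph = [Node('Conv2D', 'conv'), Node('Flatten', 'flat'),
         Node('Transpose', 'tr'), Node('Dense', 'fc')]
cleaned = remove_transpose_after_flatten(graph)
assert [n.op for n in cleaned] == ['Conv2D', 'Flatten', 'Dense']
```

Running such a pass after the main channels_last optimizer keeps the main conversion logic simple, at the cost of briefly materializing nodes that are immediately removed.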

Type of change

Tests

There's a new test, test_remove_transpose in test_pytorch_api.py, that exercises this behavior. Additionally, the removal of the transpose after flatten is exercised by test_skipped_layers.

Checklist