fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

Remove unnecessary transposes related to conversion to channels_last format #976

Closed vloncar closed 5 months ago

vloncar commented 7 months ago

Description

The current channels_last converter inserts a transpose node after the "flatten" layer to ensure the element order is correct for the subsequent fully connected layer. This isn't strictly required and can be costly: for 2D convolutional networks, for example, it results in a transpose3d HLS function being used, which is very expensive.

Additionally, in cases where the input has only one channel, a transpose isn't required at all. Technically one can work around this with inputs_channel_last=True, but we've seen that users reasonably expect not to need this option when there is only a single channel.
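A quick NumPy check (illustrative only, not hls4ml code) shows both points: with multiple channels, flattening channels-last data yields a different element order than flattening channels-first data, which is why the converter inserts a transpose; with a single channel, the size-1 axis contributes nothing to the memory order, so the transpose is a no-op and can be removed:

```python
import numpy as np

# Multi-channel: flattened orders differ, so a transpose is genuinely needed.
x = np.arange(2 * 3 * 4).reshape(2, 3, 4)      # (H, W, C), channels-last
channels_first = np.transpose(x, (2, 0, 1))    # (C, H, W)
assert not np.array_equal(x.flatten(), channels_first.flatten())

# Single channel: the size-1 axis does not affect the flattened order,
# so the transpose changes nothing and is safe to drop.
y = np.arange(2 * 3).reshape(2, 3, 1)          # (H, W, 1)
assert np.array_equal(y.flatten(), np.transpose(y, (2, 0, 1)).flatten())
```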

This PR adds two more optimizers that run after the main channels_last optimizer and remove these transposes. This is more straightforward than adding special cases to the main optimizer to suppress insertion of the Transpose layers in the first place.
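As a hypothetical sketch of the idea (the class and function names here are illustrative, not the actual hls4ml optimizer API), a cleanup pass of this kind walks the graph after the main conversion and drops any Transpose node that directly follows a Flatten:

```python
from dataclasses import dataclass

@dataclass
class Node:
    op: str    # e.g. 'Conv2D', 'Flatten', 'Transpose', 'Dense'
    name: str

def remove_transpose_after_flatten(graph):
    """Return a new node list with each Transpose removed whenever the
    immediately preceding node is a Flatten (the transpose is redundant
    there, as the flattened order can be consumed directly)."""
    out = []
    for node in graph:
        if node.op == 'Transpose' and out and out[-1].op == 'Flatten':
            continue  # skip the redundant transpose
        out.append(node)
    return out

graph = [Node('Conv2D', 'conv'), Node('Flatten', 'flat'),
         Node('Transpose', 'tr'), Node('Dense', 'fc')]
cleaned = remove_transpose_after_flatten(graph)
assert [n.op for n in cleaned] == ['Conv2D', 'Flatten', 'Dense']
```

Running such a pass after the main channels_last optimizer keeps the main conversion logic simple, at the cost of briefly materializing nodes that are immediately removed.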

Type of change

Tests

There's a new test, test_remove_transpose in test_pytorch_api.py, that exercises this behavior. Additionally, the removal of the transpose after flatten is exercised by test_skipped_layers.

Checklist