Prerequisites
Please make sure to check off these prerequisites before submitting a bug report.
[x] Test that the bug appears on the current version of the master branch. Make sure to include the commit hash of the commit you checked out.
[x] Check that the issue hasn't already been reported, by checking the currently open issues.
[x] If there are steps to reproduce the problem, make sure to write them down below.
[x] If relevant, please include the hls4ml project files, which were created directly before and/or after the bug.
Quick summary
Using a `Flatten` layer causes several bugs on both the Vivado and Quartus (WIP) backends, including incorrect outputs and failed compilation. There are several possible solutions, so I am opening this issue for discussion in the hope that it gets resolved soon. The examples below are very minimal and artificial (they would never be encountered in a real NN architecture), but they should still produce results matching Keras, which implies a clear logic mistake.
Steps to Reproduce
Clone the hls4ml repository
Check out the master branch at commit hash 563c84c1ddfe5d8298a38ce2cdd74881ac8b955d and convert a model containing a `Flatten` layer (a sketch is shown below)
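A minimal sketch of the kind of model and conversion that triggers the problem; the shapes, precision, and backend settings here are assumptions for illustration, not the exact configuration behind the logs below.

```python
import numpy as np
from tensorflow.keras.layers import Activation, Conv2D, Flatten, Input
from tensorflow.keras.models import Model

import hls4ml

# Tiny, artificial model: Conv2D -> ReLU -> Flatten -> ReLU (shapes are assumed)
inp = Input(shape=(4, 4, 1))
x = Conv2D(2, (2, 2))(inp)
x = Activation('relu')(x)
x = Flatten()(x)
out = Activation('relu')(x)
model = Model(inp, out)

config = hls4ml.utils.config_from_keras_model(model, granularity='name')
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    io_type='io_stream',  # the incorrect-output bug shows up with streaming I/O
    backend='Vivado',
    output_dir='hls4ml_flatten_bug',
)
hls_model.compile()

x_test = np.random.rand(10, 4, 4, 1)
print(model.predict(x_test))      # Keras reference
print(hls_model.predict(x_test))  # only the first element of each output is correct
```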
Expected behavior
Successful compilation and correct results.
Actual behavior and Possible Fix
When using the setup `Conv2D -> ReLU -> Flatten -> ReLU`, only the first element in the output array is correct; all the other outputs are equal to the first and are therefore wrong. This happens because `nnet_activation_stream` runs a single `PackLoop` with `n_in / res_T::size`, which is only correct when `data_T::size == res_T::size`, and that no longer holds once the `Flatten` layer is inserted (a paraphrased sketch of this loop is included below). Before the `Flatten` layer, the size of every pack in the stream is equal to the number of channels and there can be multiple packs; after the `Flatten` layer the pack size is equal to the total number of elements and there is only one pack. Off the top of my head, there are two ways to address this:
1. Implement correct functionality of the `Repack` layer - the layer implementation currently exists, but there is no optimizer pass inserting it when needed. This approach still seems wrong, since there is no reason for `nnet_activation_stream.h` to assume the packs are of equal size; in that case, adding an assertion might be helpful.
2. Use the approach from the `Dense` layer - the streaming `Dense` first reads all of the values, performs the operation, and writes the output, as seen below. This approach has been tested, but it would require rewriting all of the activation functions, and it might be wrong for multi-dimensional activations whose output depends on other elements in the array (e.g. `Softmax` - does a multi-dimensional softmax depend only on the values in the same row/column, or on the entire matrix?).
The `Dense`-style streaming skeleton for an activation would look roughly like this (loop bodies elided):

```cpp
typename res_T::value_type res[CONFIG_T::n_out];
#pragma HLS ARRAY_PARTITION variable=res complete

DataPrepare: for (int i_in = 0; i_in < CONFIG_T::n_in / data_T::size; i_in++) {
    if (CONFIG_T::n_in / data_T::size > 1) {
        #pragma HLS PIPELINE
    }
    // read one pack from the input stream and unpack it into a local data[] array
}

// Instead of doing the Dense matrix multiplication, here we can do the Activation.
doSomeActivation(data, res);

ResWrite: for (unsigned i_out = 0; i_out < CONFIG_T::n_out / res_T::size; i_out++) {
    if (CONFIG_T::n_out / res_T::size > 1) {
        #pragma HLS PIPELINE
    }
    // pack res_T::size elements from res[] and write them to the output stream
}
```
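For reference, the streaming activations in `nnet_activation_stream.h` follow roughly the pattern below. This is a paraphrased sketch rather than the exact source, but it shows where the `data_T::size == res_T::size` assumption is baked into both the loop bound and the pack indexing.

```cpp
// Paraphrased sketch of the streaming ReLU pattern (not verbatim from nnet_activation_stream.h)
template <class data_T, class res_T, typename CONFIG_T>
void relu(hls::stream<data_T> &data, hls::stream<res_T> &res) {
    ReLUActLoop: for (int i = 0; i < CONFIG_T::n_in / res_T::size; i++) {
        #pragma HLS PIPELINE
        data_T in_data = data.read();
        res_T out_data;
        PackLoop: for (int j = 0; j < res_T::size; j++) {
            #pragma HLS UNROLL
            // in_data is indexed up to res_T::size elements, which is only valid
            // when data_T::size == res_T::size
            if (in_data[j] > 0) out_data[j] = in_data[j];
            else out_data[j] = 0;
        }
        res.write(out_data);
    }
}
```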
Removing the first ReLU, i.e. using the setup `Conv2D -> Flatten -> ReLU`, makes compilation fail (the error occurs with both parallel and stream I/O), with the following exception:
```
firmware/myproject.cpp: In function ‘void myproject(hls::stream<nnet::array<ap_fixed<32, 9>, 1> >&, hls::stream<nnet::array<ap_fixed<32, 9>, 4> >&)’:
firmware/myproject.cpp:53:50: error: ‘layer3_out’ was not declared in this scope; did you mean ‘layer2_out’?
   53 |     nnet::relu<layer3_t, result_t, relu_config5>(layer3_out, layer5_out); // activation
      |                                                  ^~~~~~~~~~
      |                                                  layer2_out
```
Most likely this occurs because the `Reshape` layer was only partially removed from the `ModelGraph`. I wrote a quick optimizer pass to remove `Reshape` and this resolved the error, but this solution seems risky:
```python
from hls4ml.model.layers import Reshape
from hls4ml.model.optimizer.optimizer import OptimizerPass

class SkipReshape(OptimizerPass):
    def match(self, node):
        return isinstance(node, Reshape)

    def transform(self, model, node):
        # assumed completion: remove the Reshape node and rewire its input to its output
        model.remove_node(node, rewire=True)
        return True
```
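For completeness, a hedged sketch of how such a pass could be hooked in for testing. This assumes hls4ml's `register_pass` helper and uses a pass name chosen here (`skip_reshape`); registering alone does not schedule it, so the pass would still have to be added to one of the optimization flows to run automatically during conversion.

```python
from hls4ml.model.optimizer import register_pass

# 'skip_reshape' is an arbitrary name used for this sketch; the pass still needs to be
# added to an optimization flow before it is applied automatically during conversion.
register_pass('skip_reshape', SkipReshape)
```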