fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

Streaming Activation incorrect shape & Reshape layer failed compilation #647

Open bo3z opened 2 years ago

bo3z commented 2 years ago

Prerequisites

Please make sure to check off these prerequisites before submitting a bug report.

Quick summary

Using the Flatten layer causes several bugs on both Vivado and Quartus (WIP), including incorrect outputs and failed compilation. There are several possible solutions, so I am opening this issue for discussion in the hope that it gets resolved soon. The examples below are very minimal and artificial (they would never be encountered in a real NN architecture), but they should still produce results matching Keras, so there is a clear logic mistake somewhere.

Steps to Reproduce

  1. Clone the hls4ml repository
  2. Checkout the master branch, with commit hash: 563c84c1ddfe5d8298a38ce2cdd74881ac8b955d
  3. Run the conversion with the following script:
import os
import shutil
import hls4ml
import numpy as np
from keras.models import Sequential
from keras.layers import Activation, Conv2D, Flatten

backend = 'Vivado'
io_type = 'io_stream'

input_shape = (4, 4, 1)    
X = np.random.rand(1, *input_shape)

keras_model = Sequential()
keras_model.add(Conv2D(1, (3, 3), input_shape=input_shape, kernel_initializer='ones', bias_initializer='zeros'))

# To see failed compilation, remove below ReLU
# With ReLU, results are incorrect
keras_model.add(Activation('relu'))

keras_model.add(Flatten())
keras_model.add(Activation('relu'))
keras_model.compile()

output_dir = r'example_flatten'
if os.path.isdir(output_dir):
    shutil.rmtree(output_dir)
os.makedirs(output_dir)

default_precision = 'ac_fixed<32, 9, true>' if backend == 'Quartus' else 'ap_fixed<32, 9>' 
hls_config = hls4ml.utils.config_from_keras_model(keras_model, granularity='name', default_precision=default_precision, default_reuse_factor=1)     
hls_config['Model']['Strategy'] = 'Resource'

hls_model = hls4ml.converters.convert_from_keras_model(
                        keras_model, 
                        hls_config=hls_config,
                        output_dir=output_dir, 
                        backend=backend,
                        io_type=io_type)
hls_model.compile()

print('Predicting Keras')
keras_prediction = keras_model.predict(X)

print('Predicting HLS ' + backend + ' with GCC')
hls_prediction = hls_model.predict(np.ascontiguousarray(X))

print('Calculating error...')    
np.testing.assert_allclose(hls_prediction.flatten(), keras_prediction.flatten(), rtol=0.0, atol=3e-2)

Expected behavior

Successful compilation and correct results

Actual behavior and Possible Fix

When using the set-up Conv2D -> ReLU -> Flatten -> ReLU, only the first element in the output array is correct. All the other outputs are equal to the first and are therefore wrong. This is because nnet_activation_stream runs a single PackLoop with n_in / res_T::size, which is only correct when data_T::size == res_T::size, and that no longer holds once the Flatten layer is inserted (a simplified sketch of this pattern is included after the list below). Before the Flatten layer, the size of every pack in the stream is equal to the number of channels and there can be multiple packs; after the Flatten layer, the pack size is equal to the total number of elements and there is only one pack. Off the top of my head, there are two ways to address this:

  1. Implement the Repack layer properly - the layer implementation currently exists, but there is no optimizer pass that inserts it when needed (a minimal sketch of such a repack stage is included after the code below). This approach still seems wrong, since there is no reason for nnet_activation_stream.h to assume the input and output packs are of equal size; in that case, adding an assertion might be helpful.
  2. Use the approach from the Dense layer - the streaming Dense first reads all of the values, performs the operation and then writes the output, as seen below. This approach has been tested, but it would require rewriting all of the activations and might be wrong for multi-dimensional activations whose output depends on other elements in the array (e.g. Softmax - does a multi-dimensional softmax depend only on values in the same row/column, or on the entire matrix?)
    
typename data_T::value_type data[CONFIG_T::n_in];
#pragma HLS ARRAY_PARTITION variable=data complete

typename res_T::value_type res[CONFIG_T::n_out];
#pragma HLS ARRAY_PARTITION variable=res complete

DataPrepare: for (int i_in = 0; i_in < CONFIG_T::n_in / data_T::size; i_in++) {
    if (CONFIG_T::n_in / data_T::size > 1) {
        #pragma HLS PIPELINE
    }
    data_T data_pack = data_stream.read();
    DataPack: for (int i_pack = 0; i_pack < data_T::size; i_pack++) {
        #pragma HLS UNROLL
        data[i_in * data_T::size + i_pack] = data_pack[i_pack];
    }
}

// Instead of doing the Dense matrix multiplication, here we can do the activation.
doSomeActivation(data, res);

ResWrite: for (unsigned i_out = 0; i_out < CONFIG_T::n_out / res_T::size; i_out++) {
    if (CONFIG_T::n_out / res_T::size > 1) {
        #pragma HLS PIPELINE
    }
    res_T res_pack;
    #pragma HLS DATA_PACK variable=res_pack
    ResPack: for (int i_pack = 0; i_pack < res_T::size; i_pack++) {
        #pragma HLS UNROLL
        res_pack[i_pack] = res[i_out * res_T::size + i_pack];
    }
    res_stream.write(res_pack);
}

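For reference, the streaming ReLU currently follows roughly the pattern sketched below (a simplified approximation written for this issue, not copied verbatim from nnet_activation_stream.h): both the loop trip count and the pack indexing use res_T::size, so the function is only correct when data_T::size == res_T::size.

#include "hls_stream.h"

// Sketch approximating the streaming ReLU in nnet_activation_stream.h.
// The trip count (n_in / res_T::size) and the indexing of in_data both use
// res_T::size, so data_T::size == res_T::size is silently assumed.
template<class data_T, class res_T, typename CONFIG_T>
void relu(hls::stream<data_T> &data, hls::stream<res_T> &res) {
    ReLUActLoop: for (int i = 0; i < CONFIG_T::n_in / res_T::size; i++) {
        #pragma HLS PIPELINE
        data_T in_data = data.read();   // pack of data_T::size elements
        res_T out_data;                 // pack of res_T::size elements
        #pragma HLS DATA_PACK variable=out_data
        ReLUPackLoop: for (int j = 0; j < res_T::size; j++) {
            #pragma HLS UNROLL
            // If data_T::size != res_T::size, this indexes the wrong (or non-existent)
            // elements of the input pack, which matches the incorrect outputs observed.
            if (in_data[j] > 0) out_data[j] = in_data[j];
            else out_data[j] = 0;
        }
        res.write(out_data);
    }
}

In the example above, the pack sizes on the two sides of the Flatten boundary differ (one channel per pack before, all elements in a single pack after), which is exactly the mismatch this loop does not handle.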

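For option 1, the missing functionality is essentially a pass-through stage that converts between the two pack sizes. A minimal sketch of what such a repack stage could look like is shown below (written for illustration only; the Repack layer implementation that already exists in hls4ml may be structured differently):

#include "hls_stream.h"

// Illustrative repack stage: read N elements arriving in packs of data_T::size
// and re-emit them in packs of res_T::size. Buffering the full tensor is the
// simplest (not the most resource-efficient) way to decouple the two pack sizes.
template<class data_T, class res_T, int N>
void repack_stream(hls::stream<data_T> &data, hls::stream<res_T> &res) {
    typename data_T::value_type buffer[N];

    ReadLoop: for (int i = 0; i < N / data_T::size; i++) {
        #pragma HLS PIPELINE
        data_T in_data = data.read();
        ReadPack: for (int j = 0; j < data_T::size; j++) {
            #pragma HLS UNROLL
            buffer[i * data_T::size + j] = in_data[j];
        }
    }

    WriteLoop: for (int i = 0; i < N / res_T::size; i++) {
        #pragma HLS PIPELINE
        res_T out_data;
        #pragma HLS DATA_PACK variable=out_data
        WritePack: for (int j = 0; j < res_T::size; j++) {
            #pragma HLS UNROLL
            out_data[j] = buffer[i * res_T::size + j];
        }
        res.write(out_data);
    }
}

With such a stage inserted between Flatten and the following activation, the activation again sees data_T::size == res_T::size and the existing PackLoop logic stays valid; the remaining work would be the optimizer pass that inserts the stage automatically.
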
Removing the first ReLU, i.e. using the set-up Conv2D -> Flatten -> ReLU, makes compilation fail (this error occurs with both io_parallel and io_stream), with the following exception:

firmware/myproject.cpp: In function ‘void myproject(hls::stream<nnet::array<ap_fixed<32, 9>, 1> >&, hls::stream<nnet::array<ap_fixed<32, 9>, 4> >&)’:
firmware/myproject.cpp:53:50: error: ‘layer3_out’ was not declared in this scope; did you mean ‘layer2_out’?
   53 | nnet::relu<layer3_t, result_t, relu_config5>(layer3_out, layer5_out); // activation
      |                                              ^~~~~~
      |                                              layer2_out

Most likely this occurs because the `Reshape` layer was only partially removed from the `ModelGraph`. I wrote a quick optimizer pass to remove `Reshape` and this resolved the issue, but the solution seems risky:

from hls4ml.model.layers import Reshape
from hls4ml.model.optimizer.optimizer import OptimizerPass

class SkipReshape(OptimizerPass):
    def match(self, node):
        return isinstance(node, Reshape)

    def transform(self, model, node):
        model.remove_node(node, rewire=True)
        return True