fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

Streaming Activation incorrect shape & Reshape layer failed compilation #647

Open bo3z opened 2 years ago

bo3z commented 2 years ago

Prerequisites

Please make sure to check off these prerequisites before submitting a bug report.

Quick summary

Using the Flatten layer causes several bugs on both Vivado and Quartus (WIP), including incorrect outputs and failed compilation. There are several possible solutions, so I am opening this issue for discussion in the hope that it gets resolved soon. The examples below are very minimal and artificial (they would never be encountered in a real NN architecture), but they should still produce results matching Keras, so there is a clear logic mistake somewhere.

Steps to Reproduce

  1. Clone the hls4ml repository
  2. Checkout the master branch, with commit hash: 563c84c1ddfe5d8298a38ce2cdd74881ac8b955d
  3. Run the conversion with the following script:
import os
import shutil
import hls4ml
import numpy as np
from keras.models import Sequential
from keras.layers import Activation, Conv2D, Flatten

backend = 'Vivado'
io_type = 'io_stream'

input_shape = (4, 4, 1)    
X = np.random.rand(1, *input_shape)

keras_model = Sequential()
keras_model.add(Conv2D(1, (3, 3), input_shape=input_shape, kernel_initializer='ones', bias_initializer='zeros'))

# To see failed compilation, remove below ReLU
# With ReLU, results are incorrect
keras_model.add(Activation('relu'))

keras_model.add(Flatten())
keras_model.add(Activation('relu'))
keras_model.compile()

output_dir = r'example_flatten'
if os.path.isdir(output_dir):
    shutil.rmtree(output_dir)
os.makedirs(output_dir)

default_precision = 'ac_fixed<32, 9, true>' if backend == 'Quartus' else 'ap_fixed<32, 9>' 
hls_config = hls4ml.utils.config_from_keras_model(keras_model, granularity='name', default_precision=default_precision, default_reuse_factor=1)     
hls_config['Model']['Strategy'] = 'Resource'

hls_model = hls4ml.converters.convert_from_keras_model(
                        keras_model, 
                        hls_config=hls_config,
                        output_dir=output_dir, 
                        backend=backend,
                        io_type=io_type)
hls_model.compile()

print('Predicting Keras')
keras_prediction = keras_model.predict(X)

print('Predicting HLS ' + backend + ' with GCC')
hls_prediction = hls_model.predict(np.ascontiguousarray(X))

print('Calculating error...')    
np.testing.assert_allclose(hls_prediction.flatten(), keras_prediction.flatten(), rtol=0.0, atol=3e-2)

Expected behavior

Successful compilation and correct results

Actual behavior and Possible Fix

When using the set-up Conv2D -> ReLU -> Flatten -> ReLU, only the first element in the output array is correct. All the other outputs are equal to the first and are therefore wrong. This is because nnet_activation_stream runs a single PackLoop with n_in / res_T::size, which is only correct when data_T::size == res_T::size, and that no longer holds once the Flatten layer is inserted (a simplified sketch of this pattern is included after the list below). Before the Flatten layer, the size of every pack in the stream is equal to the number of channels and there can be multiple packs; after the Flatten layer, the pack size is equal to the total number of elements and there is only one pack. Off the top of my head, there are two ways to address this:

  1. Implement the Repack layer properly - the layer implementation currently exists, but there is no optimizer pass that inserts it when needed (a minimal sketch of such a repack stage is included after the code below). This approach still seems wrong, since there is no reason for nnet_activation_stream.h to assume the input and output packs are of equal size; in that case, adding an assertion might be helpful.
  2. Use the approach from the Dense layer - the streaming Dense first reads all of the values, performs the operation and then writes the output, as seen below. This approach has been tested, but it would require rewriting all of the activations and might be wrong for multi-dimensional activations whose output depends on other elements in the array (e.g. Softmax - does a multi-dimensional softmax depend only on values in the same row/column, or on the entire matrix?)
    
typename data_T::value_type data[CONFIG_T::n_in];
#pragma HLS ARRAY_PARTITION variable=data complete

typename res_T::value_type res[CONFIG_T::n_out];
#pragma HLS ARRAY_PARTITION variable=res complete

DataPrepare: for (int i_in = 0; i_in < CONFIG_T::n_in / data_T::size; i_in++) {
    if (CONFIG_T::n_in / data_T::size > 1) {
        #pragma HLS PIPELINE
    }
    data_T data_pack = data_stream.read();
    DataPack: for (int i_pack = 0; i_pack < data_T::size; i_pack++) {
        #pragma HLS UNROLL
        data[i_in * data_T::size + i_pack] = data_pack[i_pack];
    }
}

// Instead of doing the Dense matrix multiplication, here we can do the activation.
doSomeActivation(data, res);

ResWrite: for (unsigned i_out = 0; i_out < CONFIG_T::n_out / res_T::size; i_out++) {
    if (CONFIG_T::n_out / res_T::size > 1) {
        #pragma HLS PIPELINE
    }
    res_T res_pack;
    #pragma HLS DATA_PACK variable=res_pack
    ResPack: for (int i_pack = 0; i_pack < res_T::size; i_pack++) {
        #pragma HLS UNROLL
        res_pack[i_pack] = res[i_out * res_T::size + i_pack];
    }
    res_stream.write(res_pack);
}

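For reference, the streaming ReLU currently follows roughly the pattern sketched below (a simplified approximation written for this issue, not copied verbatim from nnet_activation_stream.h): both the loop trip count and the pack indexing use res_T::size, so the function is only correct when data_T::size == res_T::size.

#include "hls_stream.h"

// Sketch approximating the streaming ReLU in nnet_activation_stream.h.
// The trip count (n_in / res_T::size) and the indexing of in_data both use
// res_T::size, so data_T::size == res_T::size is silently assumed.
template<class data_T, class res_T, typename CONFIG_T>
void relu(hls::stream<data_T> &data, hls::stream<res_T> &res) {
    ReLUActLoop: for (int i = 0; i < CONFIG_T::n_in / res_T::size; i++) {
        #pragma HLS PIPELINE
        data_T in_data = data.read();   // pack of data_T::size elements
        res_T out_data;                 // pack of res_T::size elements
        #pragma HLS DATA_PACK variable=out_data
        ReLUPackLoop: for (int j = 0; j < res_T::size; j++) {
            #pragma HLS UNROLL
            // If data_T::size != res_T::size, this indexes the wrong (or non-existent)
            // elements of the input pack, which matches the incorrect outputs observed.
            if (in_data[j] > 0) out_data[j] = in_data[j];
            else out_data[j] = 0;
        }
        res.write(out_data);
    }
}

In the example above, the pack sizes on the two sides of the Flatten boundary differ (one channel per pack before, all elements in a single pack after), which is exactly the mismatch this loop does not handle.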

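For option 1, the missing functionality is essentially a pass-through stage that converts between the two pack sizes. A minimal sketch of what such a repack stage could look like is shown below (written for illustration only; the Repack layer implementation that already exists in hls4ml may be structured differently):

#include "hls_stream.h"

// Illustrative repack stage: read N elements arriving in packs of data_T::size
// and re-emit them in packs of res_T::size. Buffering the full tensor is the
// simplest (not the most resource-efficient) way to decouple the two pack sizes.
template<class data_T, class res_T, int N>
void repack_stream(hls::stream<data_T> &data, hls::stream<res_T> &res) {
    typename data_T::value_type buffer[N];

    ReadLoop: for (int i = 0; i < N / data_T::size; i++) {
        #pragma HLS PIPELINE
        data_T in_data = data.read();
        ReadPack: for (int j = 0; j < data_T::size; j++) {
            #pragma HLS UNROLL
            buffer[i * data_T::size + j] = in_data[j];
        }
    }

    WriteLoop: for (int i = 0; i < N / res_T::size; i++) {
        #pragma HLS PIPELINE
        res_T out_data;
        #pragma HLS DATA_PACK variable=out_data
        WritePack: for (int j = 0; j < res_T::size; j++) {
            #pragma HLS UNROLL
            out_data[j] = buffer[i * res_T::size + j];
        }
        res.write(out_data);
    }
}

With such a stage inserted between Flatten and the following activation, the activation again sees data_T::size == res_T::size and the existing PackLoop logic stays valid; the remaining work would be the optimizer pass that inserts the stage automatically.
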
Removing the first ReLU, i.e. using the set-up Conv2D -> Flatten -> ReLU, makes compilation fail (this error occurs with both io_parallel and io_stream), with the following exception:

firmware/myproject.cpp: In function ‘void myproject(hls::stream<nnet::array<ap_fixed<32, 9>, 1> >&, hls::stream<nnet::array<ap_fixed<32, 9>, 4> >&)’:
firmware/myproject.cpp:53:50: error: ‘layer3_out’ was not declared in this scope; did you mean ‘layer2_out’?
   53 | nnet::relu<layer3_t, result_t, relu_config5>(layer3_out, layer5_out); // activation
      |                                              ^~~~~~
      |                                              layer2_out

Most likely this occurs because the `Reshape` layer was only partially removed from the `ModelGraph`. I wrote a quick optimizer pass to remove `Reshape` and this resolved the issue, but the solution seems risky:

from hls4ml.model.layers import Reshape
from hls4ml.model.optimizer.optimizer import OptimizerPass

class SkipReshape(OptimizerPass):
    def match(self, node):
        return isinstance(node, Reshape)

    def transform(self, model, node):
        model.remove_node(node, rewire=True)
        return True