fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

Cloned HLS stream is read from twice. Second duplicate stream is left full. #707

Closed · rfforelli closed this 1 year ago

rfforelli commented 1 year ago

Quick summary

During the writing of an HLS project for a model with a skip connection, only one of the two copies of the stream is used in the first "branch". The original stream is propagated to the second "branch" instead of the second copy. This results in reading from a stream that is already empty.

Commit hash: 107589fb1dac9aa898baea328acc6d519596e19d

TF model architecture (image attached)

In addition to this, the concatenate1d() function of nnet_merge_stream.h has an error in both loops: the out_data assignments lack an additional offset to account for the progress of the outer loop. The fixed version is given under "Possible fix" below.

Details

The appropriate output stream is cloned with nnet::clone_stream<>() (the layer13_out stream becomes layer71_cpy1 & layer71_cpy2). layer71_cpy1 is then correctly passed to the first "branch", but layer13_out (which has already been drained by the cloning process) is incorrectly passed to the second branch instead of layer71_cpy2. As a result, the following warnings are produced, along with incorrect predictions (a minimal sketch of the failure mode follows the warnings).

```
WARNING: Hls::stream 'layer13_out' is read while empty, which may result in RTL simulation hanging.
WARNING: Hls::stream 'layer71_cpy2' contains leftover data, which may result in RTL simulation hanging.
```
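
To illustrate the failure mode outside of HLS, here is a minimal self-contained sketch in which std::queue stands in for hls::stream (the stream names match the issue; the element count and data are arbitrary). Cloning drains the source, so a later read of the source finds it empty while the second copy keeps its leftover data:

```cpp
#include <iostream>
#include <queue>

// Stand-in for nnet::clone_stream: consumes every element of `in` and
// pushes it to both copies, leaving `in` empty afterwards.
template <typename T>
void clone_stream(std::queue<T> &in, std::queue<T> &out1, std::queue<T> &out2) {
    while (!in.empty()) {
        T v = in.front();
        in.pop();
        out1.push(v);
        out2.push(v);
    }
}

int main() {
    std::queue<int> layer13_out, layer71_cpy1, layer71_cpy2;
    for (int i = 0; i < 4; i++)
        layer13_out.push(i);

    clone_stream(layer13_out, layer71_cpy1, layer71_cpy2);

    // First branch correctly consumes layer71_cpy1.
    while (!layer71_cpy1.empty())
        layer71_cpy1.pop();

    // Second branch, as generated, reads layer13_out again: it is empty
    // ("read while empty"), while layer71_cpy2 still holds all 4 elements
    // ("contains leftover data").
    std::cout << "layer13_out size:  " << layer13_out.size() << '\n';  // 0
    std::cout << "layer71_cpy2 size: " << layer71_cpy2.size() << '\n'; // 4
    return 0;
}
```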

Steps to Reproduce

Download & unzip files.zip

  1. conda env create -f environment.yml
  2. conda activate bepfm_fpga_temp
  3. Install the main branch of hls4ml:
     a. git clone https://github.com/fastmachinelearning/hls4ml.git
     b. cd hls4ml; pip install .; cd ..
  4. python3 compile_prj.py
  5. Navigate to sho_fitter_stream_rf8_TEMP/firmware/myproject.cpp

Expected behavior

layer13_out is cloned on line 173 of myproject.cpp to layer71_cpy1 and layer71_cpy2. layer71_cpy1 should be passed into pooling1d_cl() on line 177, which currently works. layer71_cpy2 should be passed into dense() on line 345.

Actual behavior

layer13_out is cloned on line 173 to layer71_cpy1 and layer71_cpy2. layer71_cpy1 is correctly passed into pooling1d_cl() on line 177. However, on line 345 layer13_out is passed into dense() instead of layer71_cpy2, which is incorrect (a hypothetical before/after is sketched below).
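
For concreteness, a hypothetical before/after of the dense() call on line 345; the template parameters and remaining call arguments are elided here, since only the input stream argument needs to change:

```diff
- nnet::dense<...>(layer13_out, ...);   // reads the already-drained source stream
+ nnet::dense<...>(layer71_cpy2, ...);  // reads the otherwise-unused second copy
```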

files.zip

Possible fix

A possible fix for the concatenate1d function bug is below.

Line 392 can be changed to:

```cpp
out_data[j + (i * input1_T::size)] = in_data1[j];
```

Line 400 can be changed to:

```cpp
out_data[j + (i * input2_T::size) + (CONFIG_T::n_elem1_0)] = in_data2[j];
```
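
For context, here is a sketch of how the whole function would read with both changes applied. The loop and template structure is assumed from the conventions of nnet_merge_stream.h (pragmas omitted); only the two out_data assignments are taken verbatim from the fix above:

```cpp
#include "hls_stream.h"

template <class input1_T, class input2_T, class res_T, typename CONFIG_T>
void concatenate1d(hls::stream<input1_T> &data1, hls::stream<input2_T> &data2,
                   hls::stream<res_T> &res) {
    res_T out_data;

ConcatLoop1:
    for (int i = 0; i < CONFIG_T::n_elem1_0 / input1_T::size; i++) {
        input1_T in_data1 = data1.read();
    ConcatPack1:
        for (int j = 0; j < input1_T::size; j++) {
            // offset by i * input1_T::size so each packet of the outer loop
            // lands after the packets already written (line 392 fix)
            out_data[j + (i * input1_T::size)] = in_data1[j];
        }
    }

ConcatLoop2:
    for (int i = 0; i < CONFIG_T::n_elem2_0 / input2_T::size; i++) {
        input2_T in_data2 = data2.read();
    ConcatPack2:
        for (int j = 0; j < input2_T::size; j++) {
            // same per-packet offset, plus n_elem1_0 to place all of the
            // second input after the first (line 400 fix)
            out_data[j + (i * input2_T::size) + (CONFIG_T::n_elem1_0)] = in_data2[j];
        }
    }

    res.write(out_data);
}
```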