fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

simulation,validation,speedup #147

Open nhanvtran opened 5 years ago

nhanvtran commented 5 years ago

Speeding up simulation, plus a set of scripts for validation scans over many reuse factors.

https://indico.cern.ch/event/823156/contributions/3442008/attachments/1850883/3038435/fastml_simulation.pdf

GiuseppeDiGuglielmo commented 5 years ago

We have a working branch here: https://github.com/hls-fpga-machine-learning/hls4ml/tree/gdg/validation

I discussed this with @vloncar; he suggested a few changes that I fully agree with and will integrate on the branch soon.

GiuseppeDiGuglielmo commented 5 years ago

We have a consolidated validation flow. The basic idea is to generate two log files with the predictions, one after the C simulation (csim_design) and one after the RTL co-simulation (cosim_design), and then compare them. Any difference is a symptom of a wrongly generated RTL implementation and must be investigated (e.g. an overly aggressive HLS configuration).

There are two implementations of this validation approach:

  1. The C/C++ testbench performs the log comparison (with an OS-dependent diff call). I report this for the sake of completeness; a rough sketch follows after this list.
  2. The Tcl testbench performs the log comparison. This is neater and is the solution that @vloncar and I agreed on.
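For concreteness, here is a minimal sketch of variant 1: the testbench writes the predictions of each run to its own log file and, at the end of the co-simulation run, issues an OS-dependent diff. The file names, the RTL_SIM macro, and the exit-code handling are illustrative assumptions and do not reproduce the exact code on the branch; variant 2 moves the comparison into the Tcl script instead.

```cpp
// Rough sketch of the C/C++ testbench variant (names are illustrative).
#include <cstdlib>
#include <fstream>

#define N_OUTPUTS 10

int main() {
    float predictions[N_OUTPUTS] = {0};
    // ... call the generated top-level function here to fill `predictions` ...

    // Write this run's predictions to its own log: one file for csim_design,
    // one for cosim_design (a hypothetical RTL_SIM macro tells the runs apart).
#ifdef RTL_SIM
    const char *log_name = "cosim_results.log";
#else
    const char *log_name = "csim_results.log";
#endif
    std::ofstream log(log_name);
    for (int i = 0; i < N_OUTPUTS; i++) {
        log << predictions[i] << " ";
    }
    log << "\n";
    log.close();

#ifdef RTL_SIM
    // Both logs exist by the end of the co-simulation run: compare them with an
    // OS-dependent diff call. Any difference flags a suspect RTL implementation.
    return std::system("diff csim_results.log cosim_results.log") == 0 ? 0 : 1;
#else
    return 0;
#endif
}
```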

A few more notes:

The code on this branch also integrates a feature to load the weights from files during simulation, which avoids long compilation runs and reduces the overall C-simulation and RTL-co-simulation times. In the master-branch implementation, compiling the headers that declare and define the weight and bias arrays can take a few hours for large layers. On this branch, we only declare those large arrays in the headers, store the weights in TXT files, and load them at the beginning of the simulation. To preserve the HLS flow we still need those weights/biases in the headers, so we use conditional macros and keep the weight values in the headers when we run HLS (csynth_design).
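As an illustration of that scheme, a minimal sketch is below. The __SYNTHESIS__ macro is the one Vivado HLS defines during csynth_design; the weight type, array name, file name, and loader helper are illustrative assumptions rather than the actual code on the branch.

```cpp
// Minimal sketch of the weights-from-file scheme (names are illustrative).
#include <fstream>

typedef float weight_t;   // stand-in for the model's fixed-point weight type
#define N_WEIGHTS 16

#ifdef __SYNTHESIS__
// For HLS (csynth_design) keep the full initializer so the flow is unchanged.
static const weight_t w2[N_WEIGHTS] = { /* ... generated values ... */ };
#else
// For csim/cosim only declare the array: no huge initializer to compile.
static weight_t w2[N_WEIGHTS];

// Illustrative loader; the branch provides its own helper for this step.
template <typename T, int N>
void load_weights_from_txt(T *dst, const char *fname) {
    std::ifstream in(fname);
    for (int i = 0; i < N; i++) {
        in >> dst[i];
    }
}
#endif

void myproject(/* model I/O ports */) {
#ifndef __SYNTHESIS__
    // Fill the arrays once, at the beginning of the (co)simulation.
    static bool weights_loaded = false;
    if (!weights_loaded) {
        load_weights_from_txt<weight_t, N_WEIGHTS>(w2, "w2.txt");
        weights_loaded = true;
    }
#endif
    // ... layer computations using w2 ...
}
```

Only the branch with the full initializer reaches csynth_design, so the file loading does not affect the generated RTL.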

At this point, we may be ready for a PR.