fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

simulation,validation,speedup #147

Open nhanvtran opened 5 years ago

nhanvtran commented 5 years ago

Speeding up simulation, plus a set of scripts for validation scans over many reuse factors.

https://indico.cern.ch/event/823156/contributions/3442008/attachments/1850883/3038435/fastml_simulation.pdf

GiuseppeDiGuglielmo commented 5 years ago

We have a working branch here: https://github.com/hls-fpga-machine-learning/hls4ml/tree/gdg/validation

I discussed this with @vloncar; he suggested a few changes that I fully agree with and will integrate on the branch soon.

GiuseppeDiGuglielmo commented 5 years ago

We have a consolidated validation flow. The basic idea is to generate two log files with the predictions, one after the C simulation (csim_design) and one after the RTL co-simulation (cosim_design), and then compare them. Any difference is a symptom of a wrongly generated RTL implementation and must be investigated (e.g. an overly aggressive HLS configuration).

There are two implementations of this validation approach:

  1. The C/C++ testbench performs the log comparison (with an OS-dependent diff call). I report this for the sake of completeness; a rough sketch follows after this list.
  2. The Tcl testbench performs the log comparison. This is neater and is the solution that @vloncar and I agreed on.
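For concreteness, here is a minimal sketch of variant 1: the testbench writes the predictions of each run to its own log file and, at the end of the co-simulation run, issues an OS-dependent diff. The file names, the RTL_SIM macro, and the exit-code handling are illustrative assumptions and do not reproduce the exact code on the branch; variant 2 moves the comparison into the Tcl script instead.

```cpp
// Rough sketch of the C/C++ testbench variant (names are illustrative).
#include <cstdlib>
#include <fstream>

#define N_OUTPUTS 10

int main() {
    float predictions[N_OUTPUTS] = {0};
    // ... call the generated top-level function here to fill `predictions` ...

    // Write this run's predictions to its own log: one file for csim_design,
    // one for cosim_design (a hypothetical RTL_SIM macro tells the runs apart).
#ifdef RTL_SIM
    const char *log_name = "cosim_results.log";
#else
    const char *log_name = "csim_results.log";
#endif
    std::ofstream log(log_name);
    for (int i = 0; i < N_OUTPUTS; i++) {
        log << predictions[i] << " ";
    }
    log << "\n";
    log.close();

#ifdef RTL_SIM
    // Both logs exist by the end of the co-simulation run: compare them with an
    // OS-dependent diff call. Any difference flags a suspect RTL implementation.
    return std::system("diff csim_results.log cosim_results.log") == 0 ? 0 : 1;
#else
    return 0;
#endif
}
```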

A few more notes:

The code on this branch also integrates a feature to load the weights from files during simulation, which avoids long compilation runs and reduces the overall C-simulation and RTL-co-simulation times. In the master-branch implementation, compiling the headers that declare and define the weight and bias arrays can take a few hours for large layers. On this branch, we only declare those large arrays in the headers, store the weights in TXT files, and load them at the beginning of the simulation. To preserve the HLS flow we still need those weights/biases in the headers, so we use conditional macros and keep the weight values in the headers when we run HLS (csynth_design).
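As an illustration of that scheme, a minimal sketch is below. The __SYNTHESIS__ macro is the one Vivado HLS defines during csynth_design; the weight type, array name, file name, and loader helper are illustrative assumptions rather than the actual code on the branch.

```cpp
// Minimal sketch of the weights-from-file scheme (names are illustrative).
#include <fstream>

typedef float weight_t;   // stand-in for the model's fixed-point weight type
#define N_WEIGHTS 16

#ifdef __SYNTHESIS__
// For HLS (csynth_design) keep the full initializer so the flow is unchanged.
static const weight_t w2[N_WEIGHTS] = { /* ... generated values ... */ };
#else
// For csim/cosim only declare the array: no huge initializer to compile.
static weight_t w2[N_WEIGHTS];

// Illustrative loader; the branch provides its own helper for this step.
template <typename T, int N>
void load_weights_from_txt(T *dst, const char *fname) {
    std::ifstream in(fname);
    for (int i = 0; i < N; i++) {
        in >> dst[i];
    }
}
#endif

void myproject(/* model I/O ports */) {
#ifndef __SYNTHESIS__
    // Fill the arrays once, at the beginning of the (co)simulation.
    static bool weights_loaded = false;
    if (!weights_loaded) {
        load_weights_from_txt<weight_t, N_WEIGHTS>(w2, "w2.txt");
        weights_loaded = true;
    }
#endif
    // ... layer computations using w2 ...
}
```

Only the branch with the full initializer reaches csynth_design, so the file loading does not affect the generated RTL.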

At this point, we may be ready for a PR.