fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

Memory issues when synthesizing conv2d models #134

Open signorgelato opened 5 years ago

signorgelato commented 5 years ago

@jmduarte We are trying to synthesize the example Conv2D model from hls4ml v0.1.4 in Vivado HLS 2017.2 for a different FPGA (xczu9eg-ffvb1156-2-i-es2), but we run into a memory issue. Here is the `keras-config.yml`:

KerasJson: example-keras-model-files/KERAS_conv1d_small.json
KerasH5:   example-keras-model-files/KERAS_conv1d_small_weights.h5
OutputDir: my-hls-test
ProjectName: myproject
XilinxPart:  xczu9eg-ffvb1156-2-i-es2
ClockPeriod: 5

IOType: io_parallel # options: io_serial/io_parallel
ReuseFactor: 1
DefaultPrecision: ap_fixed<18,8>

and the commands:

python keras-to-hls -c keras-config.yml
cd my-hls-test
vivado_hls -f build_prj.tcl

We run into this error:

ERROR: [XFORM 203-103] Array 'mult.V' (.../HLS4ML/nnet_utils/nnet_conv2d.h:122): partitioned elements number (1152) has exeeded the threshold (1024), which may cause long run-time.
ERROR: [HLS 200-70] Pre-synthesis failed.
command 'ap_source' returned error code
    while executing
"source [lindex $::argv 1] "
    ("uplevel" body line 1)
    invoked from within
"uplevel \#0 { source [lindex $::argv 1] } "

Is this only a memory issue? It seems that Vivado HLS 2017.2 has a threshold of 1024 partitioned elements per array. How do we get around this limit?
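A minimal Python sketch of the arithmetic behind this error: it pulls the array name, element count and threshold out of the XFORM 203-103 line quoted above and reports how far over the limit the array is. The helper names (`parse_partition_error`, `excess_elements`) are hypothetical, and the regex only assumes the message format shown in this particular log.

```python
import re

# Pattern for the XFORM 203-103 message quoted above. The Vivado log spells
# "exeeded" without a "c", so both spellings are accepted.
_XFORM_203_103 = re.compile(
    r"Array '(?P<array>[^']+)'.*?"
    r"partitioned elements number \((?P<count>\d+)\) "
    r"has exc?eeded the threshold \((?P<threshold>\d+)\)"
)


def parse_partition_error(line):
    """Return (array, count, threshold) from an XFORM 203-103 line, or None."""
    m = _XFORM_203_103.search(line)
    if m is None:
        return None
    return m.group("array"), int(m.group("count")), int(m.group("threshold"))


def excess_elements(count, threshold):
    """Number of partitioned elements that would have to go to meet the threshold."""
    return max(0, count - threshold)


if __name__ == "__main__":
    log_line = (
        "ERROR: [XFORM 203-103] Array 'mult.V' "
        "(.../HLS4ML/nnet_utils/nnet_conv2d.h:122): "
        "partitioned elements number (1152) has exeeded the threshold (1024), "
        "which may cause long run-time."
    )
    array, count, threshold = parse_partition_error(log_line)
    print(array, count, threshold, excess_elements(count, threshold))
```

For the line above this prints `mult.V 1152 1024 128`, i.e. `mult.V` is 128 elements over the 1024 limit. So the failure comes from the size of one fully partitioned array, not from running out of host memory.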

benjaminkreis commented 5 years ago

Hello,

I just tried reproducing this problem with the same setup (Vivado 2017.2 and hls4ml v0.1.4), but I see different behavior. Everything runs fine until the IP export step, which then reports an error about not finding the part number. I wonder if this is a problem with my license?

INFO: [RTGEN 206-100] Finished creating RTL model for 'myproject'.
INFO: [HLS 200-111]  Elapsed time: 1.65 seconds; current allocated memory: 1.782 GB.
INFO: [RTMG 210-279] Implementing memory 'softmax_exp_table5_rom' using block ROMs.
INFO: [RTMG 210-279] Implementing memory 'softmax_invert_tabkb_rom' using block ROMs.
INFO: [HLS 200-111] Finished generating all RTL models Time (s): cpu = 00:02:28 ; elapsed = 00:02:24 . Memory (MB): peak = 2439.680 ; gain = 2013.992 ; free physical = 9245 ; free virtual = 166244
INFO: [SYSC 207-301] Generating SystemC RTL for myproject.
INFO: [VHDL 208-304] Generating VHDL RTL for myproject.
INFO: [VLOG 209-307] Generating Verilog RTL for myproject.

...

INFO: [COSIM 212-1000] *** C/RTL co-simulation finished: PASS ***
INFO: [COSIM 212-211] II is measurable only when transaction number is greater than 1 in RTL simulation. Otherwise, they will be marked as all NA. If user wants to calculate them, please make sure there are at least 2 transactions in RTL simulation.
INFO: [IMPL 213-8] Exporting RTL as an IP in IP-XACT.

...

INFO: [Common 17-206] Exiting Vivado at Sat Mar  2 12:34:02 2019...
ERROR: [Coretcl 2-106] Specified part could not be found.
INFO: [HLS 200-112] Total elapsed time: 210.09 seconds; peak allocated memory: 1.782 GB.

If I run with the default part number in hls4ml (xcku115-flvb2104-2-i), everything runs ok. Is anyone else able to reproduce the 1024 threshold error?

signorgelato commented 5 years ago

@benjaminkreis I recently got it to run by going into <OutputDir>/firmware/parameters.h and changing a single parameter: N_FILT_1, from 2 to 1. Why would that help? Maybe this is specific to Vivado 2017.2 (see Xilinx forum post) and our FPGA.
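The manual edit can be scripted, and the arithmetic suggests why it helps. The sketch below is hypothetical on two counts: it assumes `parameters.h` declares the filter count as a plain `#define N_FILT_1 2` line, and it assumes the partitioned element count of `mult.V` is proportional to `N_FILT_1`. The second assumption is consistent with (but not proven by) the run succeeding at `N_FILT_1 = 1`: 1152 / 2 × 1 = 576, which is below the 1024 threshold.

```python
import re
from pathlib import Path


def set_define(header_text, name, value):
    """Return header_text with `#define <name> <old>` rewritten to `<value>`.

    Hypothetical helper; assumes a single-token value on one line.
    Raises ValueError unless the macro is defined exactly once.
    """
    # Require whitespace after the name so N_FILT_1 does not also match N_FILT_10.
    pattern = re.compile(
        rf"^([ \t]*#define[ \t]+{re.escape(name)}[ \t]+)\S+", re.MULTILINE
    )
    new_text, n = pattern.subn(lambda m: m.group(1) + str(value), header_text)
    if n != 1:
        raise ValueError(f"expected exactly one '#define {name}', found {n}")
    return new_text


def patch_header(path, name, value):
    """Apply set_define to a file in place, e.g. <OutputDir>/firmware/parameters.h."""
    p = Path(path)
    p.write_text(set_define(p.read_text(), name, value))


def scaled_partition_count(count, old_n_filt, new_n_filt):
    """Estimate the new element count, ASSUMING it is proportional to N_FILT_1."""
    return count * new_n_filt // old_n_filt


if __name__ == "__main__":
    sample = "#define N_FILT_1 2\n#define N_FILT_10 4\n"
    print(set_define(sample, "N_FILT_1", 1), end="")
    new_count = scaled_partition_count(1152, old_n_filt=2, new_n_filt=1)
    print(new_count, new_count <= 1024)
```

Keep in mind that lowering `N_FILT_1` removes a filter from the layer, so the synthesized design no longer matches the original Keras model. It is a useful way to test the threshold hypothesis rather than a fix for the trained network.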