fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

Pre-synthesis failed - Stop unrolling loop 'MultLoop' #374

Open pipawoz opened 3 years ago

pipawoz commented 3 years ago

Hello. I'm trying to synthesize a QKeras autoencoder model for a Pynq-Z2. The model compiles, but the build then sits at "Starting code transformations" for about two hours and fails. I don't know why. Is the model perhaps too big?

The idea is to take vibration data, split into segments of 1024 data points, and classify each segment into different states/failures.

Code

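import hls4ml
from tensorflow import keras
from tensorflow.keras.layers import Activation
from tensorflow.keras.models import Sequential
from tensorflow.keras.regularizers import l1
from qkeras.qlayers import QDense, QActivation
from qkeras.quantizers import quantized_bits, quantized_relu

def create_model():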
    """
    Input: Train Data:(3680, 1024), Train Labels: (3680, 10), Test Data: (920, 1024), Test Labels: (920, 10)
    Layers: 512(Relu)/100(Relu)/10(Softmax)
    """

    k_inic = 'glorot_uniform'
    n_bits = 6

    model = Sequential()

    model.add(QDense(512, kernel_quantizer=quantized_bits(n_bits,0,alpha=1), 
                     bias_quantizer=quantized_bits(n_bits,0,alpha=1),
                     kernel_initializer=k_inic, 
                     kernel_regularizer=l1(0.0001),
                     name="Dense1"))
    model.add(QActivation(activation=quantized_relu(n_bits), name='Relu1'))

    model.add(QDense(100, kernel_quantizer=quantized_bits(n_bits,0,alpha=1), 
                     bias_quantizer=quantized_bits(n_bits,0,alpha=1),
                     kernel_initializer=k_inic, 
                     kernel_regularizer=l1(0.0001),
                     name="Dense2"))
    model.add(QActivation(activation=quantized_relu(n_bits), name='Relu2'))

    model.add(QDense(10, kernel_quantizer=quantized_bits(n_bits,0,alpha=1), 
                     bias_quantizer=quantized_bits(n_bits,0,alpha=1),
                     kernel_initializer=k_inic, 
                     kernel_regularizer=l1(0.0001),
                     name="Out1"))

    model.add(Activation(activation='softmax', name='softmax'))
    opt = keras.optimizers.Adam()

    model.compile(loss='mse', optimizer=opt, metrics=["accuracy"])
    model.build(input_shape=(None, 1024))
    return model

# The model has no zero weights, so pruning doesn't do anything.
# Model fit/evaluate step

# Based on the examples/tutorials
hls4ml.model.optimizer.OutputRoundingSaturationMode.layers = ['Activation']
hls4ml.model.optimizer.OutputRoundingSaturationMode.rounding_mode = 'AP_RND'
hls4ml.model.optimizer.OutputRoundingSaturationMode.saturation_mode = 'AP_SAT'

config = hls4ml.utils.config_from_keras_model(model, granularity='name')
config['Model'] = {}
config['Model']['ReuseFactor'] = 64
config['Model']['Strategy'] = 'Resource'
config['Model']['Precision'] = 'ap_fixed<16,6>'
config['LayerName']['softmax']['exp_table_t'] = 'ap_fixed<18,8>'
config['LayerName']['softmax']['inv_table_t'] = 'ap_fixed<18,4>'
config['LayerName']['softmax']['Strategy'] = 'Stable'

hls_model = hls4ml.converters.convert_from_keras_model(model,
                                                       hls_config=config,
                                                       output_dir='model_3/hls4ml_prj',
                                                       fpga_part='xc7z020clg400-1')  # Pynq-Z2
hls_model.compile()
hls_model.build(csim=False,synth=True,export=True)
hls4ml.report.read_vivado_report('model_3/hls4ml_prj')

Log

****** Vivado(TM) HLS - High-Level Synthesis from C, C++ and SystemC v2019.2 (64-bit)
  **** SW Build 2708876 on Wed Nov  6 21:39:14 MST 2019
  **** IP Build 2700528 on Thu Nov  7 00:09:20 MST 2019
    ** Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.

source /mnt/shared/Vivado/2019.2/scripts/vivado_hls/hls.tcl -notrace
INFO: [HLS 200-10] Running '/mnt/shared/Vivado/2019.2/bin/unwrapped/lnx64.o/vivado_hls'
INFO: [HLS 200-10] For user 'final-project' on host 'final-project-GL552VX' (Linux_x86_64 version 5.11.0-25-generic) on Sat Aug 07 18:18:11 -03 2021
INFO: [HLS 200-10] On os Ubuntu 20.04.2 LTS
INFO: [HLS 200-10] In directory '/mnt/shared/machine_learning/model_3/hls4ml_prj'
Sourcing Tcl script 'build_prj.tcl'
INFO: [HLS 200-10] Opening project '/mnt/shared/machine_learning/model_3/hls4ml_prj/myproject_prj'.
INFO: [HLS 200-10] Adding design file 'firmware/myproject.cpp' to the project
INFO: [HLS 200-10] Adding test bench file 'myproject_test.cpp' to the project
INFO: [HLS 200-10] Adding test bench file 'firmware/weights' to the project
INFO: [HLS 200-10] Adding test bench file 'tb_data' to the project
INFO: [HLS 200-10] Opening solution '/mnt/shared/machine_learning/model_3/hls4ml_prj/myproject_prj/solution1'.
INFO: [SYN 201-201] Setting up clock 'default' with a period of 5ns.
INFO: [HLS 200-10] Setting target device to 'xc7z020-clg400-1'
INFO: [XFORM 203-101] Allowed max sub elements number after partition is 4096.
INFO: [XFORM 203-1161] The maximum of name length is set into 60.
INFO: [XFORM 203-101] Allowed max sub elements number after partition is 4096.
INFO: [XFORM 203-1161] The maximum of name length is set into 60.
***** C/RTL SYNTHESIS *****
INFO: [SCHED 204-61] Option 'relax_ii_for_timing' is enabled, will increase II to preserve clock frequency constraints.
INFO: [HLS 200-10] Analyzing design file 'firmware/myproject.cpp' ... 
WARNING: [HLS 214-114] Since the only kind of statements allowed in a canonical dataflow region are variable declarations and function calls, the compiler may not be able to correctly handle the region: firmware/nnet_utils/nnet_dense_latency.h:64:9
WARNING: [HLS 214-114] Since the only kind of statements allowed in a canonical dataflow region are variable declarations and function calls, the compiler may not be able to correctly handle the region: firmware/nnet_utils/nnet_dense_latency.h:79:2
WARNING: [HLS 214-104] Only for-loops and functions support the dataflow: firmware/nnet_utils/nnet_dense_latency.h:76:9
WARNING: [HLS 214-113] Either use an argument of the function or declare the variable inside the dataflow loop body: firmware/myproject.cpp:63:68
WARNING: [HLS 214-113] Either use an argument of the function or declare the variable inside the dataflow loop body: firmware/myproject.cpp:63:72
WARNING: [HLS 214-113] Either use an argument of the function or declare the variable inside the dataflow loop body: firmware/myproject.cpp:73:67
WARNING: [HLS 214-113] Either use an argument of the function or declare the variable inside the dataflow loop body: firmware/myproject.cpp:73:71
WARNING: [HLS 214-113] Either use an argument of the function or declare the variable inside the dataflow loop body: firmware/myproject.cpp:83:67
WARNING: [HLS 214-113] Either use an argument of the function or declare the variable inside the dataflow loop body: firmware/myproject.cpp:83:71
WARNING: [HLS 214-114] Since the only kind of statements allowed in a canonical dataflow region are variable declarations and function calls, the compiler may not be able to correctly handle the region: firmware/myproject.cpp:37:2
WARNING: [HLS 214-114] Since the only kind of statements allowed in a canonical dataflow region are variable declarations and function calls, the compiler may not be able to correctly handle the region: firmware/myproject.cpp:38:5
WARNING: [HLS 200-471] Dataflow form checks found 11 issue(s) in file firmware/myproject.cpp
INFO: [HLS 200-111] Finished Linking Time (s): cpu = 00:05:02 ; elapsed = 00:05:32 . Memory (MB): peak = 930.453 ; gain = 523.035 ; free physical = 6596 ; free virtual = 13193
INFO: [HLS 200-111] Finished Checking Pragmas Time (s): cpu = 00:05:02 ; elapsed = 00:05:32 . Memory (MB): peak = 930.453 ; gain = 523.035 ; free physical = 6596 ; free virtual = 13193
INFO: [HLS 200-10] Starting code transformations ...

INFO: [XFORM 203-603] Inlining function 'nnet::product::mult<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<8, 1, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0> >::product' into 'nnet::dense_resource_rf_leq_nin<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config2>' (firmware/nnet_utils/nnet_dense_resource.h:76).
INFO: [XFORM 203-603] Inlining function 'nnet::dense_resource_rf_leq_nin<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config2>' into 'nnet::dense_resource<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config2>' (firmware/nnet_utils/nnet_dense_resource.h:274).
INFO: [XFORM 203-603] Inlining function 'nnet::dense<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config2>' into 'myproject' (firmware/myproject.cpp:63).
INFO: [XFORM 203-603] Inlining function 'nnet::product::mult<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<8, 1, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0> >::product' into 'nnet::dense_resource_rf_leq_nin<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config5>' (firmware/nnet_utils/nnet_dense_resource.h:76).
INFO: [XFORM 203-603] Inlining function 'nnet::product::mult<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<8, 1, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0> >::product' into 'nnet::dense_resource_rf_leq_nin<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config8>' (firmware/nnet_utils/nnet_dense_resource.h:76).
INFO: [XFORM 203-603] Inlining function 'nnet::dense_resource_rf_leq_nin<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config5>' into 'nnet::dense_resource<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config5>' (firmware/nnet_utils/nnet_dense_resource.h:274).
INFO: [XFORM 203-603] Inlining function 'nnet::dense<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config5>' into 'myproject' (firmware/myproject.cpp:73).
INFO: [XFORM 203-603] Inlining function 'nnet::dense_resource_rf_leq_nin<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config8>' into 'nnet::dense_resource<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config8>' (firmware/nnet_utils/nnet_dense_resource.h:274).
INFO: [XFORM 203-603] Inlining function 'nnet::dense<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config8>' into 'myproject' (firmware/myproject.cpp:83).
INFO: [HLS 200-111] Finished Standard Transforms Time (s): cpu = 02:23:23 ; elapsed = 02:32:49 . Memory (MB): peak = 3074.469 ; gain = 2667.051 ; free physical = 1766 ; free virtual = 9322
INFO: [HLS 200-10] Checking synthesizability ...
INFO: [XFORM 203-602] Inlining function 'nnet::cast<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config2>' into 'nnet::dense_resource<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config2>' (firmware/nnet_utils/nnet_dense_resource.h:99->firmware/nnet_utils/nnet_dense_resource.h:274) automatically.
INFO: [XFORM 203-602] Inlining function 'nnet::cast<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config5>' into 'nnet::dense_resource<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config5>' (firmware/nnet_utils/nnet_dense_resource.h:99->firmware/nnet_utils/nnet_dense_resource.h:274) automatically.
INFO: [XFORM 203-602] Inlining function 'nnet::cast<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config8>' into 'nnet::dense_resource<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config8>' (firmware/nnet_utils/nnet_dense_resource.h:99->firmware/nnet_utils/nnet_dense_resource.h:274) automatically.
INFO: [HLS 200-111] Finished Checking Synthesizability Time (s): cpu = 02:23:31 ; elapsed = 02:32:59 . Memory (MB): peak = 3074.469 ; gain = 2667.051 ; free physical = 1778 ; free virtual = 9340
INFO: [XFORM 203-502] Unrolling all loops for pipelining in function 'nnet::relu<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, relu_config10>' (firmware/nnet_utils/nnet_activation.h:71:26).
INFO: [XFORM 203-502] Unrolling all sub-loops inside loop 'ReuseLoop' (firmware/nnet_utils/nnet_dense_resource.h:64) in function 'nnet::dense_resource<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config8>' for pipelining.
INFO: [XFORM 203-502] Unrolling all loops for pipelining in function 'nnet::relu<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, relu_config7>' (firmware/nnet_utils/nnet_activation.h:71:26).
INFO: [XFORM 203-502] Unrolling all sub-loops inside loop 'ReuseLoop' (firmware/nnet_utils/nnet_dense_resource.h:64) in function 'nnet::dense_resource<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config5>' for pipelining.
INFO: [XFORM 203-502] Unrolling all loops for pipelining in function 'nnet::relu<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, relu_config4>' (firmware/nnet_utils/nnet_activation.h:71:26).
INFO: [XFORM 203-502] Unrolling all sub-loops inside loop 'ReuseLoop' (firmware/nnet_utils/nnet_dense_resource.h:64) in function 'nnet::dense_resource<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config2>' for pipelining.
INFO: [HLS 200-489] Unrolling loop 'Loop-1' (firmware/nnet_utils/nnet_activation.h:76) in function 'nnet::relu<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, relu_config10>' completely with a factor of 10.
INFO: [HLS 200-489] Unrolling loop 'InitAccum' (firmware/nnet_utils/nnet_dense_resource.h:58) in function 'nnet::dense_resource<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config8>' completely with a factor of 10.
INFO: [HLS 200-489] Unrolling loop 'MultLoop' (firmware/nnet_utils/nnet_dense_resource.h:73) in function 'nnet::dense_resource<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config8>' completely with a factor of 500.
INFO: [HLS 200-489] Unrolling loop 'Result' (firmware/nnet_utils/nnet_dense_resource.h:97) in function 'nnet::dense_resource<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config8>' completely with a factor of 10.
INFO: [HLS 200-489] Unrolling loop 'Loop-1' (firmware/nnet_utils/nnet_activation.h:76) in function 'nnet::relu<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, relu_config7>' completely with a factor of 100.
INFO: [HLS 200-489] Unrolling loop 'InitAccum' (firmware/nnet_utils/nnet_dense_resource.h:58) in function 'nnet::dense_resource<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config5>' completely with a factor of 100.
INFO: [HLS 200-489] Unrolling loop 'MultLoop' (firmware/nnet_utils/nnet_dense_resource.h:73) in function 'nnet::dense_resource<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config5>' completely with a factor of 25600.
ERROR: [XFORM 203-504] Stop unrolling loop 'MultLoop' (firmware/nnet_utils/nnet_dense_resource.h:73) in function 'nnet::dense_resource<ap_fixed<8, 1, (ap_q_mode)0, (ap_o_mode)0, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config5>' because it may cause large runtime and excessive memory usage due to increase in code size. Please avoid unrolling the loop or form sub-functions for code in the loop body.
ERROR: [HLS 200-70] Pre-synthesis failed.
command 'ap_source' returned error code
    while executing
"source build_prj.tcl"
    ("uplevel" body line 1)
    invoked from within
"uplevel \#0 [list source $arg] "

INFO: [Common 17-206] Exiting vivado_hls at Sat Aug  7 20:51:15 2021...
Synthesis report not found.
Found 1 solution(s) in model_3/hls4ml_prj/myproject_prj.
Reports for solution "solution1":

C simulation report not found.
Synthesis report not found.
Co-simulation report not found.

You can see the error at the end of the log, but I didn't find anything related in the issues here. Any ideas?

Also, with the normal model (without QKeras) I managed to get 93% accuracy, but after hls4ml conversion, hls_model.predict gave only 35%. If I use QKeras for the model, the accuracy is already down to 70%, and the hls4ml model drops to 27%. Is there a way to improve those numbers?

If needed, I can provide the full script/dataset.
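
One thing that may be worth checking for the accuracy drop (an assumption, not something confirmed in this thread) is whether the inputs and intermediate values fit the configured `ap_fixed<16,6>` type. That type has 6 integer bits including the sign, so it covers roughly [-32, 32) in steps of 2^-10, and by default it truncates and wraps around on overflow. The sketch below is plain Python, not hls4ml code, and `quantize_ap_fixed` is a hypothetical helper name used only for illustration:

```python
import math

def quantize_ap_fixed(x, width=16, integer=6, saturate=False):
    """Approximate the value a signed ap_fixed<width, integer> would hold for x.

    Illustration only: truncation toward minus infinity (like AP_TRN) and
    either wrap-around (like AP_WRAP) or clipping (like AP_SAT) on overflow.
    """
    frac_bits = width - integer
    step = 2.0 ** -frac_bits
    lo = -(2 ** (width - 1))      # most negative raw integer code
    hi = 2 ** (width - 1) - 1     # most positive raw integer code
    code = math.floor(x / step)   # drop the bits below the resolution
    if saturate:
        code = max(lo, min(hi, code))
    else:
        code = (code - lo) % (2 ** width) + lo
    return code * step

# A value inside the range survives; a value outside it does not.
print(quantize_ap_fixed(0.5))
print(quantize_ap_fixed(100.0))
print(quantize_ap_fixed(100.0, saturate=True))
```

If the raw vibration samples are well outside that range, scaling them before training, or giving the input and accumulators a wider type, might be a reasonable first experiment.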

HenningCode commented 2 years ago

I got the same issue, even with models from the example folder.

HenningCode commented 2 years ago

Every time I encountered the unrolling issue, I just bumped the reuse factor up to the maximum. Then it worked with normal FC networks.
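
To make the reuse-factor suggestion concrete, here is a rough sketch of the arithmetic, assuming the number of multiplications unrolled in parallel for a dense layer is about n_in * n_out / ReuseFactor. Under that assumption, the factors in the log above (25600 for the 512x100 layer, 500 for the 100x10 layer) are what a reuse factor of 2 would give, not 64, so it may be worth checking which value each layer actually received. With `granularity='name'` the config also has per-layer entries under `config['LayerName']`. The helper names `unroll_factor` and `set_reuse_factor` are hypothetical, and the second one only edits a plain dict shaped like the config in the original post:

```python
# Dense layer shapes taken from the docstring in the post: 1024 -> 512 -> 100 -> 10.
LAYERS = {"Dense1": (1024, 512), "Dense2": (512, 100), "Out1": (100, 10)}

def unroll_factor(n_in, n_out, reuse_factor):
    """Parallel multiplications, assuming ceil(n_in * n_out / reuse_factor)."""
    total = n_in * n_out
    return -(-total // reuse_factor)  # integer ceiling division

def set_reuse_factor(config, reuse_factors):
    """Write a per-layer ReuseFactor into an hls4ml-style config dict."""
    layer_cfg = config.setdefault("LayerName", {})
    for name, rf in reuse_factors.items():
        layer_cfg.setdefault(name, {})["ReuseFactor"] = rf
    return config

# Compare the unroll factor each layer would get for a few reuse factors.
for name, (n_in, n_out) in LAYERS.items():
    for rf in (2, 64, n_in):
        print(name, "ReuseFactor", rf, "->", unroll_factor(n_in, n_out, rf))
```

For example, `set_reuse_factor(config, {"Dense1": 1024, "Dense2": 512, "Out1": 100})` would request one multiplication per output neuron per cycle in each layer. Whether hls4ml accepts those exact values depends on its own validity checks for the chosen strategy.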