fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

Latency difference with the same model but different versions of hls4ml #889

Open sparajul opened 11 months ago

sparajul commented 11 months ago

Quick summary

I was trying to find the FPGA resource usage and latency of a CNN model I built. Using exactly the same settings, I got completely different results with hls4ml versions 0.6.0 and 0.7.1: with 0.6.0 the latency was around 1.2 us, while with 0.7.1 it was around 7 us, which is a huge difference.

Steps to Reproduce

I worked in a Jupyter notebook. If needed, the complete notebook is here: https://github.com/sparajul/fastmachinelearning/blob/main/TrainCNN.ipynb

import os

import hls4ml
# Import missing from the original snippet; assumes a TensorFlow Keras model.
from tensorflow.keras.models import load_model

model_cnn = load_model('cnn.h5')
os.environ['PATH'] = '/tools/Xilinx/Vivado/2018.3/bin:' + os.environ['PATH']

# Per-layer configuration with a common precision and reuse factor.
hls_config = hls4ml.utils.config_from_keras_model(model_cnn, granularity='name')
hls_config['Model']['Precision'] = 'ap_fixed<16,8>'
hls_config['Model']['ReuseFactor'] = 10

cfg = hls4ml.converters.create_config(backend='Vivado')
cfg['IOType'] = 'io_stream'
cfg['HLSConfig'] = hls_config
cfg['KerasModel'] = model_cnn
cfg['OutputDir'] = 'keras_cnn/vu13p'
cfg['XilinxPart'] = 'xcvu13p-flga2577-2L-e'

hls_model_aq = hls4ml.converters.keras_to_hls(cfg)
hls_model_aq.compile()
hls_model_aq.build(csim=False, synth=True, vsynth=True)

hls4ml.report.read_vivado_report('keras_cnn/vu13p')

Actual behavior

The latency differs between versions of hls4ml.

Saved model here

cnn.h5.zip

calad0i commented 11 months ago

It seems that you are using parallel io. In the newer version of hls4ml, conv unrolls are controlled by ParallelizationFactor in the config file (hls_config). (Full unroll was done for the latency strategy.) This value defaults to one, and you will need to set it to match the number of positions your kernel is applied to in order to get the whole convolution done in parallel.
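A minimal sketch of what that suggestion could look like, assuming the per-layer key is `ParallelizationFactor` under `hls_config['LayerName'][<layer>]` as described in the comment above. The helper name, the layer name `conv1`, and the 8x8 output size are hypothetical and only illustrate the idea; the real values would come from the model's own convolution layers.

```python
def set_parallelization_factor(hls_config, output_positions):
    """Set ParallelizationFactor for each listed convolution layer.

    output_positions maps a layer name to the number of positions the
    kernel is applied to (output height * output width for a 2D conv).
    This helper is hypothetical and not part of hls4ml.
    """
    layers = hls_config.setdefault('LayerName', {})
    for name, n_positions in output_positions.items():
        layers.setdefault(name, {})['ParallelizationFactor'] = n_positions
    return hls_config


# Stand-in for the dictionary returned by config_from_keras_model;
# the layer name and shape below are assumptions for illustration.
example_config = {
    'Model': {'Precision': 'ap_fixed<16,8>', 'ReuseFactor': 10},
    'LayerName': {'conv1': {}},
}

# A conv layer with an 8x8 output map applies its kernel at 64 positions.
set_parallelization_factor(example_config, {'conv1': 8 * 8})
```

In the notebook, the same assignment would be made on the real `hls_config` before it is stored in `cfg['HLSConfig']`, and the build would then be rerun to compare latency.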