fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

Vivado synthesis report - zero BRAM utilisation (OOC) #798

Open bo3z opened 1 year ago

bo3z commented 1 year ago

Prerequisites

Please make sure to check off these prerequisites before submitting a bug report.

Quick summary

When using the Resource strategy with the Vivado backend, the BRAM utilisation reported after full synthesis is zero, and the results are inconsistent between Verilog and VHDL for the same design. Four designs that use exactly the same amount of every other resource, have the same latency and implement the same architecture report very different BRAM utilisation. This should not be possible, since none of the memory was moved to LUTs (the other resources stay the same). This looks like a bug in how BRAM utilisation is reported.

Details

Synthesize a one-layer model with 16 inputs and 16 outputs. The architecture-wide precision is set to 16 bits and biases are disabled. The reuse factor is set to 4. This is essentially a matrix-vector product between a 16x16 matrix and 16x1 vector implemented across 4 clock cycles. Set strategy to Resource, so that weights are stored in BRAM.
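To make the "4 clock cycles" statement concrete, here is a minimal pure-Python sketch of a matrix-vector product split into `reuse_factor` passes. The function name `matvec_with_reuse` and the contiguous-block partitioning are assumptions chosen for illustration; this is not hls4ml's actual loop structure or weight indexing.

```python
def matvec_with_reuse(weights, x, reuse_factor):
    """Compute out[j] = sum_i x[i] * weights[i][j] in `reuse_factor` passes."""
    n_in, n_out = len(weights), len(weights[0])
    total_mults = n_in * n_out
    assert total_mults % reuse_factor == 0, "reuse factor must divide the multiplication count"
    mults_per_cycle = total_mults // reuse_factor

    out = [0.0] * n_out
    for cycle in range(reuse_factor):
        # Each pass processes one contiguous block of the flattened weight matrix.
        for m in range(mults_per_cycle):
            i, j = divmod(cycle * mults_per_cycle + m, n_out)
            out[j] += x[i] * weights[i][j]
    return out, mults_per_cycle


n = 16
# Same weight pattern as the reproduction script: 1..256 scaled to (0, 1].
weights = [[(i * n + j + 1) / (n * n) for j in range(n)] for i in range(n)]
x = [1.0] * n

result, mults_per_cycle = matvec_with_reuse(weights, x, reuse_factor=4)
direct = [sum(x[i] * weights[i][j] for i in range(n)) for j in range(n)]
```

With 16 inputs, 16 outputs and a reuse factor of 4, each pass performs 64 of the 256 multiplications, and the accumulated result matches the direct product.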

The BRAM can be estimated with the formula `n_inputs * n_outputs * bit_width / (k * reuse_factor)`, where `k` is the constant determining BRAM width. In this case, the BRAM is set to 36-bit width, as it is quite shallow, so the estimate for all the cases below should be `(16 * 16 * 16) / (4 * 36) = 28.444`.
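The estimate can be checked with a few lines of Python. The helper name `estimate_bram` is hypothetical and simply evaluates the formula as stated in this report; it is not part of hls4ml or Vivado.

```python
def estimate_bram(n_inputs, n_outputs, bit_width, reuse_factor, k=36):
    """Evaluate n_inputs * n_outputs * bit_width / (k * reuse_factor).

    `k` is the BRAM width constant from the report (36 bits in this case).
    """
    return (n_inputs * n_outputs * bit_width) / (k * reuse_factor)


# 16x16 weights, 16-bit precision, reuse factor 4
estimate = estimate_bram(16, 16, 16, reuse_factor=4)
print(f"{estimate:.3f}")  # 28.444
```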

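There are four test cases, and the results do not agree (synthesis reports attached below). In all cases, I synthesised both the Verilog and the VHDL that HLS generated, and performed a full Vivado synthesis and design optimisation. The cases, matching the names of the attached reports, are the four combinations of HDL (Verilog or VHDL) and whether the pragma in nnet_dense_resource.h is present or removed.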

Steps to Reproduce

To reproduce, the source code of nnet_dense_resource.h and vivado_synth.tcl need to be modified. To match the cases above:

  1. Clone the hls4ml repository
  2. Check out the master branch at commit 2e71ff4
  3. Run conversion on model file with code [see below]:

```python
import numpy as np
import hls4ml
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

input_shape = (16, )
output_shape = 16

# One dense layer, 16 inputs -> 16 outputs
keras_model = Sequential()
keras_model.add(Dense(output_shape, input_shape=input_shape, name='dense'))
keras_model.compile()

# Deterministic weights scaled to (0, 1]; biases set to zero
weights = np.arange(np.prod(input_shape) * output_shape).reshape(*input_shape, output_shape) + 1
weights = weights / np.max(weights)
keras_model.layers[0].set_weights([weights, np.zeros(output_shape)])

hls_config = hls4ml.utils.config_from_keras_model(keras_model, granularity='name', default_reuse_factor=4)
hls_config['Model']['Strategy'] = 'Resource'
```


### Expected behavior
All four designs should use the same resources, including BRAM.

### Actual behavior
All four designs use the same resources (some of the DSPs were re-implemented as LUTs in one design) except for BRAM. Two of the designs use no BRAM, even though the logs say the weights were implemented using BRAM. One design uses 1 BRAM and the last one uses 14.5, which is expected. This is likely a problem in the report or the synthesis script. Since all of the IPs are equivalent, there should be no difference in BRAM.

### Possible fix
A possible explanation is that the BRAM is synthesised out-of-context (OOC) and Vivado does not include it in the synthesis, see [here](https://support.xilinx.com/s/question/0D52E00006hpkGLSAY/bram-utilization-is-zero-in-utilization-report?language=en_US) and [here](https://support.xilinx.com/s/article/59282?language=en_US). I haven't yet managed to find a way to modify the synthesis scripts to include OOC files. Some of the options I looked into are the `link_design` command, as well as synthesising the exported IP rather than the HDL files.

A short-term workaround is to re-insert the pragma in nnet_dense_resource and change the synthesis script to use Verilog. I am not sure whether this bug also occurs for larger models.

[verilog_no_pragma.txt](https://github.com/fastmachinelearning/hls4ml/files/11530514/verilog_no_pragma.txt)
[verilog_pragma.txt](https://github.com/fastmachinelearning/hls4ml/files/11530541/verilog_pragma.txt)
[vhdl_no_pragma.txt](https://github.com/fastmachinelearning/hls4ml/files/11530544/vhdl_no_pragma.txt)
[vhdl_pragma.txt](https://github.com/fastmachinelearning/hls4ml/files/11530545/vhdl_pragma.txt)