fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

Vitis HLS backend #629

Closed vloncar closed 1 year ago

vloncar commented 2 years ago

Description

An initial attempt at supporting Vitis HLS. Introduces a new backend (Vitis) that takes most of the stuff over from the Vivado backend and changes things a little. I tried to bring the two backends closer together, so the changes that Vitis requires but are benign in the Vivado backend are ported to the Vivado backend. The more extensive ones are implemented as overrides of the existing HLS routines (currently only the streaming implementations needed this).

The biggest change in Vitis HLS is the stricter use of pragmas, so some workarounds we did before are no longer possible and require changes. For example, setting the stream depth in a loop is no longer possible (the loop variable is not constant, so the pragma refuses to use it), which means the encoded CNN implementation doesn't work and is removed from the Vitis backend. There are also big issues with the Resource strategy in general: regardless of which of the three implementations I try to use, the compiler will reintroduce the modulo operator and synthesize a urem core, which adds significantly to the latency. I'll try to resolve this later unless someone volunteers (yeah, unlikely).
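
To make the stream-depth limitation concrete, here is a minimal hypothetical sketch of the rejected pattern (names and depths are made up; this is not the actual hls4ml code). Vivado HLS tolerated a STREAM pragma whose depth came from a loop variable, while Vitis HLS requires the depth argument to be a compile-time constant and refuses it:

#include <hls_stream.h>

// Hypothetical sketch only: per-stream depths set inside a loop.
template <class data_T, int N_STREAMS>
void set_stream_depths(hls::stream<data_T> streams[N_STREAMS]) {
InitDepths:
    for (int i = 0; i < N_STREAMS; i++) {
        const int depth = i + 1; // depth derived from the loop variable
        // Vitis HLS rejects this pragma because 'depth' is not a compile-time
        // constant at the pragma site; this is the workaround that had to go.
        #pragma HLS STREAM variable=streams[i] depth=depth
    }
}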

Some plots comparing the latency and resource usage between the two will follow.


Tests

TODO. We will also need to extend the synthesis tests to include Vitis HLS (probably meaning we either fully switch to CERN machines, or we split over multiple machines at Fermilab).



jmitrevs commented 2 years ago

Thanks for this PR. It will be very useful for me. When I try synthesizing code that has many 1D CNNs with Vitis, I get the following error.

ERROR: [HLS 214-272] In function 'void nnet::kernel_shift_1d<nnet::array<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, 32u>, config8>(nnet::array<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, 32u> const&, nnet::array<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, 32u>::value_type*)', Pragma conflict happens on 'INLINE' and 'PIPELINE' pragmas: same function (firmware/nnet_utils/nnet_conv_stream.h:161:0)

I will try commenting out the pipeline pragma, but this is shared with Vivado.
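
For reference, a minimal hypothetical reproduction of that conflict (not the actual kernel_shift_1d code): Vitis HLS errors out with HLS 214-272 when a single function body carries both an INLINE and a PIPELINE pragma, since an inlined function has no region of its own left to pipeline, whereas the same shared code went through Vivado HLS:

// Hypothetical minimal reproduction; names and sizes are illustrative.
template <class data_T, class res_T, int W>
void kernel_shift_sketch(const data_T &in_elem, res_T window[W]) {
    #pragma HLS INLINE
    // Conflicts with the INLINE above under Vitis HLS (error HLS 214-272):
    // an inlined function has no standalone region left to pipeline.
    #pragma HLS PIPELINE II=1
ShiftLoop:
    for (int i = 0; i < W - 1; i++) {
        #pragma HLS UNROLL
        window[i] = window[i + 1];
    }
    window[W - 1] = in_elem;
}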

jmitrevs commented 2 years ago

Some warnings I get:

WARNING: [HLS 207-5551] unexpected pragma argument 'instances', expects function/operation (firmware/nnet_utils/nnet_conv1d_latency.h:67:24)
WARNING: [HLS 207-5551] unexpected pragma argument 'instances', expects function/operation (firmware/nnet_utils/nnet_conv1d_latency.h:145:24)
WARNING: [HLS 207-5556] Only for/while/do support the pipeline  pragma (firmware/nnet_utils/nnet_dense_stream.h:21:9)
WARNING: [HLS 214-113] Either use an argument of the function or declare the variable inside the dataflow loop body (firmware/myproject.cpp:57:73)
Resolution: For help on HLS 214-113 see www.xilinx.com/cgi-bin/docs/rdoc?v=2021.2;t=hls+guidance;d=214-113.html
WARNING: [HLS 214-113] Either use an argument of the function or declare the variable inside the dataflow loop body (firmware/myproject.cpp:57:77)
Resolution: For help on HLS 214-113 see www.xilinx.com/cgi-bin/docs/rdoc?v=2021.2;t=hls+guidance;d=214-113.html
WARNING: [HLS 214-113] Either use an argument of the function or declare the variable inside the dataflow loop body (firmware/myproject.cpp:69:72)

Sadly, the line buffer 1D CNN implementation still seems to take very long to synthesize (which is why I was using the encoded implementation with Vivado_HLS).

vloncar commented 2 years ago

I fixed the warnings. The current CNN implementations for io_parallel are going to be replaced by #600, so I didn't test them thoroughly. The warning in dense_stream was due to me forgetting to commit the change :man_facepalming:

vloncar commented 2 years ago

I would say this is ready for review/merge. It is far from complete, as it is missing the resource strategy and some warnings about the interfaces still need tweaks, but it mostly works.

I tested this on a range of individual layers and toy models. I'm attaching the script used to build the models and run synthesis (build.py), the script to compare the results and produce the tables (compare.py) and the shell scripts to automate the test. To run, one can just do:

./build_vivado.sh && ./build_vitis.sh && ./compare.sh

(note that you'll get your results faster if you manually run the build steps in parallel :wink:)

I'm also including the actual results. Overall, I see slight differences, like slightly lower LUT usage and a cycle spared here and there, always in favor of Vitis.

The failing SimpleRNN test is a separate bug that will be addressed in another PR.

vitis_test.zip

jmitrevs commented 2 years ago

Are there still issues with the resource implementation? I am testing this on a 1D CNN, and the compiler seems to fail when I use the latency strategy. This particular layer has a width of 11 with 32 channels, and 64 filters of width 9.

jmitrevs commented 2 years ago

With the resource implementation, I have successfully used this backend for a 1D CNN Vitis workflow.

jmitrevs commented 2 years ago

The FINN people mentioned the default pipeline time changing. Are there any issues related to that?

vloncar commented 1 year ago

Since 94dbe80 there is support for the resource strategy in Dense/Conv layers. The one corner case (when rf > n_in and rf % n_in != 0) still uses urem cores, but that is rare. What remains is to follow this change through to the new CNN implementation, which uses a modified version of the resource-strategy matrix-vector multiplication.
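
For context, a hedged and much simplified sketch of the index trick behind that change (hypothetical, not the actual hls4ml kernel): instead of computing the input index with the modulo operator, which Vitis HLS lowers to a urem core, the index is advanced by a fixed step and wrapped with a compare-and-subtract. The subtraction-based wrap is only valid while the step does not exceed the bound, which is why the rf > n_in, rf % n_in != 0 corner case still falls back to urem:

// Hypothetical, simplified sketch of the wrap-instead-of-modulo idea.
template <int N_IN, int STEP, int N_STEPS>
int sum_with_wrapped_index(const short data[N_IN]) {
    int acc = 0;
    int in_index = 0;
AccumLoop:
    for (int i = 0; i < N_STEPS; i++) {
        #pragma HLS PIPELINE II=1
        // Instead of: acc += data[(i * STEP) % N_IN];  // synthesizes a urem core
        acc += data[in_index];
        in_index += STEP;
        if (in_index >= N_IN)
            in_index -= N_IN; // wrap without '%'; equivalent only while STEP <= N_IN
    }
    return acc;
}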

thesps commented 1 year ago

What about the interplay with VivadoAccelerator backend? The IPs generated from Vitis HLS should be perfectly compatible with those block designs, and it would be good to take advantage of that.

jmitrevs commented 1 year ago

I get the following synthesis error with the Vitis backend.

WARNING: [HLS 214-356] Array transformation on pointer indexing may lead to incorrect bank selection. Use array index instead of pointer. (firmware/nnet_utils/nnet_conv1d_resource.h:95:11)
ERROR: [HLS 214-384] in function 'void nnet::conv_1d_resource_cl<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config2>(ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>*, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>*, config2::weight_t*, config2::bias_t*) (.268)': Cannot apply array transformation pragma/directive because of full array load/store. (firmware/nnet_utils/nnet_conv1d_resource.h:95:11)
ERROR: [HLS 200-1715] Encountered problem during source synthesis
jmitrevs commented 1 year ago

When working with CNNs I get lots of warnings of the type:

WARNING: [HLS 214-113] Either use an argument of the function or declare the variable inside the dataflow loop body (cnn/firmware/vplane.cpp:57:73)
Resolution: For help on HLS 214-113 see www.xilinx.com/cgi-bin/docs/rdoc?v=2022.2;t=hls+guidance;d=214-113.html

inside a dataflow section. The warning seems to be because the weights and biases are global variables, not local variables. As a test, I put the weights inside the function body, right after the #pragma HLS DATAFLOW (by moving the includes to be there instead of in the header file), and the warning went away. How concerned should we be about this warning?
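
For illustration, a minimal hypothetical sketch of the two layouts (types, names and sizes are made up, and this is not the generated project code): with the weight array at file scope, every use inside the DATAFLOW region draws HLS 214-113, while declaring it inside the top function right after the DATAFLOW pragma makes it local to the region and silences the warning:

#include <hls_stream.h>
#include <ap_fixed.h>

typedef ap_fixed<16, 6> input_t;
typedef ap_fixed<16, 6> result_t;
typedef ap_fixed<16, 6> weight_t;

// Layout 1 (roughly what the generated project does today): weights at file
// scope, pulled in from a header. Uses inside the DATAFLOW region warn with
// HLS 214-113 because the array is neither an argument nor a local variable.
static const weight_t w2_global[4] = {1, 2, 3, 4};

void dense_stage(hls::stream<input_t> &in, hls::stream<result_t> &out,
                 const weight_t weights[4]) {
    #pragma HLS INLINE off
    input_t x = in.read();
    result_t acc = 0;
    for (int i = 0; i < 4; i++) {
        #pragma HLS UNROLL
        acc += x * weights[i];
    }
    out.write(acc);
}

// Layout 2 (the workaround tested above): the weights are declared inside the
// top function, right after the DATAFLOW pragma, so they are local to the region.
void myproject_sketch(hls::stream<input_t> &input, hls::stream<result_t> &output) {
    #pragma HLS DATAFLOW
    static const weight_t w2_local[4] = {1, 2, 3, 4};
    dense_stage(input, output, w2_local);
}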

brkortma commented 1 year ago

Hello All, is this pull request still under review? I would be interested in using Vitis_HLS in combination with hls4ml and also willing to do some testing if required.

Please let me know!

Best, Bryan

vloncar commented 1 year ago

Yes, it is still under consideration for merging; hopefully we'll decide on Friday whether to include it in this release cycle (as an experimental feature) or postpone it to the next release. In fact, I'm expanding the test suite to include Vitis right now. Expect some commits later today that will rebase this onto the current main branch. Feel free to test it in any way you need. We would be interested in feedback on QoR differences between Vitis and Vivado, especially anything that works worse on Vitis. So far we're aware of issues with the resource strategy, and @jmitrevs had more success with a relaxed clock uncertainty. The reduce operation (used by e.g. pooling layers) also uses more resources in some cases.

brkortma commented 1 year ago

Right, I would be quite keen on these changes once they become available. Currently I'm setting up a test environment with the Vitis_port branch from this pull request. I can definitely provide some testing on the Vitis end. However, I currently don't have Vivado_HLS set up, so I don't think I can provide comparisons between the two.

You mentioned some commits later today to be rebased onto the current main branch. Which main branch is this (of which fork/repo)? Could you point me to it?

Best, Bryan

vloncar commented 1 year ago

This is the latest Vitis branch. I meant that I will resolve the current merge conflicts so it can be merged into the main branch of hls4ml, and those commits will appear here.

brkortma commented 1 year ago

Ahh great news, thanks very much. I'll keep an eye on it then. If you need any specific testing feel free to let me know.

brkortma commented 1 year ago

Hi, I was trying to build my model but it's taking quite a long time. Do you perhaps have any small testing models that I can try to build, which you have verified to work with Vivado 2022.1 (vitis_hls)?

Andre-coder commented 4 months ago

I'm trying to run a model on vitis_hls 2023.2 and getting the following error after successfully extracting the model from hls4ml.

make: *** [Makefile.rules:318: csim.exe] Error 1
ERROR: [SIM 211-100] 'csim_design' failed: compilation error(s).
INFO: [SIM 211-3] ***** CSIM finish *****
INFO: [HLS 200-111] Finished Command csim_design CPU user time: 6.72 seconds. CPU system time: 0.77 seconds. Elapsed time: 15.3 seconds; current allocated memory: 0.211 MB.
while executing
"source build_prj.tcl"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 [list source $tclfile]

I would be glad to get any help on this matter. Thank you!