External weights vivado accelerator

GiuseppeDiGuglielmo commented 2 years ago

Extend the VivadoAccelerator backend and add programmable weights. An ideal setup is a Xilinx Zynq/ZynqMP board (ARM core + programmable logic).

The backend generates code for Vivado HLS, Vivado, and Vivado SDK:

the hls4ml IP with AXI master interfaces to move input features, output predictions, and weights between off-chip RAM and the FPGA
a complete Vivado project that integrates the hls4ml IP (for the target board/chip)
a complete baremetal application to control and program the accelerator

Type of change

Some changes to the signature of the function convert_from_keras_model():

    hls_model = hls4ml.converters.convert_from_keras_model(
            model=model,
            clock_period=CLOCK_PERIOD,
            backend='VivadoAccelerator',
            board=BOARD_NAME,
            part=FPGA_PART,
            io_type='io_stream',
            interface='axi_master',
            driver='c',
            input_data_tb=DATA_DIR+'/X_test.npy',
            output_data_tb=DATA_DIR+'/y_test.npy',
            hls_config=config,
            output_dir=OUTPUT_DIR)

and a new function write_header_file() to write a header file with a hardcoded dataset:

    hls4ml.writer.vivado_accelerator_writer.VivadoAcceleratorWriter.write_header_file(hls_model, X_test, y_test, y_qkeras, y_hls, 64, OUTPUT_DIR + '/sdk/common/data.h')
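
To illustrate what a header with a hardcoded dataset looks like, here is a minimal sketch that renders test vectors as a static C array, so that the bare-metal application needs no file system. This is not the code of write_header_file(): the function name render_data_header, the macro names, and the flat array layout are assumptions made for illustration only.

```python
def render_data_header(array_name, samples):
    """Render a non-empty list of equal-length float rows as a C header string.

    Hypothetical helper: the layout is an assumption, not what hls4ml emits.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    n_samples = len(samples)
    n_features = len(samples[0])
    if any(len(row) != n_features for row in samples):
        raise ValueError("all samples must have the same number of features")

    prefix = array_name.upper()
    lines = [
        f"#ifndef {prefix}_H",
        f"#define {prefix}_H",
        f"#define {prefix}_N_SAMPLES {n_samples}",
        f"#define {prefix}_N_FEATURES {n_features}",
        # Samples are stored back to back, one row per source line.
        f"static const float {array_name}[{n_samples * n_features}] = {{",
    ]
    for row in samples:
        lines.append("    " + ", ".join(repr(float(v)) + "f" for v in row) + ",")
    lines += ["};", "#endif"]
    return "\n".join(lines) + "\n"


header = render_data_header("x_test", [[0.5, 1.0], [2.0, -3.25]])
```

The resulting string can be written to a file such as data.h and included by the C application.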

Tests

You can test it with the example at this repo: https://github.com/GiuseppeDiGuglielmo/test-hls4ml-backend

Right now, we support Ultra96v2, but more Zynq/ZynqMP boards can be added.

GiuseppeDiGuglielmo commented 2 years ago

@thesps thank you for all of the comments; they help a lot!

I will reply both here and to the other inlined questions.

Can we decouple 'reprogrammable weights' from 'AXI Master interface'? I'd like to be able to do both: (1) AXI Master interface without reprogrammable weights, and (2) AXI Stream interface with reprogrammable weights (maybe only the NN in/out on AXI Stream interface while weights use AXI Master?)

We need the greater flexibility that you suggest. I agree.

Do we have a matrix or list of the existing possible combinations (what goes with what)?

We may also define how to pass this configuration information to the function convert_from_keras_model(). I am unsure if we should keep adding function parameters or if some nested struct/dictionary would be better. We may discuss this as a very first step because you have a better global and local view of the current status.
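
As a starting point for that discussion, a nested dictionary could look like the sketch below. Every key and value here is hypothetical: none of them are existing hls4ml options. The sets of allowed interfaces are placeholders chosen only to cover the two combinations requested above, not a statement of what the backend supports.

```python
# Hypothetical schema: these keys and values are illustrative, not hls4ml options.
DATA_INTERFACES = {"axi_master", "axi_stream"}
WEIGHT_INTERFACES = {"axi_master"}


def check_accelerator_config(cfg):
    """Validate a nested accelerator config and return it with defaults filled in."""
    data_if = cfg.get("data", {}).get("interface")
    if data_if not in DATA_INTERFACES:
        raise ValueError(f"unsupported data interface: {data_if!r}")

    weights = cfg.get("weights", {})
    reprogrammable = bool(weights.get("reprogrammable", False))
    weights_if = weights.get("interface")
    if reprogrammable:
        if weights_if not in WEIGHT_INTERFACES:
            raise ValueError(f"unsupported weights interface: {weights_if!r}")
    elif weights_if is not None:
        raise ValueError("a weights interface requires reprogrammable weights")

    return {
        "data": {"interface": data_if},
        "weights": {"reprogrammable": reprogrammable, "interface": weights_if},
        "driver": cfg.get("driver", "c"),  # assumed default, mirroring driver='c'
    }


# The two combinations asked for in the review:
master_fixed = check_accelerator_config({"data": {"interface": "axi_master"}})
stream_reprog = check_accelerator_config({
    "data": {"interface": "axi_stream"},
    "weights": {"reprogrammable": True, "interface": "axi_master"},
})
```

Keeping the validation in one place would also give us the matrix of valid combinations as data rather than as scattered checks.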

Each weight of the model will have its own top-level interface/port. How about a mode with a single 'weights' port with one configurable type (some ap_fixed, float, or double, for example), with casting to the proper weight types in the HLS? This would be similar to what we allow now for inputs/outputs with VivadoAccelerator.

We can fine-tune the weights to properly-sized ports. Not sure if that would significantly increase the logic. As you noticed, for simplicity, we bring in all the weights as float (32b) and convert them in the accelerator to the expected fixed precision.
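
For reference, the numerical effect of that float-to-fixed conversion can be modeled in a few lines of Python. This sketch assumes a signed ap_fixed<W,I> with the default truncation (AP_TRN) and wrap-around (AP_WRAP) modes. The function name and the example widths are illustrative and are not taken from this PR, and Python floats are 64-bit rather than the 32-bit floats moved over the bus.

```python
import math


def to_ap_fixed(value, width=16, integer=6):
    """Model a cast to signed ap_fixed<width, integer> with AP_TRN and AP_WRAP."""
    scale = 2.0 ** (width - integer)        # one LSB is 1 / scale
    scaled = math.floor(value * scale)      # AP_TRN: truncate toward -infinity
    modulus = 1 << width
    half = modulus >> 1
    wrapped = (scaled + half) % modulus - half  # AP_WRAP: two's-complement wrap
    return wrapped / scale


# ap_fixed<8,2> has 6 fractional bits and covers the range [-2, 2).
q_pos = to_ap_fixed(0.3, width=8, integer=2)    # 0.296875
q_neg = to_ap_fixed(-0.3, width=8, integer=2)   # -0.3125
q_wrap = to_ap_fixed(2.0, width=8, integer=2)   # -2.0 (overflow wraps around)
```

A model like this is handy on the host side to predict what the accelerator will hold after the weights are loaded.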

Can we have a C driver that has some function taking the data as argument and returning the NN prediction? And some other function to reprogram the weights. We can call those from the standalone example that puts the data in header files, but they would be helpful for anyone integrating an hls4ml NN in a real application

Definitely. The current software application was mostly for proof-of-concept and debugging. We can break it down into a more organic library.

Can we have a Python driver for this? - I had some pynq Python for an AXI Master interface before, so I can help with this

Yes. Python would require an OS (or a lighter-weight RTOS). We are not booting an OS right now, so a C/C++ bare-metal application was sufficient, as it was for the MLPerf Tiny submission.

I will comment a little more about this question.

Can we have the design for pynq-z2 as well? - I can help with this also

Definitely.

I have a few more boards in my working directory that I did not push yet. Essentially there are three classes of chips/boards:

"Pure FPGA", which for us means Microblaze-based systems (see MLPerf Tiny submission or ARTY)
Zynq-based systems (like pynq-z2, pynq-z1, etc)
ZynqMP-based systems (like Ultra96v2, ZCU106, ZCU102, etc)

It might be easier to split these contributions into different PRs, roughly:

AXI Master interface
C Driver
Ultra96 support
Reprogrammable weights

I like the breakdown.

Also, Xilinx SDK is a legacy tool, superseded by Vitis. Can we do it with Vitis instead?

This is a good question, and @jmitrevs brought it up as well.

I have not used Vitis seriously enough, but I assume it comes with a similar flow and equivalent tools (Vivado HLS -> Vivado -> Vivado SDK). Given Vitis, I am unsure how much we should push on the software side. Do you have any experience with the software stack in Vitis? In the longer term, I would also like to close the loop and create software applications/libraries for Linux. I started something based on PetaLinux, but that once again may overlap with what Vitis is doing.

Similarly, there is an ongoing effort to use the PYNQ software stack and overlays. I am wondering if we should look at that as well (I do not have experience with it, but I guess there should be a good degree of compatibility).

GiuseppeDiGuglielmo commented 1 year ago

AXI Master interface

C Driver

Ultra96 support

Reprogrammable weights

Breaking down this draft into 4 PRs.

Let's start with PR653

fastmachinelearning / hls4ml

External weights vivado accelerator #646