Xilinx / finn

Dataflow compiler for QNN inference on FPGAs
https://xilinx.github.io/finn
BSD 3-Clause "New" or "Revised" License

Timing setting not propagated to final PYNQ project for Ultra96 in 0.4b release #241

Closed heborras closed 3 years ago

heborras commented 4 years ago

Hi, I just noticed that the period_ns setting of the ZynqBuild transformation does not seem to be propagated into the final Vivado PYNQ project. This appears to happen only with test_pynq_board = "Ultra96".

When running the cnv_end2end_example Jupyter notebook with target_clk_ns = 20, the setting propagates into the StreamingDataflowPartition nodes, but not into the final project in the folder referenced by vivado_pynq_proj.

As far as I can tell this is visible by looking at the build artifacts after the ZynqBuild transformation. Running the following code:

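# Import paths assumed from the FINN 0.4b-era notebooks; build_dir is defined earlier in the notebook.
from finn.core.modelwrapper import ModelWrapper
from finn.custom_op.registry import getCustomOp

model = ModelWrapper(build_dir+"/end2end_cnv_w1a1_synth.onnx")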

for node in model.graph.node:
    assert node.op_type == "StreamingDataflowPartition", "Invalid link graph"
    sdp_node = getCustomOp(node)
    dataflow_model_filename = sdp_node.get_nodeattr("model")
    kernel_model = ModelWrapper(dataflow_model_filename)
    clk_ns = float(kernel_model.get_metadata_prop("clk_ns"))
    print("dataflow folder: ", dataflow_model_filename, "; clk_ns: ", clk_ns)

vivado_pynq_proj = model.get_metadata_prop("vivado_pynq_proj")
print("vivado_pynq_proj folder: ", vivado_pynq_proj)

This gives me the following for both the Pynq-Z1 and the Ultra96:

dataflow folder:  /tmp/finn_dev_hendrik/dataflow_partition0_h_2a5n2w/df_model.onnx ; clk_ns:  20.0
dataflow folder:  /tmp/finn_dev_hendrik/dataflow_partition2_aj70gstl/df_model.onnx ; clk_ns:  20.0
dataflow folder:  /tmp/finn_dev_hendrik/dataflow_partition1_unykh1p3/df_model.onnx ; clk_ns:  20.0
vivado_pynq_proj folder:  /tmp/finn_dev_hendrik/vivado_zynq_proj_rco26znv

This is the correct timing. The paths, of course, differ between the two builds.

However, the Vivado project in the folder referenced by vivado_pynq_proj tells a different story. The timing summary report at /tmp/finn_dev_hendrik/vivado_zynq_proj_rco26znv/finn_zynq_link.runs/impl_1/top_wrapper_timing_summary_postroute_physopted.rpt contains the following for the Pynq-Z1:

------------------------------------------------------------------------------------------------
| Clock Summary
| -------------
------------------------------------------------------------------------------------------------

Clock       Waveform(ns)       Period(ns)      Frequency(MHz)
-----       ------------       ----------      --------------
clk_fpga_0  {0.000 10.000}     20.000          50.000         

But when the ZynqBuild is run for the Ultra96, this file contains the following:

------------------------------------------------------------------------------------------------
| Clock Summary
| -------------
------------------------------------------------------------------------------------------------

Clock     Waveform(ns)       Period(ns)      Frequency(MHz)
-----     ------------       ----------      --------------
clk_pl_0  {0.000 5.000}      10.000          100.000    

This is not the requested timing: the clock period is 10 ns (100 MHz) instead of 20 ns (50 MHz). Stranger still, ip_config.tcl seems to contain the correct frequency information: set FREQ_MHZ 50.
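The comparison between the requested clock period and the period Vivado reports can be automated. The following is a minimal sketch, assuming the report rows keep the four-column layout shown in the excerpts above; the function names parse_clock_summary and mismatched_clocks are hypothetical helpers, not part of FINN or Vivado. A full report may contain other tables, so the pattern may need tightening.

```python
import re

# Matches rows such as: clk_pl_0  {0.000 5.000}  10.000  100.000
# (layout assumed from the report excerpts quoted in this issue)
CLOCK_ROW = re.compile(r"^\s*(\S+)\s+\{[\d.\s]+\}\s+([\d.]+)\s+([\d.]+)\s*$")


def parse_clock_summary(report_text):
    """Return {clock_name: period_ns} for every clock row found in the text."""
    clocks = {}
    for line in report_text.splitlines():
        match = CLOCK_ROW.match(line)
        if match:
            clocks[match.group(1)] = float(match.group(2))
    return clocks


def mismatched_clocks(report_text, target_clk_ns, tol_ns=1e-3):
    """Return the clocks whose reported period differs from the target period."""
    return {
        name: period
        for name, period in parse_clock_summary(report_text).items()
        if abs(period - target_clk_ns) > tol_ns
    }


# Rows copied from the two reports quoted above.
pynq_z1_row = "clk_fpga_0  {0.000 10.000}     20.000          50.000"
ultra96_row = "clk_pl_0  {0.000 5.000}      10.000          100.000"

print(mismatched_clocks(pynq_z1_row, 20.0))  # {}
print(mismatched_clocks(ultra96_row, 20.0))  # {'clk_pl_0': 10.0}
print(1000.0 / 20.0)  # 50.0, the frequency in MHz for a 20 ns period
```

In practice the report text would be read from the timing summary file in the vivado_pynq_proj folder and target_clk_ns taken from the clk_ns metadata property printed earlier.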

What I am seeing in the Vivado messages is the following additional warning for the Ultra96:

Vivado Commands > General Messages: [Board 49-67] The board_part definition was not found for em.avnet.com:ultra96v1:part0:1.2. This can happen sometimes when you use custom board part. You can resolve this issue by setting 'board.repoPaths' parameter, pointing to the location of custom board files. Valid board_part values can be retrieved with the 'get_board_parts' Tcl command.

However, I am not sure whether this is relevant to the timing issue.

So I'm a bit at a loss as to where things go wrong between the two boards. Any idea where I should look to debug this further?
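One way to check whether the warning is specific to the Ultra96 build is to search the Vivado message logs of both builds for the same message ID. The sketch below assumes the messages have been saved as plain text with IDs in the bracketed form quoted above; find_messages is a hypothetical helper, and the second sample line is made up for illustration.

```python
import re

# Message IDs look like "[Board 49-67]": a category name followed by two numbers.
MSG_ID = re.compile(r"\[([A-Za-z_]+ \d+-\d+)\]")


def find_messages(log_text, wanted_id):
    """Return all lines of log_text that carry the given message ID."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if wanted_id in MSG_ID.findall(line)
    ]


# First line abbreviated from the warning in this issue; second line is invented.
sample_log = """\
[Board 49-67] The board_part definition was not found for em.avnet.com:ultra96v1:part0:1.2.
[Example 1-1] An unrelated, made-up message.
"""

for hit in find_messages(sample_log, "Board 49-67"):
    print(hit)
```

Running this over the logs of the Pynq-Z1 and Ultra96 builds would show whether the missing board definition appears only where the clock is wrong.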

maltanar commented 4 years ago

Hi Hendrik, thanks for posting this issue. It sounds like the Vivado project generation is not setting the clock correctly for the Ultra96.

Regarding the "board_part definition not found" error, that may actually be related to this. FINN will try to download the board files into finn/board_files when the container is launched, and inside the container the board.repoPaths will be set as /workspace/finn/board_files. Can you verify that this folder exists and it contains the board files? For me its contents look like this:

maltanar@finn_dev_maltanar:/workspace/finn$ ls /workspace/finn/board_files/
README.md      picozed_7010_som   pynq-z1            ultrazed_3eg_pciecc_es1
download_zip.png   picozed_7015_fmc2  pynq-z2            ultrazed_3eg_som
locate_zip.png     picozed_7015_som   ultra96v1          ultrazed_3eg_som_es1
microzed_7010      picozed_7020_fmc2  ultra96v2          ultrazed_7ev_cc
microzed_7020      picozed_7020_som   ultrazed_3eg_iocc      ultrazed_7ev_cc_es2
minized        picozed_7030_fmc2  ultrazed_3eg_iocc_es1  ultrazed_7ev_som
picozed_7010_fmc2  picozed_7030_som   ultrazed_3eg_pciecc    ultrazed_7ev_som_es2

You could try removing that folder and re-launching the container to force these files to be re-downloaded.
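Before deleting anything, the folder check can also be scripted. This is a minimal sketch that only tests whether the expected board directories exist under the path mentioned above; missing_board_dirs is a hypothetical helper, and it does not validate the contents of the board files themselves.

```python
from pathlib import Path


def missing_board_dirs(repo_path, boards):
    """Return the board names that have no directory under repo_path."""
    root = Path(repo_path)
    return [board for board in boards if not (root / board).is_dir()]


# Path and directory names taken from the listing above.
missing = missing_board_dirs(
    "/workspace/finn/board_files", ["ultra96v1", "ultra96v2", "pynq-z1"]
)
print("missing board directories:", missing)
```

An empty list means the directories are present; any name printed suggests the download did not complete for that board.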

Otherwise, there may be an error in the Vivado project creation script ip_config.tcl, which is generated from this template here: https://github.com/Xilinx/finn/blob/master/src/finn/transformation/fpgadataflow/templates.py#L294

Specifically, the following lines set the clock frequency for Ultra96 (and all UltraScale+ parts):

https://github.com/Xilinx/finn/blob/master/src/finn/transformation/fpgadataflow/templates.py#L336
https://github.com/Xilinx/finn/blob/master/src/finn/transformation/fpgadataflow/templates.py#L358

You could try manually opening the generated Vivado project from the host computer and executing these lines of Tcl to see if they adjust the clock correctly.

maltanar commented 3 years ago

I'm closing this issue due to inactivity. Please feel free to reopen, or discuss further on the FINN gitter channel on https://gitter.im/xilinx-finn/community