black-parrot-hdk / zynq-parrot

BlackParrot on Zynq
BSD 3-Clause "New" or "Revised" License

Hammerblade #101

Open amithmath opened 1 month ago

amithmath commented 1 month ago

I am trying to implement the hammerblade example on the pynqz2, ultra96v2, and vu47p, but it runs out of resources. This issue was raised in https://github.com/black-parrot-hdk/zynq-parrot/issues/76, but even on the ultra96v2 it runs out of resources. Please let me know which board to use. Thanks.

dpetrisko commented 1 month ago

It should definitely fit on vu47p. Perhaps the manycore config is set too large. Do you have a utilization report you can post?

amithmath commented 1 month ago

Following is the report:

ERROR: [DRC UTLZ-1] Resource utilization: LUT as Logic over-utilized in Top Level Design (This design requires more LUT as Logic cells than are available in the target device. This design requires 71984 of such cell types but only 70560 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device. Please set tcl parameter "drc.disableLUTOverUtilError" to 1 to change this error to warning.)

dpetrisko commented 1 month ago

Can you post the actual reports and not just the error? Would need to see the hierarchical breakdown to see where LUTs are going

amithmath commented 1 month ago

Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.

| Tool Version : Vivado v.2022.1 (lin64) Build 3526262 Mon Apr 18 15:47:01 MDT 2022
| Date         : Sat Sep 28 20:11:13 2024
| Host         : amd64 running 64-bit CentOS Linux release 7.9.2009 (Core)
| Command      : report_utilization -file hammerblade_bd_1_wrapper_utilization_synth.rpt -pb hammerblade_bd_1_wrapper_utilization_synth.pb
| Design       : hammerblade_bd_1_wrapper
| Device       : xczu3eg-sbva484-1-e
| Speed File   : -1
| Design State : Synthesized

Utilization Design Information

Table of Contents

1. CLB Logic
1.1 Summary of Registers by Type
2. BLOCKRAM
3. ARITHMETIC
4. I/O
5. CLOCK
6. ADVANCED
7. CONFIGURATION
8. Primitives
9. Black Boxes
10. Instantiated Netlists

1. CLB Logic

+----------------------------+-------+-------+------------+-----------+--------+
|          Site Type         |  Used | Fixed | Prohibited | Available |  Util% |
+----------------------------+-------+-------+------------+-----------+--------+
| CLB LUTs*                  | 81591 |     0 |          0 |     70560 | 115.63 |
|   LUT as Logic             | 72479 |     0 |          0 |     70560 | 102.72 |
|   LUT as Memory            |  9112 |     0 |          0 |     28800 |  31.64 |
|     LUT as Distributed RAM |  8933 |     0 |            |           |        |
|     LUT as Shift Register  |   179 |     0 |            |           |        |
| CLB Registers              | 26938 |     0 |          0 |    141120 |  19.09 |
|   Register as Flip Flop    | 26669 |     0 |          0 |    141120 |  18.90 |
|   Register as Latch        |   269 |     0 |          0 |    141120 |   0.19 |
| CARRY8                     |  1059 |     0 |          0 |      8820 |  12.01 |
| F7 Muxes                   |   591 |     0 |          0 |     35280 |   1.68 |
| F8 Muxes                   |    78 |     0 |          0 |     17640 |   0.44 |
| F9 Muxes                   |     0 |     0 |          0 |      8820 |   0.00 |
+----------------------------+-------+-------+------------+-----------+--------+

1.1 Summary of Registers by Type

+-------+--------------+-------------+--------------+
| Total | Clock Enable | Synchronous | Asynchronous |
+-------+--------------+-------------+--------------+
| 0     |            _ |           - |            - |
| 0     |            _ |           - |          Set |
| 0     |            _ |           - |        Reset |
| 0     |            _ |         Set |            - |
| 0     |            _ |       Reset |            - |
| 0     |          Yes |           - |            - |
| 0     |          Yes |           - |          Set |
| 395   |          Yes |           - |        Reset |
| 1140  |          Yes |         Set |            - |
| 25403 |          Yes |       Reset |            - |
+-------+--------------+-------------+--------------+

2. BLOCKRAM

+-------------------+------+-------+------------+-----------+-------+
|     Site Type     | Used | Fixed | Prohibited | Available | Util% |
+-------------------+------+-------+------------+-----------+-------+
| Block RAM Tile    | 40.5 |     0 |          0 |       216 | 18.75 |
|   RAMB36/FIFO*    |   38 |     0 |          0 |       216 | 17.59 |
|     RAMB36E2 only |   38 |       |            |           |       |
|   RAMB18          |    5 |     0 |          0 |       432 |  1.16 |
|     RAMB18E2 only |    5 |       |            |           |       |
+-------------------+------+-------+------------+-----------+-------+

3. ARITHMETIC

+----------------+------+-------+------------+-----------+-------+
|    Site Type   | Used | Fixed | Prohibited | Available | Util% |
+----------------+------+-------+------------+-----------+-------+
| DSPs           |   19 |     0 |          0 |       360 |  5.28 |
|   DSP48E2 only |   19 |       |            |           |       |
+----------------+------+-------+------------+-----------+-------+

4. I/O

+------------+------+-------+------------+-----------+-------+
|  Site Type | Used | Fixed | Prohibited | Available | Util% |
+------------+------+-------+------------+-----------+-------+
| Bonded IOB |    0 |     0 |          0 |        82 |  0.00 |
+------------+------+-------+------------+-----------+-------+

5. CLOCK

+----------------------+------+-------+------------+-----------+-------+
|       Site Type      | Used | Fixed | Prohibited | Available | Util% |
+----------------------+------+-------+------------+-----------+-------+
| GLOBAL CLOCK BUFFERs |    9 |     0 |          0 |       196 |  4.59 |
|   BUFGCE             |    2 |     0 |          0 |        88 |  2.27 |
|   BUFGCE_DIV         |    0 |     0 |          0 |        12 |  0.00 |
|   BUFG_PS            |    1 |     0 |          0 |        72 |  1.39 |
|   BUFGCTRL*          |    3 |     0 |          0 |        24 | 12.50 |
| PLL                  |    0 |     0 |          0 |         6 |  0.00 |
| MMCM                 |    1 |     0 |          0 |         3 | 33.33 |
+----------------------+------+-------+------------+-----------+-------+

6. ADVANCED

+-----------+------+-------+------------+-----------+--------+
| Site Type | Used | Fixed | Prohibited | Available |  Util% |
+-----------+------+-------+------------+-----------+--------+
| PS8       |    1 |     0 |          0 |         1 | 100.00 |
| SYSMONE4  |    0 |     0 |          0 |         1 |   0.00 |
+-----------+------+-------+------------+-----------+--------+

7. CONFIGURATION

+-------------+------+-------+------------+-----------+-------+
|  Site Type  | Used | Fixed | Prohibited | Available | Util% |
+-------------+------+-------+------------+-----------+-------+
| BSCANE2     |    0 |     0 |          0 |         4 |  0.00 |
| DNA_PORTE2  |    0 |     0 |          0 |         1 |  0.00 |
| EFUSE_USR   |    0 |     0 |          0 |         1 |  0.00 |
| FRAME_ECCE4 |    0 |     0 |          0 |         1 |  0.00 |
| ICAPE3      |    0 |     0 |          0 |         2 |  0.00 |
| MASTER_JTAG |    0 |     0 |          0 |         1 |  0.00 |
| STARTUPE3   |    0 |     0 |          0 |         1 |  0.00 |
+-------------+------+-------+------------+-----------+-------+

8. Primitives

+------------+-------+---------------------+
|  Ref Name  |  Used | Functional Category |
+------------+-------+---------------------+
| LUT6       | 36979 |                 CLB |
| FDRE       | 25403 |            Register |
| LUT5       | 14887 |                 CLB |
| RAMD32     | 13976 |                 CLB |
| LUT4       | 13086 |                 CLB |
| LUT3       | 10708 |                 CLB |
| LUT2       |  7359 |                 CLB |
| RAMS32     |  2078 |                 CLB |
| LUT1       |  1278 |                 CLB |
| FDSE       |  1140 |            Register |
| CARRY8     |  1059 |                 CLB |
| RAMD64E    |   868 |                 CLB |
| MUXF7      |   591 |                 CLB |
| LDCE       |   269 |            Register |
| FDCE       |   126 |            Register |
| SRL16E     |   111 |                 CLB |
| MUXF8      |    78 |                 CLB |
| SRLC32E    |    68 |                 CLB |
| RAMB36E2   |    38 |            BLOCKRAM |
| RAMS64E    |    35 |                 CLB |
| DSP48E2    |    19 |          Arithmetic |
| RAMB18E2   |     5 |            BLOCKRAM |
| BUFGCTRL   |     3 |               Clock |
| BUFGCE     |     2 |               Clock |
| PS8        |     1 |            Advanced |
| MMCME4_ADV |     1 |               Clock |
| BUFG_PS    |     1 |               Clock |
+------------+-------+---------------------+

9. Black Boxes

+----------+------+
| Ref Name | Used |
+----------+------+

10. Instantiated Netlists

+----------+------+
| Ref Name | Used |
+----------+------+

amithmath commented 1 month ago

The report above is from the ultra96v2. There is no vps_zynq_bd.vu47p.tcl in https://github.com/black-parrot-hdk/zynq-parrot/tree/master/cosim/tcl/bd

dpetrisko commented 1 month ago

Yes, so it should fit on vu47p. There is no vps configuration file because the vu47p is not a Zynq part (it has no PS).

For the vu47p, we use a UART bridge to emulate the PS. You can see the connection here: https://github.com/black-parrot-hdk/zynq-parrot/blob/master/cosim/xdc/board.vu47p.xdc and the cosimulation here: https://github.com/black-parrot-hdk/zynq-parrot/blob/master/cosim/include/bridge/bsg_zynq_pl.h. However, we haven't open-sourced a hardware configuration, as it's a fairly custom solution.

For the ultra96v2, that report indicates the design is very close to fitting. Reducing the sizes of the structures in the BlackParrot cores may get you there. Take a look at the TinyParrot configuration in the aviary and experiment with reducing the branch predictors and caches.
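As a rough way to quantify how far the ultra96v2 build is from fitting, the sketch below parses a few rows copied from the CLB Logic table in the report above and prints the overshoot for any over-utilized row. The helper names (`parse_rows`, `overshoot`) are illustrative only and are not part of zynq-parrot or Vivado. The counts are post-synthesis, so they may shift after optimization and implementation.

```python
# Rows copied from the "CLB Logic" table of the ultra96v2 synthesis report above.
REPORT_ROWS = """\
| CLB LUTs*     | 81591 | 0 | 0 |  70560 | 115.63 |
| LUT as Logic  | 72479 | 0 | 0 |  70560 | 102.72 |
| LUT as Memory |  9112 | 0 | 0 |  28800 |  31.64 |
| CLB Registers | 26938 | 0 | 0 | 141120 |  19.09 |
"""


def parse_rows(text):
    """Return {site_type: (used, available)} for rows that list both numbers."""
    rows = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Expect: Site Type, Used, Fixed, Prohibited, Available, Util%
        if len(cells) != 6 or not cells[4]:
            continue
        name, used, _fixed, _prohibited, available, _util = cells
        rows[name.rstrip("*")] = (float(used), float(available))
    return rows


def overshoot(rows):
    """Return {site_type: (excess_cells, fraction_of_used_to_trim)} for over-utilized rows."""
    result = {}
    for name, (used, available) in rows.items():
        if used > available:
            excess = used - available
            result[name] = (excess, excess / used)
    return result


rows = parse_rows(REPORT_ROWS)
for name, (excess, frac) in overshoot(rows).items():
    print(f"{name}: {excess:.0f} cells over, trim about {frac:.1%} of current usage")
```

Rerunning the same check on a new report after each configuration change gives a quick view of whether a smaller cache or branch predictor setting is moving the design toward the available LUT count.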

amithmath commented 1 month ago

I am wondering whether it is possible to port the hardware to the Alveo U250 data center card (https://www.amd.com/en/products/accelerators/alveo/u250/a-u250-a64g-pq-g.html). If so, what changes are needed? Also, the bsg_manycore accelerator is 32-bit; can it be changed to 64-bit? If so, what changes would that require?

dpetrisko commented 1 month ago

These are both very substantial projects.

The U250 has a pynq port, so beginning there and working through the cosim examples is the way to start. Once cosim is working, hardware examples should port in a straightforward manner.

There was a student who ported the manycore toolchain to 64b: https://github.com/bespoke-silicon-group/bsg_manycore/pull/720

The hardware would require more changes, but primarily in parameterization. The actual RV64I ISA difference is minimal, especially if only F support is needed.

Both projects would likely require a highly motivated student for two or more quarters. Feel free to reach out to discuss funding for these efforts.

amithmath commented 1 month ago

Thanks, let me see. I was running the VCS simulation from /home/zynq-parrot/cosim/hammerblade-example/vcs, and I am getting the following errors:

"/home/zynq-parrot/cosim/v/bsg_zynq_pl_shell.sv", 405: bsg_nonsynth_zynq_testbench.dut.top_fpga_inst.zps.pl_to_ps[0].unnamed$$_0: started at 0ps failed at 0ps
    Offending '(((~S_AXI_ARESETN) | (~slv_rd_sel_one_hot[(num_regs_ps_to_pl_p + 0)])) | pl_to_ps_fifo_valid_lo[0])'
Error: "/home/sonal/ViBram/zynq-parrot/cosim/v/bsg_zynq_pl_shell.sv", 405: bsg_nonsynth_zynq_testbench.dut.top_fpga_inst.zps.pl_to_ps[0].unnamed$$_0: at time 0 ps
read from empty fifo
"/home/zynq-parrot/cosim/v/bsg_zynq_pl_shell.sv", 405: bsg_nonsynth_zynq_testbench.dut.top_fpga_inst.zps.pl_to_ps[1].unnamed$$_0: started at 0ps failed at 0ps
    Offending '(((~S_AXI_ARESETN) | (~slv_rd_sel_one_hot[(num_regs_ps_to_pl_p + 1)])) | pl_to_ps_fifo_valid_lo[1])'
Error: "/home/zynq-parrot/cosim/v/bsg_zynq_pl_shell.sv", 405: bsg_nonsynth_zynq_testbench.dut.top_fpga_inst.zps.pl_to_ps[1].unnamed$$_0: at time 0 ps
read from empty fifo

bsg_tag_master transitioning to error state; be sure to run gate-level netlist to avoid sim/synth mismatch (bsg_nonsynth_zynq_testbench.dut.top_fpga_inst.master)
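For readers unfamiliar with the assertion, the offending expression in the log above can be read as "either the interface is in reset, or this register is not selected for a read, or the FIFO has valid data". The sketch below is a two-state Python model of that expression, written only from the text of the log; the function name `read_is_legal` is hypothetical and this is not the code in bsg_zynq_pl_shell.sv. Python booleans cannot represent the unknown (X) values a simulator may hold at time 0, so the model shows only which known-value combination trips the check.

```python
from itertools import product


def read_is_legal(aresetn: bool, rd_sel: bool, fifo_valid: bool) -> bool:
    """Two-state model of: (~S_AXI_ARESETN) | (~slv_rd_sel_one_hot[i]) | pl_to_ps_fifo_valid_lo[i]."""
    return (not aresetn) or (not rd_sel) or fifo_valid


# Enumerate every (aresetn, rd_sel, fifo_valid) combination and keep the failing ones.
failing = [combo for combo in product([False, True], repeat=3)
           if not read_is_legal(*combo)]

for aresetn, rd_sel, fifo_valid in failing:
    print(f"fails when aresetn={aresetn}, rd_sel={rd_sel}, fifo_valid={fifo_valid}")
```

In this model the only failing case is out of reset, register selected for read, and FIFO not valid, which matches the "read from empty fifo" message.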

amithmath commented 1 month ago

Please help, I am getting these errors in VCS:

BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.wready_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.bresp_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.bresp_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.bvalid_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.bvalid_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.bready_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.bready_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.araddr_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.araddr_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.arprot_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.arprot_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.arvalid_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.arvalid_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.arready_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.arready_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.rdata_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.rdata_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.rresp_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.rresp_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.rvalid_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.rvalid_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.rready_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.rready_gpio): final block executed before fini() was called
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.rready_gpio): final block executed before fini() was called

V C S   S i m u l a t i o n   R e p o r t
Time: 56425001 ps
CPU Time: 1.510 seconds; Data structure size: 4.5Mb
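When a log repeats the same message for many instances, grouping by message makes it easier to see whether there is one underlying event or several. The sketch below does that for three lines excerpted from the output above. The regular expression and the `group_errors` helper are illustrative assumptions about the log format, not part of the BaseJump STL or VCS tooling.

```python
import re
from collections import defaultdict

# Three lines excerpted from the VCS output above; a real run would read the full log file.
SAMPLE_LOG = """\
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.wready_gpio): final block executed before fini() was called
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.bresp_gpio): final block executed before fini() was called
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.rready_gpio): final block executed before fini() was called
"""

# Assumed line shape: "BSG ERROR (<hierarchical instance>): <message>"
ERROR_RE = re.compile(r"BSG ERROR \((?P<inst>[^)]+)\): (?P<msg>.+)")


def group_errors(log_text):
    """Map each distinct message to the sorted, de-duplicated instances that reported it."""
    groups = defaultdict(set)
    for match in ERROR_RE.finditer(log_text):
        groups[match["msg"].strip()].add(match["inst"])
    return {msg: sorted(insts) for msg, insts in groups.items()}


for msg, insts in group_errors(SAMPLE_LOG).items():
    leaves = ", ".join(inst.rsplit(".", 1)[-1] for inst in insts)
    print(f"{len(insts)} instance(s): {msg}")
    print(f"  {leaves}")
```

In the excerpt every entry carries the same message, and in the full log above they all share the timestamp 56425001 ps, which suggests a single event (the simulation ending before `fini()` ran) rather than independent failures. That reading is an inference from the log text, not a confirmed diagnosis.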

amithmath commented 4 weeks ago

Could you please point me to the file and line number in the aviary where I can experiment with reducing the branch predictors and caches?