Open amithmath opened 1 month ago
It should definitely fit on vu47p. Perhaps the manycore config is set too large. Do you have a utilization report you can post?
The following is the report:
ERROR: [DRC UTLZ-1] Resource utilization: LUT as Logic over-utilized in Top Level Design (This design requires more LUT as Logic cells than are available in the target device. This design requires 71984 of such cell types but only 70560 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device. Please set tcl parameter "drc.disableLUTOverUtilError" to 1 to change this error to warning.)
Can you post the actual reports and not just the error? We would need to see the hierarchical breakdown to see where the LUTs are going.
Utilization Design Information

1. CLB Logic

+----------------------------+-------+-------+------------+-----------+--------+
| Site Type                  | Used  | Fixed | Prohibited | Available | Util%  |
+----------------------------+-------+-------+------------+-----------+--------+
| CLB LUTs*                  | 81591 | 0     | 0          | 70560     | 115.63 |
|   LUT as Logic             | 72479 | 0     | 0          | 70560     | 102.72 |
|   LUT as Memory            | 9112  | 0     | 0          | 28800     | 31.64  |
|     LUT as Distributed RAM | 8933  | 0     |            |           |        |
|     LUT as Shift Register  | 179   | 0     |            |           |        |
| CLB Registers              | 26938 | 0     | 0          | 141120    | 19.09  |
|   Register as Flip Flop    | 26669 | 0     | 0          | 141120    | 18.90  |
|   Register as Latch        | 269   | 0     | 0          | 141120    | 0.19   |
| CARRY8                     | 1059  | 0     | 0          | 8820      | 12.01  |
| F7 Muxes                   | 591   | 0     | 0          | 35280     | 1.68   |
| F8 Muxes                   | 78    | 0     | 0          | 17640     | 0.44   |
| F9 Muxes                   | 0     | 0     | 0          | 8820      | 0.00   |
+----------------------------+-------+-------+------------+-----------+--------+

1.1 Summary of Registers by Type

+-------+--------------+-------------+--------------+
| Total | Clock Enable | Synchronous | Asynchronous |
+-------+--------------+-------------+--------------+
| 0     | _            | -           | -            |
| 0     | _            | -           | Set          |
| 0     | _            | -           | Reset        |
| 0     | _            | Set         | -            |
| 0     | _            | Reset       | -            |
| 0     | Yes          | -           | -            |
| 0     | Yes          | -           | Set          |
| 395   | Yes          | -           | Reset        |
| 1140  | Yes          | Set         | -            |
| 25403 | Yes          | Reset       | -            |
+-------+--------------+-------------+--------------+

2. BLOCKRAM

+-------------------+------+-------+------------+-----------+-------+
| Site Type         | Used | Fixed | Prohibited | Available | Util% |
+-------------------+------+-------+------------+-----------+-------+
| Block RAM Tile    | 40.5 | 0     | 0          | 216       | 18.75 |
|   RAMB36/FIFO*    | 38   | 0     | 0          | 216       | 17.59 |
|     RAMB36E2 only | 38   |       |            |           |       |
|   RAMB18          | 5    | 0     | 0          | 432       | 1.16  |
|     RAMB18E2 only | 5    |       |            |           |       |
+-------------------+------+-------+------------+-----------+-------+

3. ARITHMETIC

+----------------+------+-------+------------+-----------+-------+
| Site Type      | Used | Fixed | Prohibited | Available | Util% |
+----------------+------+-------+------------+-----------+-------+
| DSPs           | 19   | 0     | 0          | 360       | 5.28  |
|   DSP48E2 only | 19   |       |            |           |       |
+----------------+------+-------+------------+-----------+-------+

4. I/O

+------------+------+-------+------------+-----------+-------+
| Site Type  | Used | Fixed | Prohibited | Available | Util% |
+------------+------+-------+------------+-----------+-------+
| Bonded IOB | 0    | 0     | 0          | 82        | 0.00  |
+------------+------+-------+------------+-----------+-------+

5. CLOCK

+----------------------+------+-------+------------+-----------+-------+
| Site Type            | Used | Fixed | Prohibited | Available | Util% |
+----------------------+------+-------+------------+-----------+-------+
| GLOBAL CLOCK BUFFERs | 9    | 0     | 0          | 196       | 4.59  |
|   BUFGCE             | 2    | 0     | 0          | 88        | 2.27  |
|   BUFGCE_DIV         | 0    | 0     | 0          | 12        | 0.00  |
|   BUFG_PS            | 1    | 0     | 0          | 72        | 1.39  |
|   BUFGCTRL*          | 3    | 0     | 0          | 24        | 12.50 |
| PLL                  | 0    | 0     | 0          | 6         | 0.00  |
| MMCM                 | 1    | 0     | 0          | 3         | 33.33 |
+----------------------+------+-------+------------+-----------+-------+

6. ADVANCED

+-----------+------+-------+------------+-----------+--------+
| Site Type | Used | Fixed | Prohibited | Available | Util%  |
+-----------+------+-------+------------+-----------+--------+
| PS8       | 1    | 0     | 0          | 1         | 100.00 |
| SYSMONE4  | 0    | 0     | 0          | 1         | 0.00   |
+-----------+------+-------+------------+-----------+--------+

7. CONFIGURATION

+-------------+------+-------+------------+-----------+-------+
| Site Type   | Used | Fixed | Prohibited | Available | Util% |
+-------------+------+-------+------------+-----------+-------+
| BSCANE2     | 0    | 0     | 0          | 4         | 0.00  |
| DNA_PORTE2  | 0    | 0     | 0          | 1         | 0.00  |
| EFUSE_USR   | 0    | 0     | 0          | 1         | 0.00  |
| FRAME_ECCE4 | 0    | 0     | 0          | 1         | 0.00  |
| ICAPE3      | 0    | 0     | 0          | 2         | 0.00  |
| MASTER_JTAG | 0    | 0     | 0          | 1         | 0.00  |
| STARTUPE3   | 0    | 0     | 0          | 1         | 0.00  |
+-------------+------+-------+------------+-----------+-------+

8. Primitives

+------------+-------+---------------------+
| Ref Name   | Used  | Functional Category |
+------------+-------+---------------------+
| LUT6       | 36979 | CLB                 |
| FDRE       | 25403 | Register            |
| LUT5       | 14887 | CLB                 |
| RAMD32     | 13976 | CLB                 |
| LUT4       | 13086 | CLB                 |
| LUT3       | 10708 | CLB                 |
| LUT2       | 7359  | CLB                 |
| RAMS32     | 2078  | CLB                 |
| LUT1       | 1278  | CLB                 |
| FDSE       | 1140  | Register            |
| CARRY8     | 1059  | CLB                 |
| RAMD64E    | 868   | CLB                 |
| MUXF7      | 591   | CLB                 |
| LDCE       | 269   | Register            |
| FDCE       | 126   | Register            |
| SRL16E     | 111   | CLB                 |
| MUXF8      | 78    | CLB                 |
| SRLC32E    | 68    | CLB                 |
| RAMB36E2   | 38    | BLOCKRAM            |
| RAMS64E    | 35    | CLB                 |
| DSP48E2    | 19    | Arithmetic          |
| RAMB18E2   | 5     | BLOCKRAM            |
| BUFGCTRL   | 3     | Clock               |
| BUFGCE     | 2     | Clock               |
| PS8        | 1     | Advanced            |
| MMCME4_ADV | 1     | Clock               |
| BUFG_PS    | 1     | Clock               |
+------------+-------+---------------------+

9. Black Boxes

+----------+------+
| Ref Name | Used |
+----------+------+

10. Instantiated Netlists

+----------+------+
| Ref Name | Used |
+----------+------+
The above report is from the ultra96v2. There is no vps_zynq_bd.vu47p.tcl in https://github.com/black-parrot-hdk/zynq-parrot/tree/master/cosim/tcl/bd
Yes, so it should fit on vu47p. There is no vps configuration file, as the vu47p is not a zynq part (no PS).
For the vu47p, we use a UART bridge to emulate the PS. You can see the connection here: https://github.com/black-parrot-hdk/zynq-parrot/blob/master/cosim/xdc/board.vu47p.xdc and the cosimulation here: https://github.com/black-parrot-hdk/zynq-parrot/blob/master/cosim/include/bridge/bsg_zynq_pl.h. We haven’t open-sourced a hardware configuration, as it’s a fairly custom solution.
For the ultra96v2, that report indicates the design is very close to fitting. Reducing the sizes of the structures in the BlackParrot cores may get you there. Take a look at the TinyParrot configuration in the aviary and experiment with reducing branch predictors and caches.
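To put "very close to fitting" in numbers, here is a minimal Python sketch that works out how far over budget the "LUT as Logic" count is. The figures are copied from the utilization report and the DRC UTLZ-1 error earlier in this thread; the function name `lut_overage` is just an illustrative helper, not part of any tool.

```python
def lut_overage(used: int, available: int) -> tuple[int, float]:
    """Return (cells over budget, percent of current usage to remove)."""
    over = max(0, used - available)
    pct = 100.0 * over / used if used else 0.0
    return over, pct


AVAILABLE = 70560  # "LUT as Logic" sites available on the target device

# "LUT as Logic" usage as quoted in the two messages in this thread
reports = {
    "utilization report": 72479,
    "DRC UTLZ-1 error": 71984,
}

for label, used in reports.items():
    over, pct = lut_overage(used, AVAILABLE)
    print(f"{label}: {over} over budget, trim about {pct:.2f}% of LUT-as-Logic usage")
```

Either way the gap is only a few percent of the logic LUTs. This looks only at the "LUT as Logic" row that the DRC error complains about and ignores the other rows of the report.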
I am wondering whether it is possible to port the hardware to the Alveo U250 data center card (https://www.amd.com/en/products/accelerators/alveo/u250/a-u250-a64g-pq-g.html). If so, what changes would be needed? Also, the bsg_manycore accelerator is 32-bit; can it be changed to 64-bit? If so, what changes would be needed?
These are both very substantial projects.
The U250 has a pynq port, so beginning there and working through the cosim examples is the way to start. Once cosim is working, the hardware examples should port in a straightforward manner.
There was a student who ported the manycore toolchain to 64b: https://github.com/bespoke-silicon-group/bsg_manycore/pull/720
The hardware would require more changes, but primarily in parameterization. The actual RV64I ISA difference is minimal, especially if only F support is needed.
Both projects would likely require a highly motivated student for two or more quarters. Feel free to reach out to discuss funding for these efforts.
Thanks, let me see. I was running a VCS simulation from /home/ynq-parrot/cosim/hammerblade-example/vcs, and I am getting the following errors:
"/home/zynq-parrot/cosim/v/bsg_zynq_pl_shell.sv", 405: bsg_nonsynth_zynq_testbench.dut.top_fpga_inst.zps.pl_to_ps[0].unnamed$$_0: started at 0ps failed at 0ps
Offending '(((~S_AXI_ARESETN) | (~slv_rd_sel_one_hot[(num_regs_ps_to_pl_p + 0)])) | pl_to_ps_fifo_valid_lo[0])'
Error: "/home/sonal/ViBram/zynq-parrot/cosim/v/bsg_zynq_pl_shell.sv", 405: bsg_nonsynth_zynq_testbench.dut.top_fpga_inst.zps.pl_to_ps[0].unnamed$$_0: at time 0 ps
read from empty fifo
"/home/zynq-parrot/cosim/v/bsg_zynq_pl_shell.sv", 405: bsg_nonsynth_zynq_testbench.dut.top_fpga_inst.zps.pl_to_ps[1].unnamed$$_0: started at 0ps failed at 0ps
Offending '(((~S_AXI_ARESETN) | (~slv_rd_sel_one_hot[(num_regs_ps_to_pl_p + 1)])) | pl_to_ps_fifo_valid_lo[1])'
Error: "/home/zynq-parrot/cosim/v/bsg_zynq_pl_shell.sv", 405: bsg_nonsynth_zynq_testbench.dut.top_fpga_inst.zps.pl_to_ps[1].unnamed$$_0: at time 0 ps
read from empty fifo
Please help, I am getting these errors in VCS:
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.wready_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.bresp_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.bresp_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.bvalid_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.bvalid_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.bready_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.bready_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.araddr_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.araddr_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.arprot_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.arprot_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.arvalid_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.arvalid_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.arready_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.arready_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.rdata_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.rdata_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.rresp_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.rresp_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.rvalid_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.rvalid_gpio): final block executed before fini() was called
Fatal: "/home/sonal/ViBram/zynq-parrot/import/basejump_stl/bsg_test/bsg_nonsynth_dpi_gpio.sv", 64: bsg_nonsynth_zynq_testbench.axil4.rready_gpio: at time 56425001 ps
BSG ERROR (bsg_nonsynth_zynq_testbench.axil4.rready_gpio): final block executed before fini() was called
V C S   S i m u l a t i o n   R e p o r t
Time: 56425001 ps
CPU Time: 1.510 seconds; Data structure size: 4.5Mb
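For anyone reading the first set of errors: the "read from empty fifo" message comes from the expression quoted on the `Offending` line. As a rough aid, here is a small two-valued Python model of that expression. The argument names are paraphrases of the signals in the log, and the model is an assumption-laden sketch, not the actual RTL; in particular it cannot represent X or Z values, which a simulator may see at 0 ps before signals are driven.

```python
from itertools import product


def read_is_legal(aresetn: bool, read_selected: bool, fifo_valid: bool) -> bool:
    """Mirror of (~S_AXI_ARESETN) | (~slv_rd_sel_one_hot[i]) | pl_to_ps_fifo_valid_lo[i]."""
    return (not aresetn) or (not read_selected) or fifo_valid


# Enumerate every input combination and keep the ones that violate the check.
violations = [
    combo for combo in product((False, True), repeat=3) if not read_is_legal(*combo)
]

for aresetn, read_selected, fifo_valid in violations:
    print(f"aresetn={aresetn}, read_selected={read_selected}, fifo_valid={fifo_valid}")
```

In this model the only failing case is: out of reset, the PL-to-PS register selected for a read, and the corresponding FIFO holding no valid data. Whether that is what actually happens at 0 ps in this run, or whether undriven signals are involved, is not something the log alone settles.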
Can you please point me to the file and line number where I can experiment with the aviary by reducing branch predictors and caches?
I am trying to implement the hammerblade example on the pynqz2, ultra96v2, and vu47p, but it is running out of resources. This issue has been raised in https://github.com/black-parrot-hdk/zynq-parrot/issues/76, but even for the ultra96v2 it is running out of resources. Please let me know which board to use. Thanks.