aws / aws-fpga

Official repository of the AWS EC2 FPGA Hardware and Software Development Kit

Clock routing error [Place 30-838] #626

Closed ljp0101 closed 8 months ago

ljp0101 commented 1 year ago

My design has a placement problem after I added memory controllers to the CL in order to use DDR-A/B/D.

ERROR: [Place 30-838] The following clock nets need to use the same clock routing resource, as their clock buffer sources are locked to sites that use the same routing track. One or more loads of these clocks are locked to clock region(s) X1Y11 X2Y11 X3Y11 X4Y11 X4Y12 X4Y13 which causes the clock partitions for these clocks to overlap. This creates unresolvable contention on the clock routing resources. If the clock buffers need to be locked, we recommend users constrain them to a clock region and not to specific BUFGCE/BUFG_GT sites so they can use different routing resources. If clock sources should be locked to specific BUFGCE/BUFG_GT sites that share the same routing resources, make sure loads of such clocks are not constrained to the same region(s). Clock nets sharing routing resources:
WRAPPER_INST/CL/SH_DDR/ddr_cores.DDR4_0/inst/u_ddr4_infrastructure/u_bufg_divClk_0
WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clk_out2
ERROR: [Place 30-678] Failed to do clock region partitioning: failed to resolve clock partition contention for locked clock sources.

The information I found suggests this is caused by a Vivado bug that was supposedly resolved in 2020.2. My design uses Vivado 2021.2 from the latest FPGA Developer AMI, though. Could some aws-fpga IPs need to be rebuilt for the fix to take effect?
https://repost.aws/questions/QUtcHFRAQRT4mC0bXSBtmCAA/the-clock-nets-need-to-use-the-same-clock-routing-resource
https://support.xilinx.com/s/article/75539?language=en_US

I'm confused as to why WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clk_out2 is involved. My CL only uses clk_main_a0, which I believe is WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clk_out1, plus further derivatives created with an MMCM within the CL.

I investigated the loads on the various clocks. Searching for WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clk_out does return nets, and clk_out2 reports a flat pin count of 2, so I assume it drives something I don't control and that can't be optimised away.
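For reference, this is roughly how I poked at the net in the checkpoint. It's just standard Vivado Tcl run against an opened checkpoint, not part of the aws-fpga scripts; the net name is the one from the error above:

# Assumes a checkpoint is already open, e.g. via open_checkpoint
set clk2 [get_nets WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clk_out2]

# How many leaf pins does the net actually connect to?
puts "clk_out2 flat pin count: [get_property FLAT_PIN_COUNT $clk2]"

# List the leaf pins (driver plus loads) to see what is hanging off it
foreach p [get_pins -leaf -of_objects $clk2] {
    puts "  $p ([get_property DIRECTION $p])"
}

# Overall clock buffer / clock region picture
report_clock_utilization -file clock_util.rpt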

I tried CLOCK_LOW_FANOUT in the cl_pnr_user.tcl. This didn't work for me:
set_property CLOCK_LOW_FANOUT TRUE [get_nets [list WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clk_out2]]
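The AR and the error message itself both suggest constraining the clock buffer to a clock region rather than to a specific BUFGCE site. A sketch of what that might look like in cl_pnr_user.tcl follows; the region X2Y11 is just a placeholder, and I haven't verified this against the shell's own locked constraints:

# Illustrative only: find the BUFG driving the DDR4 divClk net from the error
# and constrain it to a clock region instead of a fixed site (per AR 75539).
set ddr_bufg [get_cells -of_objects [get_pins -filter {DIRECTION == OUT} \
    -of_objects [get_nets WRAPPER_INST/CL/SH_DDR/ddr_cores.DDR4_0/inst/u_ddr4_infrastructure/u_bufg_divClk_0]]]
set_property CLOCK_REGION X2Y11 $ddr_bufg   ;# X2Y11 is a placeholder region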

I switched the build strategy from TIMING to DEFAULT to disable post-placement phys_opt, and this works. However, TIMING gave me much better results before I added the memory controllers. I'll see whether I can disable only the post-placement phys_opt step, but that step always gave significant gains before I added the memory controllers, so I'm hoping for a proper solution or a better workaround.
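If I do end up with a custom compile script, my understanding is it would just be the usual non-project implementation sequence with the post-placement phys_opt_design call dropped, roughly as below. The directives are illustrative, not the exact ones the TIMING strategy uses, and this is not the actual aws-fpga build Tcl:

# Illustrative implementation sequence that skips post-placement phys_opt
opt_design   -directive Explore
place_design -directive ExtraTimingOpt
# phys_opt_design -directive AggressiveExplore   ;# post-place step omitted
route_design -directive Explore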

czfpga commented 1 year ago

Hi @ljp0101,

This shouldn't be caused by an outdated aws-fpga IP. It is more likely caused by an overly congested design around the reported region. Please refer to this AR for more info: https://support.xilinx.com/s/article/75539?language=en_US

Have you tried any of the other strategies listed here: https://github.com/aws/aws-fpga/blob/863d963308231d0789a48f8840ceb1141368b34a/hdk/common/shell_v04261818/new_cl_template/build/README.md, especially CONGESTION, to see whether that helps resolve the issue? Thanks.

Chen

ljp0101 commented 1 year ago

@czfpga I'm skeptical it's genuine clock routing congestion at this utilization. I don't have the report available, but I did a compile with de minimis logic in the CL beyond the memory controllers and still got the error, so I don't think logic is being forced into particular regions.

+----------------------+--------+-------+--------------+------+-------+------------+-----------+-------+
|       Site Type      | Parent | Child | Non-Assigned | Used | Fixed | Prohibited | Available | Util% |
+----------------------+--------+-------+--------------+------+-------+------------+-----------+-------+
| GLOBAL CLOCK BUFFERs |      4 |     9 |            0 |   13 |     0 |          0 |      1200 |  1.08 |
|   BUFGCE             |      4 |     9 |            0 |   13 |     0 |          0 |       480 |  2.71 |
|   BUFGCE_DIV         |      0 |     0 |            0 |    0 |     0 |          0 |        80 |  0.00 |
|   BUFG_GT            |      0 |     0 |            0 |    0 |     0 |          0 |       480 |  0.00 |
|   BUFGCTRL*          |      0 |     0 |            0 |    0 |     0 |          0 |       160 |  0.00 |
| PLL                  |      0 |     9 |            0 |    9 |     0 |          0 |        40 | 22.50 |
| MMCM                 |      1 |     3 |            0 |    4 |     3 |          0 |        20 | 20.00 |
+----------------------+--------+-------+--------------+------+-------+------------+-----------+-------+

Removing phys_opt will solve the problem, but I'll have to create a custom compile script if I do that, because DEFAULT cuts the number of processing cores I can pack onto the device by ~10%.

I'd really like to see if there's an alternative, given this seems to be caused by an acknowledged and supposedly resolved Vivado bug. Note the AR says "this issue will be resolved for 2020.2 release of Vivado", and I'm using the latest Developer AMI with 2021.2.

czfpga commented 8 months ago

The table above isn't showing routing resources, which is what we were referring to in the previous comment. Please let us know if any strategy or a less congested design helps resolve the issue.

ljp0101 commented 8 months ago

Cutting functionality and using one memory controller in the CL was my best option.

The clock routing report isn't saved by the compile script and the checkpoint files were already deleted, so I couldn't easily provide it. I shared what I did because I'd be very surprised if there were a clock congestion problem with 13 global clocks and no local clocks in the CL (at least outside the memory controllers), especially given the problem doesn't occur with the DEFAULT flow (which, however, gives me timing closure issues at the logic utilisation I can reach with TIMING).

If you're asking about logic routing, I don't think it should matter but it was the three memory controllers plus another ~1% of slices for my minimal design.

I might want to revisit this later but I'll close as it's moot for now.