YosysHQ / nextpnr

nextpnr portable FPGA place and route tool
ISC License
1.3k stars 242 forks source link

Main analytical placer does not (seem to) make progress after modest increase in logic #1062

Open arjenroodselaar opened 1 year ago

arjenroodselaar commented 1 year ago

I am working on a design using a LFE5U45-CABGA554 which is approaching ~60% of the available LUT resources. It seems like the logic is getting harder to place as small changes to the design may cause the main analytical phase of the HeAP placer to seemingly not make progress. It may complete one or two iterations, which on a successful run take about 5 seconds each, but more often than not run until I stop the process (I let it run overnight once without reaching a solution, for a total of almost 7 hours). With some patience I can find a seed value where the placer will run to completion and the design closes timing, but which seeds works changes each time I make changes to the design.

The device utilization is currently as follows:

Info: Device utilisation:
Info:             TRELLIS_IO:   142/  245    57%
Info:                   DCCA:     1/   56     1%
Info:                 DP16KD:     2/  108     1%
Info:             MULT18X18D:     0/   72     0%
Info:                 ALU54B:     0/   36     0%
Info:                EHXPLLL:     0/    4     0%
Info:                EXTREFB:     0/    2     0%
Info:                   DCUA:     0/    2     0%
Info:              PCSCLKDIV:     0/    2     0%
Info:                IOLOGIC:     0/  160     0%
Info:               SIOLOGIC:     0/   85     0%
Info:                    GSR:     0/    1     0%
Info:                  JTAGG:     0/    1     0%
Info:                   OSCG:     0/    1     0%
Info:                  SEDGA:     0/    1     0%
Info:                    DTR:     0/    1     0%
Info:                USRMCLK:     0/    1     0%
Info:                CLKDIVF:     0/    4     0%
Info:              ECLKSYNCB:     0/   10     0%
Info:                DLLDELD:     0/    8     0%
Info:                 DDRDLL:     0/    4     0%
Info:                DQSBUFM:     0/   10     0%
Info:        TRELLIS_ECLKBUF:     0/    8     0%
Info:           ECLKBRIDGECS:     0/    2     0%
Info:                   DCSC:     0/    2     0%
Info:             TRELLIS_FF: 14703/43848    33%
Info:           TRELLIS_COMB: 28016/43848    63%
Info:           TRELLIS_RAMW:     0/ 5481     0%

Before I go off and read both the HeAP paper and try and instrument the code to determine what happens, a couple of questions:

  1. Is this expected behavior when placing designs for ECP5 devices? I would expect to consistently start failing timing closure before a design reaches a size where it can't be placed, but this may be a misconception on my part.
  2. Is there any way for me to determine which parts of the design it is struggling to place so I can determine if changes to the design are possible to improve this?

Versions used:

Yosys 0.23+45 (git sha1 a64ed824e, aarch64-linux-gnu-gcc 9.3.0-17ubuntu1~20.04 -fPIC -Os)
nextpnr-ecp5 -- Next Generation Place and Route (Version nextpnr-0.4-47-gdb25c5c8)

I can provide the design if this behavior is not expected and someone wants to take a closer look.

gatecat commented 1 year ago

The problem is probably in finding a legal placement for carry chains, control sets etc. I am sure there are improvements to the heuristics etc here. The place you want to be looking at is legalise_placement_strict (which isn't really based on the HeAP paper as it's more based on the nextpnr philosophy of placing at a finer grain with more complex constraints).

It's not entirely unknown for this to get stuck on very high utilisation but 63% is concerning, so the design would be interesting too to get an idea what's going on (e.g. if there are particularly large numbers of long carry chains or unique control sets).

There's also a small but non-zero chance this isn't utilisation related at all but just a bug in the packer meaning that a placement is created with constraints that can never be satisified.

arjenroodselaar commented 1 year ago

The problem is probably in finding a legal placement for carry chains, control sets etc. I am sure there are improvements to the heuristics etc here. The place you want to be looking at is legalise_placement_strict (which isn't really based on the HeAP paper as it's more based on the nextpnr philosophy of placing at a finer grain with more complex constraints).

Thank you. I'll go have a look there if I decide I need to get that far.

It's not entirely unknown for this to get stuck on very high utilisation but 63% is concerning, so the design would be interesting too to get an idea what's going on (e.g. if there are particularly large numbers of long carry chains or unique control sets).

There's also a small but non-zero chance this isn't utilisation related at all but just a bug in the packer meaning that a placement is created with constraints that can never be satisified.

I don't think I have particularly long carry chains, but I'll admit I haven't looked at this either. Not expecting you to look, but if you do want to, the design is attached. I'll see what I can find tomorrow.

sidecar_mainboard_controller_top_rev_b_transceiver_changes.json.zip

arjenroodselaar commented 1 year ago

I did a bit of poking around in legalise_placement_strict placement strict and it seems like it's consistently getting stuck there.

First things first:

dev@4b25b4b781b1:~/nextpnr$ ./nextpnr-ecp5 --version
nextpnr-ecp5 -- Next Generation Place and Route (Version nextpnr-0.4-63-g16ffd02a)

This was built using prjtrellis at 35f5aff, using the database from oss-cad-suite-linux-arm64-20221205 (I don't think these were updated recently).

The constraints file: sidecar_mainboard_controller_rev_b.lpf.zip

Running the above linked design like so:

./nextpnr-ecp5 --45k --package CABGA554 --freq 50 --lpf sidecar_mainboard_controller_rev_b.lpf --json sidecar_mainboard_controller_top_rev_b_transceiver_changes.json --textcfg sidecar_mainboard_controller_top_rev_b_transceiver_changes.config --threads 1 -l place_stuck.log

It completes one iteration of main placement and then seems to get stuck trying to place MUX_controller_ignition_controllers_27_hello_expired_count_r$write_1__VAL_2_LUT4_Z_2 during the second iteration. It runs up to 909 iterations (for both iter and iter_at_radius) and then jumps to the notempty label at https://github.com/YosysHQ/nextpnr/blob/16ffd02a9d485c66707cc56bcd083cfe545a72f3/common/place/placer_heap.cc#L928, resetting both counters. It then runs up to 909 iterations again, repeating the process. I've not let it run for more than a few minutes, but unless I misread the code I don't think it can get out of this loop.

Looking at this particular cell it is part of a 6 bit counter (which only counts down from 25, so it may get reduced to 5 bits by Yosys), so it could be part of a carry chain. Looking at the verbose output of nextpnr this counter (or the 34 others like it) does not get reported as a carry chain before packing.

I need to do some more code reading to figure out where to go from here, but as is I don't think it can hit any of the iteration limits which are supposed to keep the process from never completing. I'll experiment a bit changing seeds to see if this behavior is always specific to this cell or if it moves around.

arjenroodselaar commented 1 year ago

Poking at the random seed I seem to be hitting this case all with very similar looking logic:

seed = 0, MUX_controller_ignition_controllers_24_hello_expired_count_r$write_1__VAL_2_LUT4_Z_2 (TRELLIS_COMB)
seed = 1, MUX_controller_ignition_controllers_14_hello_expired_count_r$write_1__VAL_2_LUT4_Z_2 (TRELLIS_COMB)
seed = 2, IF_controller_tofino_debug_port_i2c_bit_ctrl_s_ETC___d1315_LUT4_Z_10 (TRELLIS_COMB)
seed = 3, MUX_controller_ignition_controllers_33_status_update_expired_count_r$write_1__VAL_2_LUT4_Z_3 (TRELLIS_COMB)
seed = 4, MUX_controller_ignition_controllers_9_hello_expired_count_r$write_1__VAL_2_LUT4_Z_2 (TRELLIS_COMB)
seed = 5, MUX_controller_tick_1khz_count$write_1__VAL_2_LUT4_Z (TRELLIS_COMB)
seed = 6, MUX_controller_ignition_controllers_12_hello_expired_count_r$write_1__VAL_2_LUT4_Z_2 (TRELLIS_COMB)
seed = 7, MUX_controller_ignition_controllers_7_status_update_expired_count_r$write_1__VAL_2_LUT4_Z_3 (TRELLIS_COMB)
seed = 8, IF_controller_ignition_controllers_26_n_status_ETC___d21831_LUT4_Z (TRELLIS_COMB)
seed = 9, MUX_controller_ignition_controllers_4_hello_expired_count_r$write_1__VAL_2_LUT4_Z_2 (TRELLIS_COMB)
seed = 10, MUX_controller_ignition_controllers_12_hello_expired_count_r$write_1__VAL_2_LUT4_Z_2 (TRELLIS_COMB)
seed = 17, MUX_controller_ignition_controllers_20_status_update_expired_count_r$write_1__VAL_2_LUT4_Z_3
seed = 23, IF_controller_ignition_controllers_12_n_status_ETC___d11359_LUT4_Z (TRELLIS_COMB)
seed = 29, IF_controller_ignition_controllers_3_n_status__ETC___d4627_LUT4_Z (TRELLIS_COMB)
seed = 31, MUX_controller_ignition_controllers_34_hello_expired_count_r$write_1__VAL_2_LUT4_Z_2 (TRELLIS_COMB)
seed = 37, MUX_controller_ignition_controllers_33_status_update_expired_count_r$write_1__VAL_2_LUT4_Z_3 (TRELLIS_COMB)

Using a seed of 8 runs for 4 iterations. All the other cases hang either on the first or the second iteration. AFAIKT none of these are marked as carry chains. Looking through the generated Verilog file that's the input for this design it looks like all the MUX..count_r signals are values loaded into a counter registers. The IF.. signals are strobe like signals, driving either the EN signal for a register or used to select the value between two MUX.. signals.

gatecat commented 1 year ago

https://github.com/YosysHQ/nextpnr/pull/1065 does seem to get things working here

Unfortunately in general this is probably just a case of "nextpnr is bad at legalising flipflops with large numbers of different control sets (sets/resets and enables)" combined with "yosys is too good at inferring control sets". I can't guarantee you won't hit problems again if the design becomes bigger than it is right now, passing -nodffe to Yosys's synth_ecp5 might help if you do hit more problems. Otherwise I will keep this testcase in mind for further improvements here.

arjenroodselaar commented 1 year ago

Thank you for looking into this. Having not run into this problem before I did some reading which led me to https://docs.xilinx.com/r/en-US/ug949-vivado-design-methodology/Follow-Control-Set-Guidelines, where they describe a report that can be generated by Vivado on the number of control sets. Is the data available between Yosys and Nextpnr to do something similar? And if nextpnr would be able to generate these numbers, would you have some pointers for me as to where to start with this? Would I have to start with the control signals for each cell in the design and walk backwards to determine the unique sets?

Looking at the command help for opt_dff in Yosys the description for -nodffe is pretty high level. What are the effects of using this flags? And presumably you'd usually want this to happen (otherwise it wouldn't be enabled by default), but I am guessing it can't be applied more selectively?

arjenroodselaar commented 1 year ago

Using the -nodffe option to synth_ecp5 does indeed get the design the place, but at the expense of going from 63% utilization of TRELLIS_COMB to 80%. Adding the -mince and -minsrst options to the dfflegalize step and setting these to 8 allows the design to be placed with a modest utilization increase of only 3%. This provides just the trade-off I was looking for. Those options are currently not exposed in the synth_ecp5 script, but I'll put together a diff to add them.