Open acomodi opened 3 years ago
@acomodi There are a couple of things going wrong here. I would not expect a change in memory init contents to cause a huge change in routing time?
@acomodi This kind of sounds like the BRAM is being converted into flops?
@mithro Hmm, I doubt this is the issue. I am pretty sure that, if a BRAM failed to be inferred, distributed RAMs would be inferred instead and a much higher number of SLICEMs would be used, which is not the case here:
Pack log:
60620 Physical Tile BLK-TL-CLBLM_L:
60621 Block Utilization: 0.62 Logical Block: BLK-TL-SLICEL
60622 Block Utilization: 0.01 Logical Block: BLK-TL-SLICEM
60623 Physical Tile BLK-TL-CLBLM_R:
60624 Block Utilization: 0.37 Logical Block: BLK-TL-SLICEL
60625 Block Utilization: 0.00 Logical Block: BLK-TL-SLICEM
I have compared the two different resource utilizations from the pack.log and they are exactly the same in both runs.
This means that the circuit is the same and it gets implemented on the same number/types of resources.
I think this might be an issue with the initial placement (and with placement in general), specific to the BRAMs. Basically, even though the same packed clusters are produced (at least, that is the assumption), their ordering differs, causing changes in the initial placement.
Given that we currently support only BRAM_Ls, we might end up in a situation where some BRAMs get placed far from the core logic of the design, which would explain the reported differences in CPD and run-time. This is not actually specific to BRAMs but applies to all tiles; however, given the huge choice of CLB sites, the placer should be able to optimize CLB placement correctly, while the scarcity of BRAM sites can lead to bad placements (and, consequently, bad routing results).
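To illustrate the hypothesis, here is a minimal sketch (not VPR's actual code; the function and cluster names are made up) of why a seeded random initial placement can still depend on the order in which the packed clusters are presented, even with an identical seed:

```python
import random

# Hypothetical sketch: a seeded random initial placement assigns clusters
# to free sites in input order, so the same set of clusters presented in
# a different order lands on different sites even with the same seed.
def initial_placement(clusters, sites, seed=1):
    rng = random.Random(seed)
    free = list(sites)
    placement = {}
    for c in clusters:                  # iteration order matters
        i = rng.randrange(len(free))
        placement[c] = free.pop(i)
    return placement

sites = [(x, y) for x in range(4) for y in range(4)]
a = initial_placement(["u1", "u2", "u3"], sites)
b = initial_placement(["u3", "u2", "u1"], sites)  # same clusters, new order
print(a == b)  # False: the ordering alone changed the placement
```

The two runs consume the identical random sequence, but because the clusters arrive in a different order, the same site sequence gets assigned to different clusters.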
@acomodi - So this has nothing to do with the BRAM contents then and should happen with different seeds?
@mithro At the moment these are only theories. I am performing additional tests and also trying different seeds to see whether this is the real issue here. I'll post some additional data soon.
It seems the issue is indeed initial placement. Using the same packer output and changing the placement seed to 1000, I got the following routing iterations:
default seed:
## Initializing router criticalities took 0.03 seconds (max_rss 3353.3 MiB, delta_rss +0.0 MiB)
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter Time pres BBs Heap Re-Rtd Re-Rtd Overused RR Nodes Wirelength CPD sTNS sWNS hTNS hWNS Est Succ
(sec) fac Updt push Nets Conns (ns) (ns) (ns) (ns) (ns) Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Warning 108: 6 timing startpoints were not constrained during timing analysis
Warning 109: 1521 timing endpoints were not constrained during timing analysis
1 17.0 0.0 0 2.2e+08 7670 25926 12853 ( 0.439%) 332480 ( 5.2%) 18.939 -70.37 -2.273 0.000 0.000 N/A
2 5.0 2.8 2 4.9e+07 5694 18419 6520 ( 0.223%) 363815 ( 5.7%) 18.424 -52.74 -1.758 0.000 0.000 N/A
3 4.1 3.4 4 3.7e+07 4113 13278 4585 ( 0.157%) 382077 ( 5.9%) 18.440 -63.06 -1.774 0.000 0.000 N/A
4 3.6 4.1 4 3.2e+07 3018 10305 2985 ( 0.102%) 396871 ( 6.2%) 18.349 -49.51 -1.683 0.000 0.000 N/A
5 3.3 4.9 9 2.8e+07 2131 7832 1730 ( 0.059%) 410556 ( 6.4%) 18.377 -53.28 -1.711 0.000 0.000 N/A
6 3.1 5.9 4 2.4e+07 1396 5555 947 ( 0.032%) 421353 ( 6.6%) 18.409 -52.52 -1.743 0.000 0.000 N/A
7 1.9 7.0 6 1.5e+07 819 3461 437 ( 0.015%) 428970 ( 6.7%) 18.409 -54.91 -1.743 0.000 0.000 N/A
8 1.6 8.4 5 1.1e+07 437 1848 190 ( 0.006%) 433440 ( 6.7%) 18.406 -58.14 -1.740 0.000 0.000 N/A
9 0.5 10.1 8 3655223 202 735 65 ( 0.002%) 435479 ( 6.8%) 18.389 -57.54 -1.723 0.000 0.000 N/A
10 0.8 12.2 3 4537900 76 281 19 ( 0.001%) 436226 ( 6.8%) 18.389 -58.77 -1.723 0.000 0.000 14
11 0.1 14.6 0 577913 23 78 4 ( 0.000%) 436748 ( 6.8%) 18.389 -59.10 -1.723 0.000 0.000 13
12 0.0 17.5 0 154050 5 11 1 ( 0.000%) 436723 ( 6.8%) 18.389 -59.10 -1.723 0.000 0.000 13
13 0.0 21.0 0 19600 1 5 0 ( 0.000%) 436780 ( 6.8%) 18.389 -59.85 -1.723 0.000 0.000 12
custom seed (1000):
## Initializing router criticalities took 0.03 seconds (max_rss 3353.0 MiB, delta_rss +0.0 MiB)
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter Time pres BBs Heap Re-Rtd Re-Rtd Overused RR Nodes Wirelength CPD sTNS sWNS hTNS hWNS Est Succ
(sec) fac Updt push Nets Conns (ns) (ns) (ns) (ns) (ns) Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Warning 108: 6 timing startpoints were not constrained during timing analysis
Warning 109: 1521 timing endpoints were not constrained during timing analysis
1 19.2 0.0 0 2.3e+08 7670 25926 13086 ( 0.447%) 340954 ( 5.3%) 19.869 -132.1 -3.203 0.000 0.000 N/A
2 5.3 2.8 4 4.5e+07 5679 18278 6717 ( 0.229%) 371999 ( 5.8%) 19.854 -130.3 -3.188 0.000 0.000 N/A
3 4.3 3.4 7 3.4e+07 4149 13441 4758 ( 0.162%) 391802 ( 6.1%) 19.774 -138.9 -3.108 0.000 0.000 N/A
4 4.1 4.1 4 3.2e+07 3005 10744 3210 ( 0.110%) 407592 ( 6.3%) 19.761 -137.1 -3.095 0.000 0.000 N/A
5 3.8 4.9 2 2.8e+07 2148 8274 1896 ( 0.065%) 423111 ( 6.6%) 19.839 -143.1 -3.173 0.000 0.000 N/A
6 2.7 5.9 7 2.0e+07 1420 5994 1071 ( 0.037%) 432306 ( 6.7%) 19.858 -154.0 -3.192 0.000 0.000 N/A
7 1.8 7.0 11 1.3e+07 890 3846 509 ( 0.017%) 440328 ( 6.9%) 19.879 -157.9 -3.213 0.000 0.000 N/A
8 1.2 8.4 8 8812194 450 1885 224 ( 0.008%) 445947 ( 6.9%) 19.923 -163.2 -3.257 0.000 0.000 N/A
9 1.0 10.1 4 6420327 212 870 78 ( 0.003%) 448470 ( 7.0%) 19.911 -162.1 -3.245 0.000 0.000 N/A
10 0.3 12.2 4 2119048 82 279 31 ( 0.001%) 449305 ( 7.0%) 19.897 -161.4 -3.231 0.000 0.000 15
11 0.2 14.6 2 1247043 40 131 12 ( 0.000%) 449878 ( 7.0%) 19.911 -163.3 -3.245 0.000 0.000 14
12 0.1 17.5 1 798032 17 39 6 ( 0.000%) 450097 ( 7.0%) 19.911 -163.3 -3.245 0.000 0.000 14
13 0.2 21.0 1 962631 11 23 4 ( 0.000%) 450117 ( 7.0%) 19.911 -163.3 -3.245 0.000 0.000 14
14 0.2 25.2 0 744835 5 5 2 ( 0.000%) 450255 ( 7.0%) 19.911 -163.3 -3.245 0.000 0.000 14
15 0.0 30.3 0 104536 2 4 0 ( 0.000%) 450311 ( 7.0%) 19.911 -163.3 -3.245 0.000 0.000 15
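Summarizing the final iterations of the two tables above: same packed netlist, different placement seed, measurably different QoR. A quick calculation on the reported numbers:

```python
# Final-iteration metrics copied from the two routing tables above.
default_seed = {"cpd_ns": 18.389, "wirelength": 436780, "route_iters": 13}
seed_1000 = {"cpd_ns": 19.911, "wirelength": 450311, "route_iters": 15}

cpd_delta = (seed_1000["cpd_ns"] - default_seed["cpd_ns"]) / default_seed["cpd_ns"]
wl_delta = (seed_1000["wirelength"] - default_seed["wirelength"]) / default_seed["wirelength"]
print(f"CPD: +{cpd_delta:.1%}, wirelength: +{wl_delta:.1%}")
# CPD: +8.3%, wirelength: +3.1%
```

So the seed change alone costs roughly 8% in CPD and 3% in wirelength here, plus two extra routing iterations.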
This is using the current symbiflow-arch-defs master (https://github.com/SymbiFlow/symbiflow-arch-defs/commit/1d921548f5211560b57e7ebab7f4a5dfc5b6a784) and its conda VTR package.
I have also double-checked the packer resource utilization between two different runs, and it actually changes from run to run:
Control run:
Resource usage...
Netlist
1114 blocks of type: BLK-TL-SLICEL
Architecture
2150 blocks of type: BLK-TL-CLBLL_L
1200 blocks of type: BLK-TL-CLBLL_R
1800 blocks of type: BLK-TL-CLBLM_L
3000 blocks of type: BLK-TL-CLBLM_R
Netlist
15 blocks of type: BLK-TL-SLICEM
Architecture
1800 blocks of type: BLK-TL-CLBLM_L
3000 blocks of type: BLK-TL-CLBLM_R
Netlist
25 blocks of type: BLK-TL-BRAM_L
Architecture
55 blocks of type: BLK-TL-BRAM_L
Netlist
8 blocks of type: BLK-TL-IOPAD
Architecture
6 blocks of type: BLK-TL-LIOPAD_SING
4 blocks of type: BLK-TL-RIOPAD_SING
72 blocks of type: BLK-TL-LIOPAD_M
48 blocks of type: BLK-TL-RIOPAD_M
72 blocks of type: BLK-TL-LIOPAD_S
48 blocks of type: BLK-TL-RIOPAD_S
Netlist
0 blocks of type: BLK-TL-IOPAD_M
Architecture
72 blocks of type: BLK-TL-LIOPAD_M
48 blocks of type: BLK-TL-RIOPAD_M
Netlist
0 blocks of type: BLK-TL-IOPAD_S
Architecture
72 blocks of type: BLK-TL-LIOPAD_S
48 blocks of type: BLK-TL-RIOPAD_S
Netlist
2 blocks of type: BLK-TL-BUFGCTRL
Architecture
16 blocks of type: BLK-TL-CLK_BUFG_BOT_R
16 blocks of type: BLK-TL-CLK_BUFG_TOP_R
Netlist
1 blocks of type: BLK-TL-PLLE2_ADV
Architecture
2 blocks of type: BLK-TL-CMT_TOP_L_UPPER_T
3 blocks of type: BLK-TL-CMT_TOP_R_UPPER_T
Netlist
0 blocks of type: BLK-TL-HCLK_IOI3
Architecture
5 blocks of type: BLK-TL-HCLK_IOI3
Netlist
1 blocks of type: SYN-VCC
Architecture
1 blocks of type: SYN-VCC
Netlist
1 blocks of type: SYN-GND
Architecture
1 blocks of type: SYN-GND
Test run:
Resource usage...
Netlist
1099 blocks of type: BLK-TL-SLICEL
Architecture
2150 blocks of type: BLK-TL-CLBLL_L
1200 blocks of type: BLK-TL-CLBLL_R
1800 blocks of type: BLK-TL-CLBLM_L
3000 blocks of type: BLK-TL-CLBLM_R
Netlist
15 blocks of type: BLK-TL-SLICEM
Architecture
1800 blocks of type: BLK-TL-CLBLM_L
3000 blocks of type: BLK-TL-CLBLM_R
Netlist
25 blocks of type: BLK-TL-BRAM_L
Architecture
55 blocks of type: BLK-TL-BRAM_L
Netlist
8 blocks of type: BLK-TL-IOPAD
Architecture
6 blocks of type: BLK-TL-LIOPAD_SING
4 blocks of type: BLK-TL-RIOPAD_SING
72 blocks of type: BLK-TL-LIOPAD_M
48 blocks of type: BLK-TL-RIOPAD_M
72 blocks of type: BLK-TL-LIOPAD_S
48 blocks of type: BLK-TL-RIOPAD_S
Netlist
0 blocks of type: BLK-TL-IOPAD_M
Architecture
72 blocks of type: BLK-TL-LIOPAD_M
48 blocks of type: BLK-TL-RIOPAD_M
Netlist
0 blocks of type: BLK-TL-IOPAD_S
Architecture
72 blocks of type: BLK-TL-LIOPAD_S
48 blocks of type: BLK-TL-RIOPAD_S
Netlist
2 blocks of type: BLK-TL-BUFGCTRL
Architecture
16 blocks of type: BLK-TL-CLK_BUFG_BOT_R
16 blocks of type: BLK-TL-CLK_BUFG_TOP_R
Netlist
1 blocks of type: BLK-TL-PLLE2_ADV
Architecture
2 blocks of type: BLK-TL-CMT_TOP_L_UPPER_T
3 blocks of type: BLK-TL-CMT_TOP_R_UPPER_T
Netlist
0 blocks of type: BLK-TL-HCLK_IOI3
Architecture
5 blocks of type: BLK-TL-HCLK_IOI3
Netlist
1 blocks of type: SYN-VCC
Architecture
1 blocks of type: SYN-VCC
Netlist
1 blocks of type: SYN-GND
Architecture
1 blocks of type: SYN-GND
There is a variation in the SLICEL count.
The variation in the SLICEL count might also be a packing issue, rather than a placer issue. Still worth investigating.
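Comparing these long resource-usage dumps by eye is error-prone. Here is a small helper (hypothetical, not part of VPR; it only parses the "Netlist ... blocks of type:" entries shown above) that diffs the netlist block counts of two pack.log runs:

```python
import re

# Parse the "Netlist" block counts out of a pack.log resource-usage
# section, ignoring the "Architecture" capacity entries.
def parse_usage(text):
    usage, section = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line in ("Netlist", "Architecture"):
            section = line
            continue
        m = re.match(r"(\d+)\s+blocks of type:\s+(\S+)", line)
        if m and section == "Netlist":
            usage[m.group(2)] = int(m.group(1))
    return usage

# Abbreviated excerpts of the two runs above.
control = parse_usage("Netlist\n1114 blocks of type: BLK-TL-SLICEL\n"
                      "Netlist\n15 blocks of type: BLK-TL-SLICEM")
test = parse_usage("Netlist\n1099 blocks of type: BLK-TL-SLICEL\n"
                   "Netlist\n15 blocks of type: BLK-TL-SLICEM")
diff = {t: (control[t], test[t]) for t in control if control[t] != test.get(t)}
print(diff)  # {'BLK-TL-SLICEL': (1114, 1099)}
```

Run against the full logs, this immediately isolates the SLICEL variation (1114 vs. 1099) while confirming the other counts match.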
@litghost Indeed. What bugs me though is the huge difference in the eblifs:
An initialization memory change should not alter the synthesized output this much. I'll need to reduce the test case to get a better understanding of what is happening.
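Whether the eblif differences really are confined to BRAM init values can be checked mechanically. A sketch of such a check (it assumes the init values appear as `.param INIT*` lines in the eblif, which matches Yosys's eblif conventions; the helper name and snippets are made up): mask those lines and diff the rest.

```python
# Mask BRAM init payloads so that two eblifs can be compared on
# everything except memory contents. Assumes ".param INIT*" lines
# carry the init values (a Yosys eblif convention).
def mask_init(eblif_text):
    out = []
    for line in eblif_text.splitlines():
        if line.lstrip().startswith(".param INIT"):
            out.append(".param INIT <masked>")
        else:
            out.append(line)
    return "\n".join(out)

# Toy excerpts: identical netlists, different init contents.
a = ".subckt RAMB18E1\n.param INIT_00 1010\n.end"
b = ".subckt RAMB18E1\n.param INIT_00 0101\n.end"
print(mask_init(a) == mask_init(b))  # True: only the init payload differs
```

If the real eblifs still differ after masking, the synthesis output itself changed, pointing at a non-determinism upstream of VPR.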
Anyway, for now this issue affects all the LiteX test results from CI, so two different CI runs cannot be compared at the moment.
While testing the RR graph base costs fixes I came across a possible instability issue that causes run-time and QoR to drastically change from one build to another.
With the newly added LiteX designs autogeneration, the resulting LiteX-generated design files are exactly the same from one build to another, except for the `mem.init` files. The difference in the `mem.init` files is due to the LiteX BIOS, which reports a different timestamp for each design generation. This is expected to happen.
The small changes in the `mem.init` files, though, produce a great difference between the final outputs of the two runs. In fact, the difference in the generated `eblif` files is huge, even though they should differ only in specific BRAM init value fields. This leads to VPR producing different outputs, step by step, with a kind of non-deterministic behaviour.
An example is the `minilitex` test.
Difference in litex-generated files:
Difference in resulting routing run-time and QoR:
Control run:
Test run:
NOTE: this test was performed using the RR graph base costs fixes, but I am running experiments with the master+wip version on the SymbiFlow baseline, and I expect to see similar behavior there.