Closed acomodi closed 4 years ago
I'm unclear, but it doesn't sound like this master+wip is ready? There are outstanding issues caused by the new version that we wouldn't want to integrate?
Indded, there are two issues currently:
The fact is that there is the IDELAYCTRL which needs to have this feature (in-design LOCing of primitives).
I was wrong here, the IDELAYCTRL actually gets placed correctly and the LOC constraints are there. This issue seems to affect only FF.
I created an fpga-tool-perf test on Hydra using this yosys and testing-new-yosys-master+wip
symbiflow-arch-defs.
@litghost @HackerFoo I have updated the new master+wip integration.
I have identified and solved the issues that prevented the litex-linux design to properly work.
The SRL post-synthesis tests were not caused by the SRL themeselves, but depended by an issue with the CARRYs which were trimmed when performing the ABC9 pass. By solving the CARRY ABC9 issue (https://github.com/SymbiFlow/yosys/pull/81/commits/3353f0e47b8f22c1e02b82c4fa8f806eec9000cd), the SRL tests passed correctly. This allows us to get rid of the wip/disable-srl-abc9
branch and enable ABC9 on SRLs safely.
The ethernet issue was due to a wrong mapping when yosys infers the LCUs. There was a change (cmp2lcu 051aefc3c) that introduced the inference of LCU when processing comparison operations. This is why I had to add VPR carry chains definitions also for the LCU module in arith_map.v which I needed to fix (https://github.com/SymbiFlow/yosys/pull/81/commits/3edd17f94d3ec19aab8087fea73179fdb8de5170) as there was a bug in there.
Another addition is the enabling of RAMB36 inference
I am currently testing the all_xc7
target in symbiflow-arch-defs to double-check that all the tests do pass.
- The ethernet issue was due to a wrong mapping when yosys infers the LCUs. There was a change (cmp2lcu 051aefc) that introduced the inference of LCU when processing comparison operations. This is why I had to add VPR carry chains definitions also for the LCU module in arith_map.v which I needed to fix (3edd17f) as there was a bug in there.
The LCU should just use the CARRY4 primative, because the LCU only uses the CO outputs. The reason that the _80_xilinx_alu
uses the special primitive (CARRY4_COUT
) is because yosys tends to over use CO and O outputs resulting in congestion at the outputs, specially CO[3]
and O[3]
. Becuase the LCU only uses the CO
ports, there is no problem.
Specifically the LCU should just use the code at the else
block: https://github.com/SymbiFlow/yosys/commit/3edd17f94d3ec19aab8087fea73179fdb8de5170#diff-7021595c247e915d94d36ec511281cbbR166
- The SRL post-synthesis tests were not caused by the SRL themeselves, but depended by an issue with the CARRYs which were trimmed when performing the ABC9 pass. By solving the CARRY ABC9 issue (3353f0e), the SRL tests passed correctly. This allows us to get rid of the
wip/disable-srl-abc9
branch and enable ABC9 on SRLs safely.
This change means that simulations using a CARRY chain where CYINIT or CIN are not specified will failed. I don't know what yosys changed that the ifdef
is no longer working, but the solution is to fix the ifdef
, not remove it.
Is there an updated XDC plugin that will work with this version of Yosys? https://github.com/SymbiFlow/yosys-symbiflow-plugins/commit/1c495fd47ddfc54a9f815c0ba97dc112e1731bd6 does not work.
@litghost All right, I will fix the LCU carry chains and the ifdef.
Specifically the LCU should just use the code at the else block
By this you mean the following else block? https://github.com/SymbiFlow/yosys/blob/3edd17f94d3ec19aab8087fea73179fdb8de5170/techlibs/xilinx/arith_map.v#L134-L162
If the case is the latter one, I think we would need to add still the _COUT
prefix as VPR does not recognize those (or at least add a CARRY4
definition in the cells_map.v
I have found another issue, related to the LOC constraints, which is slightly different from the previous one (which I filed here). In this case, the techmapping pass makes attributes to be lost. I have a reduced test case and the bad commit causing the problem, and used that for another issue upstream.
If the case is the latter one, I think we would need to add still the
_COUT
prefix as VPR does not recognize those (or at least add aCARRY4
definition in the cells_map.v
The latter one, but your analysis is incorrect. The CARRY4_COUT has no CO port, so it cannot be used. VPR understands neither CARRY4_COUT nor CARRY4, it understands CARRY4_VPR. CARRY4_COUT is mapped to CARRY4_VPR during the symbiflow-arch-defs portion of the yosys flow.
The solution in this case is to add a mapping of CARRY4 to CARRY4_VPR. This works in this instance because the LCU CARRY4 doesn't use both the CO and O ports, only the CO ports.
Is there an updated XDC plugin that will work with this version of Yosys? SymbiFlow/yosys-symbiflow-plugins@1c495fd does not work.
@HackerFoo Not yet on conda. I have just pushed a fix for building the plugins with the newer version of yosys here: https://github.com/SymbiFlow/yosys-symbiflow-plugins/commit/809a2eac0c3132cfbdd84d7f4069186a1d7905bb
The solution in this case is to add a mapping of CARRY4 to CARRY4_VPR. This works in this instance because the LCU CARRY4 doesn't use both the CO and O ports, only the CO ports.
All clear, thanks
@litghost I have been trying to get the CARRY4 in the techmap to have the CARRY4_VPR out of them, the problem though seems to be that there is a routing issue given by the CI input of one carrychain being connected to a DMUX output of the previous carry chain:
## Initializing router criticalities took 0.06 seconds (max_rss 3464.8 MiB, delta_rss +0.0 MiB)
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter Time pres BBs Heap Re-Rtd Re-Rtd Overused RR Nodes Wirelength CPD sTNS sWNS hTNS hWNS Est Succ
(sec) fac Updt push Nets Conns (ns) (ns) (ns) (ns) (ns) Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Warning 1224: No routing path for connection to sink_rr 813302, retrying with full device bounding box
Cannot route from BLK-TL-CLBLL_L[0].CLBLL_L_DMUX[0] (RR node: 2777187 type: SOURCE location: (72,130) class: 87 capacity: 1) to BLK-TL-CLBLL_L[0].CLBLL_LL_CIN[0] (RR node: 813302 type: SINK location: (72,17) class: 21 capacity: 1) -- no possible path
Failed to route connection from '$ge$/data/symbiflow/symbiflow-arch-defs/build/xc/xc7/tests/soc/litex/base/baselitex_arty.v:3519$52.slice[4].carry4' to '$ge$/data/symbiflow/symbiflow-arch-defs/build/xc/xc7/tests/soc/litex/base/baselitex_arty.v:3519$52.slice[5].carry4' for net '$abc
$303469$aiger303468$8737' (#11248)
Routing failed.
# Routing took 28.51 seconds (max_rss 3464.8 MiB, delta_rss +0.0 MiB)
Circuit is unroutable with a channel width factor of 500.
VPR failed to implement circuit
The entire flow of VPR took 62.31 seconds (max_rss 3464.8 MiB)
I have then also tried to replace CARRY4_COUT instead of CARRY4 with the addition of the CARRY_COUT_PLUG to overcome the issue, but the router had issues in routing as well, with higher runtime per route iteration as well worse CPD.
On a side note, it seems that RAMB36 let VPR reach worse CPD ~18.6 ns vs ~16.8 while using RAMB18 only.
@litghost I have been trying to get the CARRY4 in the techmap to have the CARRY4_VPR out of them, the problem though seems to be that there is a routing issue given by the CI input of one carrychain being connected to a DMUX output of the previous carry chain:
Ya, I was worried about this. CARRY4_VPR has two ports, CO_CHAIN and CO_FABRIC3. The underlying CARRY4 only has CO[3], which connects to both CO_CHAIN and CO_FABRIC3. We should probably make a CARRY4_CO_COUT that is like the CARRY4_COUT.
@acomodi, FYI https://github.com/YosysHQ/yosys/issues/2213
It looks like the conditional compilation directives were removed from yosys during development, unclear what happened between then and now. Hopefully we will get clarity.
@acomodi https://github.com/litghost/symbiflow-arch-defs/commits/new_yosys_with_conda + https://github.com/litghost/yosys/tree/sync_yosys is my latest attempt. I'm running all_xc7
now.
GitHubFOSS architecture definitions of FPGA hardware useful for doing PnR device generation. - litghost/symbiflow-arch-defs
GitHubYosys Open SYnthesis Suite. Contribute to litghost/yosys development by creating an account on GitHub.
@acomodi, FYI YosysHQ#2213
It looks like the conditional compilation directives were removed from yosys during development, unclear what happened between then and now. Hopefully we will get clarity.
If the issue is getting a different definition if in simulation by defining the various conditions explicitly instead of the |
, wouldn't it be ok to use the __ICARUS__
directive? I am unsure how it works and who sets the directive, but it is used here for instance:
https://github.com/YosysHQ/yosys/blob/9d658a19708b99288bed2d8c8da3a2fe987a21b3/techlibs/xilinx/cells_sim.v#L289-L303
GitHubYosys Open SYnthesis Suite. Contribute to YosysHQ/yosys development by creating an account on GitHub.
@acomodi https://github.com/litghost/symbiflow-arch-defs/commits/new_yosys_with_conda + https://github.com/litghost/yosys/tree/sync_yosys is my latest attempt. I'm running
all_xc7
now.
I have tried to run with these branches, but I got an error in VPR when running the baselitex example that let yosys infer LCUs instead of ALUs.
The error is the following:
vpr/src/base/netlist.tpp:1887 associate_pin_with_net: Assertion 'net_pins_[net_id][0] == PinId::INVALI
D()' failed (Must be no existing net driver).
xc/xc7/tests/soc/litex/base/CMakeFiles/file_xc_xc7_tests_soc_litex_base_baselitex_arty_artix7-xc7a50t-virt-xc7a50t-test_top.net.dir/build.make:68: recipe for target 'xc/xc7/tests/soc/litex/base/baselitex_arty/artix7-xc7a50t-virt-xc7a50t-test/top.net' failed
make[3]: *** [xc/xc7/tests/soc/litex/base/baselitex_arty/artix7-xc7a50t-virt-xc7a50t-test/top.net] Error 134
CMakeFiles/Makefile2:239603: recipe for target 'xc/xc7/tests/soc/litex/base/CMakeFiles/file_xc_xc7_tests_soc_litex_base_baselitex_arty_artix7-xc7a50t-virt-xc7a50t-test_top.net.dir/all' failed
make[2]: *** [xc/xc7/tests/soc/litex/base/CMakeFiles/file_xc_xc7_tests_soc_litex_base_baselitex_arty_artix7-xc7a50t-virt-xc7a50t-test_top.net.dir/all] Error 2
What I am unsure of is why cannot we just use the current arith_map.v that treats carry chains in the same way as for the ALU which proved to be working correctly.
Another thing we could do to move forward and get yosys in sync with upstream is to temporarily disable the LCU inference by using the vpr
flag in synth_xilinx.cc
and open an issue about restoring the LCU in a second moment.
I have tested the implementation without LCU and it works on HW. The only problem is that routing takes a longer time to compute.
I will provide some more data on the run-times and CPD with the various configurations in a bit.
GitHubFOSS architecture definitions of FPGA hardware useful for doing PnR device generation. - litghost/symbiflow-arch-defs
GitHubYosys Open SYnthesis Suite. Contribute to litghost/yosys development by creating an account on GitHub.
Router data with different configurations when running the baselitex test.
LCU | RAMB36 | CPD (ns) | Route time (s) | Works on HW |
---|---|---|---|---|
NO | NO | 18.084 | 87.75 | YES |
NO | YES | 18.766 | 147.16 | YES |
YES | NO | 16.876 | 101.26 | YES |
YES | YES | 18.966 | 85.25 | YES |
Note: when RAMB36 is disabled, only RAMB18 is used
The ABC9 directive was removed here: 1dc22607c38486d9e1a2b56f749d1eca35d405d2
We could temporarily re-add that and have restore the ifdef
What I am unsure of is why cannot we just use the current arith_map.v that treats carry chains in the same way as for the ALU which proved to be working correctly.
Because the O and CO outputs are not the same. You are increasing the logic depth by 1 by using the O outputs instead of the CO outputs. The LCU inference is clearly a win, so let's implement it correct using the CARRY4 hardware.
If the issue is getting a different definition if in simulation by defining the various conditions explicitly instead of the
|
, wouldn't it be ok to use the__ICARUS__
directive? I am unsure how it works and who sets the directive, but it is used here for instance:
That's what I proposed here: https://github.com/litghost/yosys/blob/e8bfed992e7926c1229229e7e9a2b9f6f537252b/techlibs/xilinx/cells_sim.v#L479-L485
For clarity, __ICARUS__
works because of "Predefined Macros" in https://linux.die.net/man/1/iverilog
GitHubYosys Open SYnthesis Suite. Contribute to litghost/yosys development by creating an account on GitHub.
Router data with different configurations when running the baselitex test.
It would be good to check the Yosys+Vivado results for the same configurations.
@acomodi I believe we have testbenches for counter-like designs, which test the alu arith_map.v. I strongly believe we need to add a LCU testbench to detect whether the lcu arith_map.v is working. It is not a good idea to need to test in hardware all of LiteX just to verify if the yosys+vpr LCU / ALU logic inference is working as expected.
@litghost working on that, I am taking the reduced testbenches from yosys that infer lcu, so that we can have much faster iteration steps.
@acomodi https://github.com/litghost/vtr-verilog-to-routing/tree/make_capnp_backward + https://github.com/litghost/symbiflow-arch-defs/tree/new_yosys_with_conda + https://github.com/litghost/yosys/tree/sync_yosys passed your LCU testbench.
The new VPR is running through CI now over here: https://github.com/SymbiFlow/vtr-verilog-to-routing/pull/525 The new symbiflow-arch-defs is basically https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1558 plus the changes needed for the new yosys.
GitHubVerilog to Routing -- Open Source CAD Flow for FPGA Research - litghost/vtr-verilog-to-routing
GitHubFOSS architecture definitions of FPGA hardware useful for doing PnR device generation. - litghost/symbiflow-arch-defs
GitHubYosys Open SYnthesis Suite. Contribute to litghost/yosys development by creating an account on GitHub.
@litghost FYI SymbiFlow/symbiflow-arch-defs#1564
I wonder if the LCU width is too narrow to exhibit the issue? I'm trying to widen the compare operation and see what happens.
@litghost FYI SymbiFlow/symbiflow-arch-defs#1564
I wonder if the LCU width is too narrow to exhibit the issue? I'm trying to widen the compare operation and see what happens.
Expanding from an 8 bit counter to a 32 bit counter resulted in testbench_synth_lcu_tb
failing, I'll see if I can post a fix today.
@acomodi I extended your LCU test in https://github.com/litghost/symbiflow-arch-defs/commit/c2bc596ece222e83f696377f465d7bd251a5d4cb and fixed a bug. Maybe that combo will work on hardware now?
I'm still debugging https://github.com/SymbiFlow/vtr-verilog-to-routing/pull/525, but the problem appears to lies in ROI designs, not ROI-less.
@litghost FYI SymbiFlow/symbiflow-arch-defs#1564
I wonder if the LCU width is too narrow to exhibit the issue? I'm trying to widen the compare operation and see what happens.
Expanding from an 8 bit counter to a 32 bit counter resulted in
testbench_synth_lcu_tb
failing, I'll see if I can post a fix today.
Latest version of 32 bit test is pushed.
@litghost, I have run the baslitex build once more with the branches pointed by you, and the implementation works on HW. The only thing though is that we reach a worse CPD than the current baselitex implementation, but it might be due to the usage of RAMB36.
An updated table with the results using the various branches (and the current master+wip VTR) is the following:
LCU | RAMB36 | CPD (ns) | Route time (s) | Works on HW |
---|---|---|---|---|
YES | YES | 18.451 | 105.60 | YES |
While the current yosys master+wip achieves this
LCU | RAMB36 | CPD (ns) | Route time (s) | Works on HW |
---|---|---|---|---|
YES | YES | 17.771 | 270.75 | YES |
(Unsure why run-time is that high, it should be around 147s, I would need to relaunch the test in perf-tool)
I think there also is an issue with fasm2bels as the diff_fasm reports one fasm line difference related to the DOUTMUX:
--- /data/symbiflow/symbiflow-arch-defs/build/xc/xc7/tests/soc/litex/base/baselitex_arty/artix7-xc7a50t-virt-xc7a50t-test/top.bit.fasm 2020-07-02 10:55:30.114271878 +0200
+++ /data/symbiflow/symbiflow-arch-defs/build/xc/xc7/tests/soc/litex/base/baselitex_arty/artix7-xc7a50t-virt-xc7a50t-test/design_baselitex_arty_vivado.bit.fasm 2020-07-02 11:44:53.814974831 +0200
@@ -15509,7 +15509,7 @@
CLBLL_L_X28Y89.SLICEL_X1.DFF.ZRST
CLBLL_L_X28Y89.SLICEL_X1.DFFMUX.O5
CLBLL_L_X28Y89.SLICEL_X1.DLUT.INIT[63:0] = 64'b1010101010101010101010101010101011111111000000001111000011110000
-CLBLL_L_X28Y89.SLICEL_X1.DOUTMUX.CY
+CLBLL_L_X28Y89.SLICEL_X1.DOUTMUX.O6
CLBLL_L_X28Y89.SLICEL_X1.FFSYNC
CLBLL_L_X28Y89.SLICEL_X1.NOCLKINV
CLBLL_L_X28Y89.SLICEL_X1.PRECYINIT.C1
The vivado-generated bitstream has a non-working ethernet module, therefore this is most probably a fasm2bels issue that was uncaught as CARRY chains used to implement LCUs were never adopted before.
@litghost, I have run the baslitex build once more with the branches pointed by you, and the implementation works on HW. The only thing though is that we reach a worse CPD than the current baselitex implementation, but it might be due to the usage of RAMB36.
That's good. At this point, we should probably build a VPR with the latest upstream master + the update to libblifparse (in a wip branch) + a temporary wip branch https://github.com/litghost/vtr-verilog-to-routing/tree/wip/add_back_disable_flag . I expect that VPR should work with current upstream and work with the new yosys master+wip. We can tease out the CPD stuff later. Looking at 1 circuit isn't enough to determine if the new yosys is better or worse, and the new yosys isn't significantly worse.
GitHubVerilog to Routing -- Open Source CAD Flow for FPGA Research - litghost/vtr-verilog-to-routing
I think there also is an issue with fasm2bels as the diff_fasm reports one fasm line difference related to the DOUTMUX:
--- /data/symbiflow/symbiflow-arch-defs/build/xc/xc7/tests/soc/litex/base/baselitex_arty/artix7-xc7a50t-virt-xc7a50t-test/top.bit.fasm 2020-07-02 10:55:30.114271878 +0200 +++ /data/symbiflow/symbiflow-arch-defs/build/xc/xc7/tests/soc/litex/base/baselitex_arty/artix7-xc7a50t-virt-xc7a50t-test/design_baselitex_arty_vivado.bit.fasm 2020-07-02 11:44:53.814974831 +0200 @@ -15509,7 +15509,7 @@ CLBLL_L_X28Y89.SLICEL_X1.DFF.ZRST CLBLL_L_X28Y89.SLICEL_X1.DFFMUX.O5 CLBLL_L_X28Y89.SLICEL_X1.DLUT.INIT[63:0] = 64'b1010101010101010101010101010101011111111000000001111000011110000 -CLBLL_L_X28Y89.SLICEL_X1.DOUTMUX.CY +CLBLL_L_X28Y89.SLICEL_X1.DOUTMUX.O6 CLBLL_L_X28Y89.SLICEL_X1.FFSYNC CLBLL_L_X28Y89.SLICEL_X1.NOCLKINV CLBLL_L_X28Y89.SLICEL_X1.PRECYINIT.C1
The vivado-generated bitstream has a non-working ethernet module, therefore this is most probably a fasm2bels issue that was uncaught as CARRY chains used to implement LCUs were never adopted before.
Ya, this appears to be a fasm2bels bug.
Looks like merging this was a mistake, as failing travis tests prevents the package from being pushed.
This PR enables a new master+wip for yosys.
Apart from the issues discovered here https://github.com/SymbiFlow/yosys/issues/79#issuecomment-646115283, there is an additional one which needs to be solved, and is related to the impossibility of setting LOC attributes in the verilog design for primitives (issue open upstream https://github.com/YosysHQ/yosys/issues/2176).
This new master+wip version is compatible with this version of symbiflow-arch-defs and
all_xc7
tests do pass.The only outstanding issue is solving a problem with Ethernet in the base linux-litex example which is not correctly working on HW.