SymbiFlow / yosys

SymbiFlow WIP changes for Yosys Open SYnthesis Suite
http://www.clifford.at/yosys/
ISC License
37 stars 9 forks source link

New master+wip #81

Closed acomodi closed 4 years ago

acomodi commented 4 years ago

This PR enables a new master+wip for yosys.

Apart from the issues discovered here https://github.com/SymbiFlow/yosys/issues/79#issuecomment-646115283, there is an additional one which needs to be solved, and is related to the impossibility of setting LOC attributes in the verilog design for primitives (issue open upstream https://github.com/YosysHQ/yosys/issues/2176).

This new master+wip version is compatible with this version of symbiflow-arch-defs and all_xc7 tests do pass.

The only outstanding issue is solving a problem with Ethernet in the base linux-litex example which is not correctly working on HW.

acomodi commented 4 years ago

I'm unclear, but it doesn't sound like this master+wip is ready? There are outstanding issues caused by the new version that we wouldn't want to integrate?

Indded, there are two issues currently:

acomodi commented 4 years ago

The fact is that there is the IDELAYCTRL which needs to have this feature (in-design LOCing of primitives).

I was wrong here, the IDELAYCTRL actually gets placed correctly and the LOC constraints are there. This issue seems to affect only FF.

HackerFoo commented 4 years ago

I created an fpga-tool-perf test on Hydra using this yosys and testing-new-yosys-master+wip symbiflow-arch-defs.

acomodi commented 4 years ago

@litghost @HackerFoo I have updated the new master+wip integration.

I have identified and solved the issues that prevented the litex-linux design to properly work.

I am currently testing the all_xc7 target in symbiflow-arch-defs to double-check that all the tests do pass.

litghost commented 4 years ago
  • The ethernet issue was due to a wrong mapping when yosys infers the LCUs. There was a change (cmp2lcu 051aefc) that introduced the inference of LCU when processing comparison operations. This is why I had to add VPR carry chains definitions also for the LCU module in arith_map.v which I needed to fix (3edd17f) as there was a bug in there.

The LCU should just use the CARRY4 primative, because the LCU only uses the CO outputs. The reason that the _80_xilinx_alu uses the special primitive (CARRY4_COUT) is because yosys tends to over use CO and O outputs resulting in congestion at the outputs, specially CO[3] and O[3]. Becuase the LCU only uses the CO ports, there is no problem.

Specifically the LCU should just use the code at the else block: https://github.com/SymbiFlow/yosys/commit/3edd17f94d3ec19aab8087fea73179fdb8de5170#diff-7021595c247e915d94d36ec511281cbbR166

litghost commented 4 years ago
  • The SRL post-synthesis tests were not caused by the SRL themeselves, but depended by an issue with the CARRYs which were trimmed when performing the ABC9 pass. By solving the CARRY ABC9 issue (3353f0e), the SRL tests passed correctly. This allows us to get rid of the wip/disable-srl-abc9 branch and enable ABC9 on SRLs safely.

This change means that simulations using a CARRY chain where CYINIT or CIN are not specified will failed. I don't know what yosys changed that the ifdef is no longer working, but the solution is to fix the ifdef, not remove it.

HackerFoo commented 4 years ago

Is there an updated XDC plugin that will work with this version of Yosys? https://github.com/SymbiFlow/yosys-symbiflow-plugins/commit/1c495fd47ddfc54a9f815c0ba97dc112e1731bd6 does not work.

acomodi commented 4 years ago

@litghost All right, I will fix the LCU carry chains and the ifdef.

Specifically the LCU should just use the code at the else block

By this you mean the following else block? https://github.com/SymbiFlow/yosys/blob/3edd17f94d3ec19aab8087fea73179fdb8de5170/techlibs/xilinx/arith_map.v#L134-L162

Or this one? https://github.com/SymbiFlow/yosys/blob/3edd17f94d3ec19aab8087fea73179fdb8de5170/techlibs/xilinx/arith_map.v#L166-L201

If the case is the latter one, I think we would need to add still the _COUT prefix as VPR does not recognize those (or at least add a CARRY4 definition in the cells_map.v

I have found another issue, related to the LOC constraints, which is slightly different from the previous one (which I filed here). In this case, the techmapping pass makes attributes to be lost. I have a reduced test case and the bad commit causing the problem, and used that for another issue upstream.

litghost commented 4 years ago

If the case is the latter one, I think we would need to add still the _COUT prefix as VPR does not recognize those (or at least add a CARRY4 definition in the cells_map.v

The latter one, but your analysis is incorrect. The CARRY4_COUT has no CO port, so it cannot be used. VPR understands neither CARRY4_COUT nor CARRY4, it understands CARRY4_VPR. CARRY4_COUT is mapped to CARRY4_VPR during the symbiflow-arch-defs portion of the yosys flow.

The solution in this case is to add a mapping of CARRY4 to CARRY4_VPR. This works in this instance because the LCU CARRY4 doesn't use both the CO and O ports, only the CO ports.

acomodi commented 4 years ago

Is there an updated XDC plugin that will work with this version of Yosys? SymbiFlow/yosys-symbiflow-plugins@1c495fd does not work.

@HackerFoo Not yet on conda. I have just pushed a fix for building the plugins with the newer version of yosys here: https://github.com/SymbiFlow/yosys-symbiflow-plugins/commit/809a2eac0c3132cfbdd84d7f4069186a1d7905bb

acomodi commented 4 years ago

The solution in this case is to add a mapping of CARRY4 to CARRY4_VPR. This works in this instance because the LCU CARRY4 doesn't use both the CO and O ports, only the CO ports.

All clear, thanks

acomodi commented 4 years ago

@litghost I have been trying to get the CARRY4 in the techmap to have the CARRY4_VPR out of them, the problem though seems to be that there is a routing issue given by the CI input of one carrychain being connected to a DMUX output of the previous carry chain:

## Initializing router criticalities took 0.06 seconds (max_rss 3464.8 MiB, delta_rss +0.0 MiB)
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter   Time    pres  BBs    Heap  Re-Rtd  Re-Rtd Overused RR Nodes      Wirelength      CPD       sTNS       sWNS       hTNS       hWNS Est Succ
      (sec)     fac Updt    push    Nets   Conns                                       (ns)       (ns)       (ns)       (ns)       (ns)     Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Warning 1224: No routing path for connection to sink_rr 813302, retrying with full device bounding box
Cannot route from BLK-TL-CLBLL_L[0].CLBLL_L_DMUX[0] (RR node: 2777187 type: SOURCE location: (72,130) class: 87 capacity: 1) to BLK-TL-CLBLL_L[0].CLBLL_LL_CIN[0] (RR node: 813302 type: SINK location: (72,17) class: 21 capacity: 1) -- no possible path
Failed to route connection from '$ge$/data/symbiflow/symbiflow-arch-defs/build/xc/xc7/tests/soc/litex/base/baselitex_arty.v:3519$52.slice[4].carry4' to '$ge$/data/symbiflow/symbiflow-arch-defs/build/xc/xc7/tests/soc/litex/base/baselitex_arty.v:3519$52.slice[5].carry4' for net '$abc
$303469$aiger303468$8737' (#11248)
Routing failed.
# Routing took 28.51 seconds (max_rss 3464.8 MiB, delta_rss +0.0 MiB)
Circuit is unroutable with a channel width factor of 500.
VPR failed to implement circuit
The entire flow of VPR took 62.31 seconds (max_rss 3464.8 MiB)

I have then also tried to replace CARRY4_COUT instead of CARRY4 with the addition of the CARRY_COUT_PLUG to overcome the issue, but the router had issues in routing as well, with higher runtime per route iteration as well worse CPD.

On a side note, it seems that RAMB36 let VPR reach worse CPD ~18.6 ns vs ~16.8 while using RAMB18 only.

litghost commented 4 years ago

@litghost I have been trying to get the CARRY4 in the techmap to have the CARRY4_VPR out of them, the problem though seems to be that there is a routing issue given by the CI input of one carrychain being connected to a DMUX output of the previous carry chain:

Ya, I was worried about this. CARRY4_VPR has two ports, CO_CHAIN and CO_FABRIC3. The underlying CARRY4 only has CO[3], which connects to both CO_CHAIN and CO_FABRIC3. We should probably make a CARRY4_CO_COUT that is like the CARRY4_COUT.

litghost commented 4 years ago

@acomodi, FYI https://github.com/YosysHQ/yosys/issues/2213

It looks like the conditional compilation directives were removed from yosys during development, unclear what happened between then and now. Hopefully we will get clarity.

litghost commented 4 years ago

@acomodi https://github.com/litghost/symbiflow-arch-defs/commits/new_yosys_with_conda + https://github.com/litghost/yosys/tree/sync_yosys is my latest attempt. I'm running all_xc7 now.

GitHub
litghost/symbiflow-arch-defs
FOSS architecture definitions of FPGA hardware useful for doing PnR device generation. - litghost/symbiflow-arch-defs
GitHub
litghost/yosys
Yosys Open SYnthesis Suite. Contribute to litghost/yosys development by creating an account on GitHub.
acomodi commented 4 years ago

@acomodi, FYI YosysHQ#2213

It looks like the conditional compilation directives were removed from yosys during development, unclear what happened between then and now. Hopefully we will get clarity.

If the issue is getting a different definition if in simulation by defining the various conditions explicitly instead of the |, wouldn't it be ok to use the __ICARUS__ directive? I am unsure how it works and who sets the directive, but it is used here for instance: https://github.com/YosysHQ/yosys/blob/9d658a19708b99288bed2d8c8da3a2fe987a21b3/techlibs/xilinx/cells_sim.v#L289-L303

GitHub
YosysHQ/yosys
Yosys Open SYnthesis Suite. Contribute to YosysHQ/yosys development by creating an account on GitHub.
acomodi commented 4 years ago

@acomodi https://github.com/litghost/symbiflow-arch-defs/commits/new_yosys_with_conda + https://github.com/litghost/yosys/tree/sync_yosys is my latest attempt. I'm running all_xc7 now.

I have tried to run with these branches, but I got an error in VPR when running the baselitex example that let yosys infer LCUs instead of ALUs.

The error is the following:

vpr/src/base/netlist.tpp:1887 associate_pin_with_net: Assertion 'net_pins_[net_id][0] == PinId::INVALI
D()' failed (Must be no existing net driver).
xc/xc7/tests/soc/litex/base/CMakeFiles/file_xc_xc7_tests_soc_litex_base_baselitex_arty_artix7-xc7a50t-virt-xc7a50t-test_top.net.dir/build.make:68: recipe for target 'xc/xc7/tests/soc/litex/base/baselitex_arty/artix7-xc7a50t-virt-xc7a50t-test/top.net' failed
make[3]: *** [xc/xc7/tests/soc/litex/base/baselitex_arty/artix7-xc7a50t-virt-xc7a50t-test/top.net] Error 134
CMakeFiles/Makefile2:239603: recipe for target 'xc/xc7/tests/soc/litex/base/CMakeFiles/file_xc_xc7_tests_soc_litex_base_baselitex_arty_artix7-xc7a50t-virt-xc7a50t-test_top.net.dir/all' failed
make[2]: *** [xc/xc7/tests/soc/litex/base/CMakeFiles/file_xc_xc7_tests_soc_litex_base_baselitex_arty_artix7-xc7a50t-virt-xc7a50t-test_top.net.dir/all] Error 2

What I am unsure of is why cannot we just use the current arith_map.v that treats carry chains in the same way as for the ALU which proved to be working correctly.

Another thing we could do to move forward and get yosys in sync with upstream is to temporarily disable the LCU inference by using the vpr flag in synth_xilinx.cc and open an issue about restoring the LCU in a second moment. I have tested the implementation without LCU and it works on HW. The only problem is that routing takes a longer time to compute.

I will provide some more data on the run-times and CPD with the various configurations in a bit.

GitHub
litghost/symbiflow-arch-defs
FOSS architecture definitions of FPGA hardware useful for doing PnR device generation. - litghost/symbiflow-arch-defs
GitHub
litghost/yosys
Yosys Open SYnthesis Suite. Contribute to litghost/yosys development by creating an account on GitHub.
acomodi commented 4 years ago

Router data with different configurations when running the baselitex test.

LCU RAMB36 CPD (ns) Route time (s) Works on HW
NO NO 18.084 87.75 YES
NO YES 18.766 147.16 YES
YES NO 16.876 101.26 YES
YES YES 18.966 85.25 YES

Note: when RAMB36 is disabled, only RAMB18 is used

acomodi commented 4 years ago

The ABC9 directive was removed here: 1dc22607c38486d9e1a2b56f749d1eca35d405d2

We could temporarily re-add that and have restore the ifdef

litghost commented 4 years ago

What I am unsure of is why cannot we just use the current arith_map.v that treats carry chains in the same way as for the ALU which proved to be working correctly.

Because the O and CO outputs are not the same. You are increasing the logic depth by 1 by using the O outputs instead of the CO outputs. The LCU inference is clearly a win, so let's implement it correct using the CARRY4 hardware.

litghost commented 4 years ago

If the issue is getting a different definition if in simulation by defining the various conditions explicitly instead of the |, wouldn't it be ok to use the __ICARUS__ directive? I am unsure how it works and who sets the directive, but it is used here for instance:

That's what I proposed here: https://github.com/litghost/yosys/blob/e8bfed992e7926c1229229e7e9a2b9f6f537252b/techlibs/xilinx/cells_sim.v#L479-L485

For clarity, __ICARUS__ works because of "Predefined Macros" in https://linux.die.net/man/1/iverilog

GitHub
litghost/yosys
Yosys Open SYnthesis Suite. Contribute to litghost/yosys development by creating an account on GitHub.
litghost commented 4 years ago

Router data with different configurations when running the baselitex test.

It would be good to check the Yosys+Vivado results for the same configurations.

litghost commented 4 years ago

@acomodi I believe we have testbenches for counter-like designs, which test the alu arith_map.v. I strongly believe we need to add a LCU testbench to detect whether the lcu arith_map.v is working. It is not a good idea to need to test in hardware all of LiteX just to verify if the yosys+vpr LCU / ALU logic inference is working as expected.

acomodi commented 4 years ago

@litghost working on that, I am taking the reduced testbenches from yosys that infer lcu, so that we can have much faster iteration steps.

acomodi commented 4 years ago

@litghost FYI https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1564

litghost commented 4 years ago

@acomodi https://github.com/litghost/vtr-verilog-to-routing/tree/make_capnp_backward + https://github.com/litghost/symbiflow-arch-defs/tree/new_yosys_with_conda + https://github.com/litghost/yosys/tree/sync_yosys passed your LCU testbench.

The new VPR is running through CI now over here: https://github.com/SymbiFlow/vtr-verilog-to-routing/pull/525 The new symbiflow-arch-defs is basically https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1558 plus the changes needed for the new yosys.

GitHub
litghost/vtr-verilog-to-routing
Verilog to Routing -- Open Source CAD Flow for FPGA Research - litghost/vtr-verilog-to-routing
GitHub
litghost/symbiflow-arch-defs
FOSS architecture definitions of FPGA hardware useful for doing PnR device generation. - litghost/symbiflow-arch-defs
GitHub
litghost/yosys
Yosys Open SYnthesis Suite. Contribute to litghost/yosys development by creating an account on GitHub.
litghost commented 4 years ago

@litghost FYI SymbiFlow/symbiflow-arch-defs#1564

I wonder if the LCU width is too narrow to exhibit the issue? I'm trying to widen the compare operation and see what happens.

litghost commented 4 years ago

@litghost FYI SymbiFlow/symbiflow-arch-defs#1564

I wonder if the LCU width is too narrow to exhibit the issue? I'm trying to widen the compare operation and see what happens.

Expanding from an 8 bit counter to a 32 bit counter resulted in testbench_synth_lcu_tb failing, I'll see if I can post a fix today.

litghost commented 4 years ago

@acomodi I extended your LCU test in https://github.com/litghost/symbiflow-arch-defs/commit/c2bc596ece222e83f696377f465d7bd251a5d4cb and fixed a bug. Maybe that combo will work on hardware now?

I'm still debugging https://github.com/SymbiFlow/vtr-verilog-to-routing/pull/525, but the problem appears to lies in ROI designs, not ROI-less.

litghost commented 4 years ago

@litghost FYI SymbiFlow/symbiflow-arch-defs#1564

I wonder if the LCU width is too narrow to exhibit the issue? I'm trying to widen the compare operation and see what happens.

Expanding from an 8 bit counter to a 32 bit counter resulted in testbench_synth_lcu_tb failing, I'll see if I can post a fix today.

Latest version of 32 bit test is pushed.

acomodi commented 4 years ago

@litghost, I have run the baslitex build once more with the branches pointed by you, and the implementation works on HW. The only thing though is that we reach a worse CPD than the current baselitex implementation, but it might be due to the usage of RAMB36.

An updated table with the results using the various branches (and the current master+wip VTR) is the following:

LCU RAMB36 CPD (ns) Route time (s) Works on HW
YES YES 18.451 105.60 YES

While the current yosys master+wip achieves this

LCU RAMB36 CPD (ns) Route time (s) Works on HW
YES YES 17.771 270.75 YES

(Unsure why run-time is that high, it should be around 147s, I would need to relaunch the test in perf-tool)

I think there also is an issue with fasm2bels as the diff_fasm reports one fasm line difference related to the DOUTMUX:

--- /data/symbiflow/symbiflow-arch-defs/build/xc/xc7/tests/soc/litex/base/baselitex_arty/artix7-xc7a50t-virt-xc7a50t-test/top.bit.fasm  2020-07-02 10:55:30.114271878 +0200
+++ /data/symbiflow/symbiflow-arch-defs/build/xc/xc7/tests/soc/litex/base/baselitex_arty/artix7-xc7a50t-virt-xc7a50t-test/design_baselitex_arty_vivado.bit.fasm      2020-07-02 11:44:53.814974831 +0200
@@ -15509,7 +15509,7 @@                                           
 CLBLL_L_X28Y89.SLICEL_X1.DFF.ZRST                                  
 CLBLL_L_X28Y89.SLICEL_X1.DFFMUX.O5                                 
 CLBLL_L_X28Y89.SLICEL_X1.DLUT.INIT[63:0] = 64'b1010101010101010101010101010101011111111000000001111000011110000
-CLBLL_L_X28Y89.SLICEL_X1.DOUTMUX.CY                                
+CLBLL_L_X28Y89.SLICEL_X1.DOUTMUX.O6                                 
 CLBLL_L_X28Y89.SLICEL_X1.FFSYNC                                   
 CLBLL_L_X28Y89.SLICEL_X1.NOCLKINV                                   
 CLBLL_L_X28Y89.SLICEL_X1.PRECYINIT.C1 

The vivado-generated bitstream has a non-working ethernet module, therefore this is most probably a fasm2bels issue that was uncaught as CARRY chains used to implement LCUs were never adopted before.

litghost commented 4 years ago

@litghost, I have run the baslitex build once more with the branches pointed by you, and the implementation works on HW. The only thing though is that we reach a worse CPD than the current baselitex implementation, but it might be due to the usage of RAMB36.

That's good. At this point, we should probably build a VPR with the latest upstream master + the update to libblifparse (in a wip branch) + a temporary wip branch https://github.com/litghost/vtr-verilog-to-routing/tree/wip/add_back_disable_flag . I expect that VPR should work with current upstream and work with the new yosys master+wip. We can tease out the CPD stuff later. Looking at 1 circuit isn't enough to determine if the new yosys is better or worse, and the new yosys isn't significantly worse.

GitHub
litghost/vtr-verilog-to-routing
Verilog to Routing -- Open Source CAD Flow for FPGA Research - litghost/vtr-verilog-to-routing
litghost commented 4 years ago

I think there also is an issue with fasm2bels as the diff_fasm reports one fasm line difference related to the DOUTMUX:

--- /data/symbiflow/symbiflow-arch-defs/build/xc/xc7/tests/soc/litex/base/baselitex_arty/artix7-xc7a50t-virt-xc7a50t-test/top.bit.fasm  2020-07-02 10:55:30.114271878 +0200
+++ /data/symbiflow/symbiflow-arch-defs/build/xc/xc7/tests/soc/litex/base/baselitex_arty/artix7-xc7a50t-virt-xc7a50t-test/design_baselitex_arty_vivado.bit.fasm      2020-07-02 11:44:53.814974831 +0200
@@ -15509,7 +15509,7 @@                                           
 CLBLL_L_X28Y89.SLICEL_X1.DFF.ZRST                                  
 CLBLL_L_X28Y89.SLICEL_X1.DFFMUX.O5                                 
 CLBLL_L_X28Y89.SLICEL_X1.DLUT.INIT[63:0] = 64'b1010101010101010101010101010101011111111000000001111000011110000
-CLBLL_L_X28Y89.SLICEL_X1.DOUTMUX.CY                                
+CLBLL_L_X28Y89.SLICEL_X1.DOUTMUX.O6                                 
 CLBLL_L_X28Y89.SLICEL_X1.FFSYNC                                   
 CLBLL_L_X28Y89.SLICEL_X1.NOCLKINV                                   
 CLBLL_L_X28Y89.SLICEL_X1.PRECYINIT.C1 

The vivado-generated bitstream has a non-working ethernet module, therefore this is most probably a fasm2bels issue that was uncaught as CARRY chains used to implement LCUs were never adopted before.

Ya, this appears to be a fasm2bels bug.

litghost commented 4 years ago

Looks like merging this was a mistake, as failing travis tests prevents the package from being pushed.