f4pga / f4pga-arch-defs

FOSS architecture definitions of FPGA hardware useful for doing PnR device generation.
https://f4pga.org
ISC License

OpenTitan support #1442

Open rw1nkler opened 4 years ago

rw1nkler commented 4 years ago

Here is the branch with the current progress on OpenTitan support: https://github.com/antmicro/symbiflow-arch-defs/tree/opentitan_earlgray

Currently, I'm struggling with the error in vpr_io_place.py:

Generating earlgray_nexys_video/artix7_200t-xc7a200t-virt-xc7a200t-test/top_io.place
Traceback (most recent call last):
  File "/home/build/rwinkler/google-symbiflow-arch-defs/xc/common/utils/prjxray_create_ioplace.py", line 159, in <module>
    main()
  File "/home/build/rwinkler/google-symbiflow-arch-defs/xc/common/utils/prjxray_create_ioplace.py", line 73, in main
    io_place.load_block_names_from_net_file(args.net)
  File "/home/build/rwinkler/google-symbiflow-arch-defs/utils/vpr_io_place.py", line 60, in load_block_names_from_net_file
    "//block[@instance='inpad[0]'] | //block[@instance='outpad[0]']"
  File "src/lxml/etree.pyx", line 1581, in lxml.etree._Element.xpath
  File "src/lxml/xpath.pxi", line 305, in lxml.etree.XPathElementEvaluator.__call__
  File "src/lxml/xpath.pxi", line 225, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Error in xpath expression
xc/xc7/tests/soc/earlgray/CMakeFiles/file_xc_xc7_tests_soc_earlgray_earlgray_nexys_video_artix7_200t-xc7a200t-virt-xc7a200t-test_top_io.place.dir/build.make:71: recipe for target 'xc/xc7/tests/soc/earlgray/earlgray_nexys_video/artix7_200t-xc7a200t-virt-xc7a200t-test/top_io.place' failed
make[3]: *** [xc/xc7/tests/soc/earlgray/earlgray_nexys_video/artix7_200t-xc7a200t-virt-xc7a200t-test/top_io.place] Error 1
make[3]: *** Deleting file 'xc/xc7/tests/soc/earlgray/earlgray_nexys_video/artix7_200t-xc7a200t-virt-xc7a200t-test/top_io.place'
CMakeFiles/Makefile2:222236: recipe for target 'xc/xc7/tests/soc/earlgray/CMakeFiles/file_xc_xc7_tests_soc_earlgray_earlgray_nexys_video_artix7_200t-xc7a200t-virt-xc7a200t-test_top_io.place.dir/all' failed
make[2]: *** [xc/xc7/tests/soc/earlgray/CMakeFiles/file_xc_xc7_tests_soc_earlgray_earlgray_nexys_video_artix7_200t-xc7a200t-virt-xc7a200t-test_top_io.place.dir/all] Error 2
CMakeFiles/Makefile2:222935: recipe for target 'xc/xc7/tests/soc/earlgray/CMakeFiles/earlgray_nexys_video_bit.dir/rule' failed
make[1]: *** [xc/xc7/tests/soc/earlgray/CMakeFiles/earlgray_nexys_video_bit.dir/rule] Error 2
Makefile:2933: recipe for target 'xc/xc7/tests/soc/earlgray/CMakeFiles/earlgray_nexys_video_bit.dir/rule' failed
make: *** [xc/xc7/tests/soc/earlgray/CMakeFiles/earlgray_nexys_video_bit.dir/rule] Error 2

However, the same XPath expression works for other designs, such as button or counter on Nexys Video. I suspect the problem lies in the .net or .eblif file itself, perhaps due to unsupported elements in the design or an incorrect import in the earlier import scripts.

mkurc-ant commented 4 years ago

I tried to lint the top.net XML using xmllint and it succeeded. Next, I tried to query (part of) the XPath and this is what I got:

> xmllint --huge --xpath //block[@instance='inpad[0]'] top.net
XPath error : Memory allocation failed : growing nodeset hit limit

growing nodeset hit limit

^
XPath evaluation failure

It seems that libxml has a hard time evaluating XPath on such a huge XML file (~330MB). The Python wrapper simply misinterprets the error and throws the lxml.etree.XPathEvalError exception even though the XPath itself is correct.

The xmllint I used is built with:

xmllint: using libxml version 20904
   compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ICU ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib Lzma

I need to double check that against the version of libxml that we use in SymbiFlow.

@rw1nkler For now, try applying this patch to utils/vpr_io_place.py. It works by querying all blocks and filtering them in Python:

diff --git a/utils/vpr_io_place.py b/utils/vpr_io_place.py
index c31fbd7d..5da0e2d8 100644
--- a/utils/vpr_io_place.py
+++ b/utils/vpr_io_place.py
@@ -56,9 +56,11 @@ class IoPlace(object):
         net_root = net_xml.getroot()
         self.net_to_block = {}

-        for block in net_root.xpath(
-                "//block[@instance='inpad[0]'] | //block[@instance='outpad[0]']"
-        ):
+        for block in net_root.xpath("//block"):
+            instance = block.attrib["instance"]
+            if instance != "inpad[0]" and instance != "outpad[0]":
+                continue
+
             top_block = block.getparent()
             assert top_block is not None
             while top_block.getparent() is not net_root:
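
The same filtering idea can also be sketched without XPath at all. The snippet below is only an illustration: it uses the standard-library xml.etree.ElementTree instead of lxml, the helper name find_io_blocks is hypothetical, and the sample XML is a made-up fragment that merely resembles a packed netlist. Using .get() instead of .attrib[...] also tolerates blocks without an instance attribute.

```python
import xml.etree.ElementTree as ET

# Instance names that mark I/O pads, taken from the XPath in vpr_io_place.py.
IO_INSTANCES = {"inpad[0]", "outpad[0]"}


def find_io_blocks(net_root):
    """Return all <block> elements whose instance is an input or output pad.

    Hypothetical helper: walks the tree with iter() and filters in Python,
    so no XPath node set is ever built.
    """
    return [
        block for block in net_root.iter("block")
        if block.get("instance") in IO_INSTANCES
    ]


# Made-up sample that only imitates the shape of a .net file.
SAMPLE = """
<block name="top" instance="FPGA_packed_netlist[0]">
  <block name="clk" instance="IOB[0]">
    <block name="clk" instance="inpad[0]"/>
  </block>
  <block name="out:led" instance="IOB[1]">
    <block name="out:led" instance="outpad[0]"/>
  </block>
  <block name="lut" instance="CLB[0]"/>
</block>
"""

root = ET.fromstring(SAMPLE.strip())
io_names = [block.get("name") for block in find_io_blocks(root)]
print(io_names)  # ['clk', 'out:led']
```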

rw1nkler commented 4 years ago

Utilization report

Here you can find vivado utilization report: top_earlgrey_nexysvideo_utilization_placed.log

OpenTitan burndown list

Here is the short burndown list for OpenTitan:

It would be good to optimize lookahead generation. Currently, the computation takes more than 10h. This slows down work, since every change in the architecture or in the tools requires recomputing the lookahead.

Changes in the OpenTitan design

Here is the list of changes introduced for OpenTitan:

  1. Because tristate buffers are not inferred correctly when instantiated in submodules (https://github.com/YosysHQ/yosys/issues/1737), the top module was changed. Here you can find the changed file:

  2. Because of bad RAM36 inference (https://github.com/YosysHQ/yosys/issues/1748), a workaround was created. It consists of two files:

  3. After sv2v conversion, one file, hmac.sv, produced the following error in Yosys: ERROR: 2nd expression of generate for-loop is not constant!. Because of that, it was changed:

  4. Some Verilog syntax is not supported by Yosys, so the following commands were applied to all the files:

if [ "$1" = "prim_lfsr.v" ]; then
    sed -i 's/sv2v_cast_64\((["A-Za-z0-9_]*)\)/\1/g' $1
fi

# $1 is a Verilog file (the output of sv2v)
sed -i 's/parameter unsigned/parameter/g' $1
sed -i 's/localparam unsigned/localparam/g' $1 
sed -i 's/if (.*) ;//g' $1 
sed -i 's/(strong0, strong1)//g' $1
sed -i 's/(pull0, pull1)//g' $1
sed -i 's/(highz0, weak1)//g' $1 
sed -i 's/(weak0, highz1)//g' $1
sed -i 's/(weak0, weak1)//g' $1
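
For anyone who prefers to run these cleanups from a script, the sed commands above can be approximated in Python. This is only a sketch: the function name clean_sv2v_output and the is_prim_lfsr flag are hypothetical, and the patterns are direct translations of the sed expressions rather than a tested part of the flow.

```python
import re

# Drive-strength specifications removed by the sed commands above.
DRIVE_STRENGTHS = (
    "(strong0, strong1)",
    "(pull0, pull1)",
    "(highz0, weak1)",
    "(weak0, highz1)",
    "(weak0, weak1)",
)


def clean_sv2v_output(text, is_prim_lfsr=False):
    """Apply the same substitutions as the sed commands to sv2v output.

    Hypothetical helper; is_prim_lfsr mirrors the prim_lfsr.v special case.
    """
    if is_prim_lfsr:
        # sv2v_cast_64(x) -> (x); the parentheses are kept, as in the sed rule.
        text = re.sub(r'sv2v_cast_64(\(["A-Za-z0-9_]*\))', r"\1", text)
    text = text.replace("parameter unsigned", "parameter")
    text = text.replace("localparam unsigned", "localparam")
    # Drop empty "if (...) ;" statements; "." does not cross line breaks,
    # which matches sed working line by line.
    text = re.sub(r"if \(.*\) ;", "", text)
    for strength in DRIVE_STRENGTHS:
        text = text.replace(strength, "")
    return text


print(clean_sv2v_output("parameter unsigned W = 4;"))  # parameter W = 4;
```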

Soon I will update the issue with a diff file containing all the introduced changes.

Current work

We are currently at the .route file generation step. This step consumes a lot of RAM, and runners with 126 GB of memory were unable to handle it. After adding the --congested_routing_iteration_threshold 0.08 flag to the routing process, the routing step no longer seems to exceed the available resources.

rw1nkler commented 4 years ago

Currently, the OpenTitan design can be routed with the changes introduced in:

However, the routing process consumes about 90 GB of RAM.

SymbiFlow fails at the frames step with the following error:

  File "/home/build/rwinkler/symbiflow-arch-defs/third_party/prjxray/utils/fasm2frames.py", line 306, in <module>
    main()
  File "/home/build/rwinkler/symbiflow-arch-defs/third_party/prjxray/utils/fasm2frames.py", line 302, in main
    emit_pudc_b_pullup=args.emit_pudc_b_pullup)
  File "/home/build/rwinkler/symbiflow-arch-defs/third_party/prjxray/utils/fasm2frames.py", line 179, in run
    assembler.parse_fasm_filename(filename_in, extra_features=extra_features)
  File "/home/build/rwinkler/symbiflow-arch-defs/third_party/prjxray/prjxray/fasm_assembler.py", line 180, in parse_fasm_filename
    raise FasmLookupError('\n'.join(missing_features))
prjxray.fasm_assembler.FasmLookupError: Segment DB GTP_COMMON_MID_RIGHT, key GTP_COMMON_MID_RIGHT.HCLK_GTP_CK_IN13.HCLK_GTP_CK_MUX13 not found from line 'GTP_COMMON_MID_RIGHT_X167Y23.HCLK_GTP_CK_IN13.HCLK_GTP_CK_MUX13'
Segment DB CLK_HROW_BOT_R, key CLK_HROW_BOT_R.CLK_HROW_BOT_R_CK_BUFG_CASCO22.CLK_HROW_BOT_R_CK_BUFG_CASCIN22 not found from line 'CLK_HROW_BOT_R_X139Y78.CLK_HROW_BOT_R_CK_BUFG_CASCO22.CLK_HROW_BOT_R_CK_BUFG_CASCIN22'
Segment DB CLK_HROW_BOT_R, key CLK_HROW_BOT_R.CLK_HROW_BOT_R_CK_BUFG_CASCO22.CLK_HROW_BOT_R_CK_BUFG_CASCIN22 not found from line 'CLK_HROW_BOT_R_X139Y130.CLK_HROW_BOT_R_CK_BUFG_CASCO22.CLK_HROW_BOT_R_CK_BUFG_CASCIN22'
Segment DB GTP_COMMON_MID_RIGHT, key GTP_COMMON_MID_RIGHT.HCLK_GTP_CK_IN10.HCLK_GTP_CK_MUX10 not found from line 'GTP_COMMON_MID_RIGHT_X167Y23.HCLK_GTP_CK_IN10.HCLK_GTP_CK_MUX10'
Segment DB CLK_HROW_BOT_R, key CLK_HROW_BOT_R.CLK_HROW_BOT_R_CK_BUFG_CASCO20.CLK_HROW_BOT_R_CK_BUFG_CASCIN20 not found from line 'CLK_HROW_BOT_R_X139Y78.CLK_HROW_BOT_R_CK_BUFG_CASCO20.CLK_HROW_BOT_R_CK_BUFG_CASCIN20'
Segment DB CLK_HROW_BOT_R, key CLK_HROW_BOT_R.CLK_HROW_BOT_R_CK_BUFG_CASCO20.CLK_HROW_BOT_R_CK_BUFG_CASCIN20 not found from line 'CLK_HROW_BOT_R_X139Y130.CLK_HROW_BOT_R_CK_BUFG_CASCO20.CLK_HROW_BOT_R_CK_BUFG_CASCIN20'
Segment DB GTP_COMMON_MID_RIGHT, key GTP_COMMON_MID_RIGHT.HCLK_GTP_CK_IN12.HCLK_GTP_CK_MUX12 not found from line 'GTP_COMMON_MID_RIGHT_X167Y23.HCLK_GTP_CK_IN12.HCLK_GTP_CK_MUX12'
Segment DB CLK_HROW_BOT_R, key CLK_HROW_BOT_R.CLK_HROW_BOT_R_CK_BUFG_CASCO2.CLK_HROW_BOT_R_CK_BUFG_CASCIN2 not found from line 'CLK_HROW_BOT_R_X139Y78.CLK_HROW_BOT_R_CK_BUFG_CASCO2.CLK_HROW_BOT_R_CK_BUFG_CASCIN2'
Segment DB CLK_HROW_BOT_R, key CLK_HROW_BOT_R.CLK_HROW_BOT_R_CK_BUFG_CASCO2.CLK_HROW_BOT_R_CK_BUFG_CASCIN2 not found from line 'CLK_HROW_BOT_R_X139Y130.CLK_HROW_BOT_R_CK_BUFG_CASCO2.CLK_HROW_BOT_R_CK_BUFG_CASCIN2'
xc/xc7/tests/soc/earlgray/CMakeFiles/earlgray_nexys_video_bin.dir/build.make:66: recipe for target 'xc/xc7/tests/soc/earlgray/earlgray_nexys_video/artix7_200t-xc7a200t-virt-xc7a200t-test/top.frames' failed

tcal-x commented 4 years ago

I encountered the same error in libxml described above, "Error in xpath expression", at the same place in vpr_io_place.py. It was a large design -- a version of scalable_proc with N=100.

I also ran into the same XML issue in vpr_place_constraints.py at line 48: for attr in net_root.xpath("//attribute[@name='LOC']"):
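
For very large files, one possible alternative to the XPath query is to stream the XML and pick out the LOC attributes as they are parsed. The sketch below uses the standard-library xml.etree.ElementTree.iterparse rather than lxml; the function name iter_loc_values is hypothetical, and the sample assumes that LOC values are stored as the text of <attribute name="LOC"> elements, which may not match the real .net layout exactly.

```python
import io
import xml.etree.ElementTree as ET


def iter_loc_values(source):
    """Yield the text of every <attribute name="LOC"> element in source.

    Hypothetical helper: elements are handled on their "end" event and then
    cleared, which reduces memory use compared with keeping the full tree.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "attribute" and elem.get("name") == "LOC":
            yield elem.text
        # Release the element's text, attributes and children once handled.
        elem.clear()


# Made-up fragment that only imitates the assumed structure.
SAMPLE = (
    b"<block>"
    b"<attributes>"
    b'<attribute name="LOC">SLICE_X0Y0</attribute>'
    b'<attribute name="OTHER">x</attribute>'
    b"</attributes>"
    b"</block>"
)

loc_values = list(iter_loc_values(io.BytesIO(SAMPLE)))
print(loc_values)  # ['SLICE_X0Y0']
```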

rw1nkler commented 4 years ago

Routing issues

After reducing the design frequency to 25MHz, I was able to generate a bitstream of the EarlGrey design that works on hardware. I used an old symbiflow-arch-defs version because the routing process on the current master fails due to huge RAM consumption that exceeds 128GB.

The bitstream was successfully generated using symbiflow-arch-defs at version (f8aa52f3) with the conda VPR. I found small differences between the locally built VPR and the conda one, but the final routing parameters do not change dramatically: about +2ns for the critical path delay. I guess that this is normal behavior.

Here is the table with information about the router iterations:

 ---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
 Iter   Time    pres  BBs    Heap  Re-Rtd  Re-Rtd Overused RR Nodes      Wirelength      CPD       sTNS       sWNS       hTNS       hWNS Est Succ
       (sec)     fac Updt    push    Nets   Conns                                       (ns)       (ns)       (ns)       (ns)       (ns)     Iter
 ---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
    1  625.8     0.0    0 4.0e+09   80416  292570  184703 ( 1.535%) 5882047 (22.8%)   16.386 -2.546e+04    -16.386      0.000      0.000      N/A
    2  162.1     4.0   80 7.1e+08   45992  224593   31777 ( 0.264%) 6793439 (26.3%)   16.355 -1.091e+04    -16.355      0.000      0.000      N/A
    3   80.3     5.2  315 3.3e+08   19372  102002   16537 ( 0.137%) 7068850 (27.4%)   16.488     -9147.    -16.488      0.000      0.000      N/A
    4   62.0     6.8  412 2.5e+08   10355   62812    7694 ( 0.064%) 7266967 (28.2%)   16.440     -9295.    -16.440      0.000      0.000      N/A
    5   38.9     8.8  427 1.5e+08    5599   33620    3621 ( 0.030%) 7402195 (28.7%)   16.488     -9200.    -16.488      0.000      0.000      N/A
    6   29.0    11.4  349 1.1e+08    3008   16253    1678 ( 0.014%) 7492661 (29.0%)   16.486     -9268.    -16.486      0.000      0.000      N/A
    7   27.4    14.9  245 9.9e+07    1551    7943     724 ( 0.006%) 7550901 (29.3%)   16.492     -9245.    -16.492      0.000      0.000      N/A
    8   31.9    19.3  149 1.1e+08     783    3327     367 ( 0.003%) 7584782 (29.4%)   16.584     -9258.    -16.584      0.000      0.000      N/A
    9   52.9    25.1   81 1.7e+08     467    1415     197 ( 0.002%) 7602250 (29.5%)   16.584     -9259.    -16.584      0.000      0.000      N/A
   10   58.3    32.6   50 1.8e+08     273     826     121 ( 0.001%) 7614198 (29.5%)   16.584     -9259.    -16.584      0.000      0.000       16
   11   80.2    42.4   33 2.3e+08     185     516      81 ( 0.001%) 7620016 (29.5%)   16.584     -9259.    -16.584      0.000      0.000       17
   12   80.7    55.1   17 2.2e+08     123     307      56 ( 0.000%) 7623996 (29.6%)   16.584     -9259.    -16.584      0.000      0.000       18
   13   77.0    71.7   13 2.0e+08      86     225      35 ( 0.000%) 7626631 (29.6%)   16.584     -9259.    -16.584      0.000      0.000       20
   14   50.5    93.2    9 1.3e+08      55     137      26 ( 0.000%) 7628556 (29.6%)   16.584     -9260.    -16.584      0.000      0.000       20
   15   36.9   121.1    4 1.0e+08      40      85      17 ( 0.000%) 7629406 (29.6%)   16.584     -9260.    -16.584      0.000      0.000       21
   16   30.1   157.5    4 7.8e+07      29      60      13 ( 0.000%) 7629943 (29.6%)   16.584     -9260.    -16.584      0.000      0.000       22
   17   18.0   204.7    3 4.5e+07      19      43      11 ( 0.000%) 7630729 (29.6%)   16.584     -9260.    -16.584      0.000      0.000       22
   18   26.7   266.2    3 6.6e+07      19      23      11 ( 0.000%) 7631235 (29.6%)   16.584     -9260.    -16.584      0.000      0.000       23
   19   28.3   346.0    2 7.3e+07      22      57      11 ( 0.000%) 7632287 (29.6%)   16.584     -9260.    -16.584      0.000      0.000       24
   20   26.5   449.8    3 6.4e+07      18      25       8 ( 0.000%) 7632506 (29.6%)   16.584     -9260.    -16.584      0.000      0.000       26
   21   19.2   584.8    0 4.5e+07      12      13       6 ( 0.000%) 7632692 (29.6%)   16.584     -9260.    -16.584      0.000      0.000       28
   22   18.6   760.2    0 4.3e+07      11      24       3 ( 0.000%) 7633101 (29.6%)   16.584     -9260.    -16.584      0.000      0.000       28
   23    6.8   988.3    0 1.5e+07       5       8       3 ( 0.000%) 7633239 (29.6%)   16.584     -9260.    -16.584      0.000      0.000       28
   24    6.8  1284.7    0 1.6e+07       6      22       2 ( 0.000%) 7633457 (29.6%)   16.584     -9260.    -16.584      0.000      0.000       28
   25   24.4  1670.2    1 5.1e+07       3       5       1 ( 0.000%) 7633469 (29.6%)   16.584     -9260.    -16.584      0.000      0.000       28
   26    2.2  2171.2    0 2110351       3       8       1 ( 0.000%) 7633490 (29.6%)   16.584     -9260.    -16.584      0.000      0.000       27
   27    1.7  2822.6    0  489410       1       4       1 ( 0.000%) 7633456 (29.6%)   16.584     -9260.    -16.584      0.000      0.000       27
   28    2.0  3669.3    0 1205901       2       5       1 ( 0.000%) 7633454 (29.6%)   16.584     -9260.    -16.584      0.000      0.000       27
   29    1.9  4770.1    1  864661       3       9       1 ( 0.000%) 7633336 (29.6%)   16.584     -9260.    -16.584      0.000      0.000       27
   30    1.8  6201.2    2  876726       2       5       1 ( 0.000%) 7634067 (29.6%)   16.584     -9260.    -16.584      0.000      0.000       27
   31    1.6  8061.5    1  117971       2       5       0 ( 0.000%) 7634493 (29.6%)   16.584     -9260.    -16.584      0.000      0.000       28

I wanted to rebase the code on the current master, but the routing parameters are much worse. Here is the routing table for the design rebased on (5267e78e4b4aa54de88fd9aed168b76ddcc28319). The most significant change in routing quality is visible in the critical path delay, which increases by ~110ns. Additionally, the whole routing process takes ages and consumes ~300GB of RAM. I killed the process since the test was taking too much time.

---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter   Time    pres  BBs    Heap  Re-Rtd  Re-Rtd Overused RR Nodes      Wirelength      CPD       sTNS       sWNS       hTNS       hWNS Est Succ
      (sec)     fac Updt    push    Nets   Conns                                       (ns)       (ns)       (ns)       (ns)       (ns)     Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Warning 97: 11 timing startpoints were not constrained during timing analysis
Warning 98: 8489 timing endpoints were not constrained during timing analysis
   1  341.3     0.0    0 2.5e+09   80416  291798  147808 ( 1.483%) 3489731 (16.5%)   62.581 -2.922e+04    -22.581      0.000      0.000      N/A
   2  128.8     4.0   16 6.7e+08   49697  226227   34139 ( 0.342%) 4322130 (20.4%)   14.084 -2.572e+04    -14.084      0.000      0.000      N/A
   3  211.3     5.2  401 7.8e+08   24112  103231   21212 ( 0.213%) 4712413 (22.2%)   14.201 -2.676e+04    -14.201      0.000      0.000      N/A
   4  212.1     6.8  851 7.4e+08   13565   70780   13336 ( 0.134%) 4973469 (23.4%)   54.526 -3.064e+04    -14.526      0.000      0.000      N/A
   5  229.9     8.8 1025 7.6e+08    8348   42946    8678 ( 0.087%) 5162286 (24.3%)   54.276 -2.954e+04    -14.276      0.000      0.000      N/A
   6  268.4    11.4  977 8.7e+08    5635   29016    6307 ( 0.063%) 5286737 (24.9%)   54.461 -3.151e+04    -14.461      0.000      0.000      N/A
   7  294.7    14.9  911 9.6e+08    4322   20361    5089 ( 0.051%) 5383685 (25.4%)   55.064 -3.543e+04    -15.064      0.000      0.000      N/A
   8  351.9    19.3  845 1.2e+09    3687   16587    4417 ( 0.044%) 5452563 (25.7%)   54.950 -3.507e+04    -14.950      0.000      0.000      N/A
   9  420.1    25.1  813 1.4e+09    3225   13653    3920 ( 0.039%) 5499245 (25.9%)   54.944 -3.587e+04    -14.944      0.000      0.000      N/A
  10  493.2    32.6  734 1.6e+09    3020   12790    3645 ( 0.037%) 5528542 (26.1%)   55.227 -3.772e+04    -15.227      0.000      0.000       51
  11  631.2    42.4  654 2.1e+09    2866   12018    3509 ( 0.035%) 5553931 (26.2%)   63.598 -3.696e+04    -23.598      0.000      0.000       70
  12  792.7    55.1  597 2.5e+09    2783   12261    3364 ( 0.034%) 5570195 (26.3%)   62.624 -4.787e+04    -22.624      0.000      0.000       81
  13  891.9    71.7  532 2.8e+09    2725   11482    3246 ( 0.033%) 5590442 (26.4%)   56.762 -5.018e+04    -16.762      0.000      0.000      112
  14 1048.9    93.2  475 3.2e+09    2684   11101    3116 ( 0.031%) 5607881 (26.4%)   56.764 -5.048e+04    -16.764      0.000      0.000      125
  15 1146.3   121.1  448 3.5e+09    2593   10724    2941 ( 0.030%) 5616346 (26.5%)   57.501 -5.612e+04    -17.501      0.000      0.000      163
  16 1105.0   157.5  383 3.3e+09    2528   10286    2838 ( 0.028%) 5628742 (26.5%)   74.827 -7.782e+04    -34.827      0.000      0.000      168
  17 1234.6   204.7  370 3.6e+09    2441    9840    2757 ( 0.028%) 5637420 (26.6%)   77.265 -8.728e+04    -37.265      0.000      0.000      195
  18 1313.0   266.2  311 3.7e+09    2368    9956    2607 ( 0.026%) 5654109 (26.7%)   76.618 -8.825e+04    -36.618      0.000      0.000      200
  19 1187.2   346.0  311 3.3e+09    2301    9342    2557 ( 0.026%) 5663623 (26.7%)   76.508 -7.552e+04    -36.508      0.000      0.000      207
  20 1252.5   449.8  275 3.4e+09    2232    9579    2443 ( 0.025%) 5672057 (26.7%)   76.497 -8.485e+04    -36.497      0.000      0.000      211
  21 1100.4   584.8  280 3.2e+09    2220    9196    2380 ( 0.024%) 5682202 (26.8%)   79.182 -1.202e+05    -39.182      0.000      0.000      213
  22 1070.8   760.2  272 3.1e+09    2133    8989    2312 ( 0.023%) 5688693 (26.8%)   78.839 -1.089e+05    -38.839      0.000      0.000      217
  23 1126.7   988.3  251 3.1e+09    2102    8939    2247 ( 0.023%) 5695262 (26.8%)   80.246 -1.125e+05    -40.246      0.000      0.000      224
  24 1172.9  1284.7  254 3.1e+09    2076    9291    2154 ( 0.022%) 5702724 (26.9%)   80.197 -1.166e+05    -40.197      0.000      0.000      229
  25 1241.0  1670.2  217 3.3e+09    1985    8669    2142 ( 0.021%) 5708695 (26.9%)   80.166 -1.255e+05    -40.166      0.000      0.000      235
  26 1219.8  2171.2  195 3.2e+09    1951    8247    2052 ( 0.021%) 5719461 (27.0%)   80.682 -1.153e+05    -40.682      0.000      0.000      243
  27 1236.4  2822.6  189 3.2e+09    1897    7981    1964 ( 0.020%) 5728668 (27.0%)   80.364 -1.161e+05    -40.364      0.000      0.000      252
  28 1053.3  3669.3  197 2.8e+09    1856    8457    1825 ( 0.018%) 5730685 (27.0%)   81.802 -1.430e+05    -41.802      0.000      0.000      253
  29 1197.1  4770.1  165 3.0e+09    1798    7333    1792 ( 0.018%) 5732464 (27.0%)   83.031 -1.349e+05    -43.031      0.000      0.000      251
  30 1191.9  6201.2  171 3.1e+09    1726    6990    1666 ( 0.017%) 5740511 (27.1%)   83.268 -1.448e+05    -43.268      0.000      0.000      248
  31 1167.0  8061.5  165 3.2e+09    1633    6723    1556 ( 0.016%) 5750366 (27.1%)   83.577 -1.531e+05    -43.577      0.000      0.000      240
  32 1659.3 10480.0  153 5.6e+09    1582    6512    1511 ( 0.015%) 5765446 (27.2%)   87.800 -2.035e+05    -47.800      0.000      0.000      230
  33 1681.5 13624.0  143 5.5e+09    1554    6609    1517 ( 0.015%) 5774614 (27.2%)   97.856 -2.585e+05    -57.856      0.000      0.000      223
  34 1609.2 17711.2  148 5.4e+09    1564    7665    1446 ( 0.015%) 5786245 (27.3%)  101.290 -2.959e+05    -61.290      0.000      0.000      224
  35 1554.1 23024.5  143 5.3e+09    1442    6530    1422 ( 0.014%) 5795067 (27.3%)   98.063 -2.556e+05    -58.063      0.000      0.000      222
  36 1496.0 29931.8  133 5.1e+09    1450    6499    1288 ( 0.013%) 5805248 (27.4%)   99.755 -2.759e+05    -59.755      0.000      0.000      224
  37 1572.0 38911.4  127 5.2e+09    1369    6532    1292 ( 0.013%) 5818249 (27.4%)  120.161 -4.244e+05    -80.161      0.000      0.000      217
  38 1509.5 50584.8  125 5.0e+09    1361    5672    1310 ( 0.013%) 5824071 (27.5%)  120.299 -4.284e+05    -80.299      0.000      0.000      218
  39 1405.9 65760.3  105 4.5e+09    1296    5914    1262 ( 0.013%) 5826856 (27.5%)  123.046 -4.249e+05    -83.046      0.000      0.000      221
  40 1328.5 85488.3  109 4.4e+09    1341    5762    1253 ( 0.013%) 5834188 (27.5%)  120.351 -4.336e+05    -80.351      0.000      0.000      225
  41 2028.9 1.1e+05  121 6.6e+09    1227    5896     949 ( 0.010%) 5892416 (27.8%)  126.284 -4.399e+05    -86.284      0.000      0.000      229
  42 1649.2 1.4e+05  376 5.3e+09     939    4597     624 ( 0.006%) 5914571 (27.9%)  134.363 -5.437e+05    -94.363      0.000      0.000      217
  43 1272.0 1.9e+05  215 4.1e+09     724    3092     498 ( 0.005%) 5930555 (28.0%)  127.906 -4.679e+05    -87.906      0.000      0.000      189
  44  997.2 2.4e+05  143 3.3e+09     578    2462     380 ( 0.004%) 5940883 (28.0%)  127.997 -4.746e+05    -87.997      0.000      0.000      167
  45  738.9 3.2e+05   90 2.4e+09     483    1809     305 ( 0.003%) 5954043 (28.1%)  127.997 -4.746e+05    -87.997      0.000      0.000      146
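
To compare runs like the two above without reading the tables by eye, the router iteration rows can be parsed with a few lines of Python. This is a rough sketch: the field names in FIELDS are my own labels for the printed columns, and parse_router_row is a hypothetical helper, not part of VPR or symbiflow-arch-defs.

```python
import re

# Labels for the 17 values in each router iteration row (names are assumptions).
FIELDS = (
    "iter", "time_s", "pres_fac", "bbs_updt", "heap_push",
    "rerouted_nets", "rerouted_conns", "overused_nodes", "overused_pct",
    "wirelength", "wirelength_pct", "cpd_ns", "stns_ns", "swns_ns",
    "htns_ns", "hwns_ns", "est_succ_iter",
)


def parse_router_row(line):
    """Parse one router iteration row; return None for headers and warnings."""
    # Strip the "( 1.535%)" decorations so every value is a plain token.
    tokens = re.sub(r"[()%]", " ", line).split()
    if len(tokens) != len(FIELDS) or not tokens[0].isdigit():
        return None
    row = {name: float(tok) for name, tok in zip(FIELDS[:-1], tokens)}
    row["iter"] = int(tokens[0])
    row["est_succ_iter"] = None if tokens[-1] == "N/A" else int(tokens[-1])
    return row


# Last rows of the two tables above.
OLD_LAST = ("   31    1.6  8061.5    1  117971       2       5       0 ( 0.000%) "
            "7634493 (29.6%)   16.584     -9260.    -16.584      0.000      0.000       28")
NEW_LAST = ("  45  738.9 3.2e+05   90 2.4e+09     483    1809     305 ( 0.003%) "
            "5954043 (28.1%)  127.997 -4.746e+05    -87.997      0.000      0.000      146")

old_row = parse_router_row(OLD_LAST)
new_row = parse_router_row(NEW_LAST)
cpd_delta = new_row["cpd_ns"] - old_row["cpd_ns"]
print(f"CPD increase: {cpd_delta:.3f} ns")  # CPD increase: 111.413 ns
```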

Now I am trying to find the commit that breaks the routing process. I'm using git bisect for that.

Additional information

Placement problems

Rebasing on (a1f0153f0ae0b61c91ffa0ec4ce763904b32b0e5) causes a problem with the design placement:

Assertion 'cluster_ctx.clb_nlist.net_sinks(curr_net_id).size() == 1' failed.
Aborted

The mentioned commit contains a new Yosys version introduced in https://github.com/SymbiFlow/symbiflow-arch-defs/commit/206bd5657b5882647c2952dbaf118a0e9f6672f. That is the only change in the flow since (5267e78e4b4aa54de88fd9aed168b76ddcc28319) passes the placement step. @acomodi told me that this should be resolved by https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1612.

RAM usage

All VPR runs for the EarlGrey design consume a lot of RAM. Part of that is related to the size of the routing graph:

# Create Device took 32.56 seconds (max_rss 78276.3 MiB, delta_rss +0.0 MiB)

Nevertheless, a huge amount of RAM (~80GB) and time (~43min) is used for the Clean circuit step:

# Clean circuit took 2521.69 seconds (max_rss 78260.4 MiB, delta_rss +77435.4 MiB)

This has a huge impact on the time required for the tests since the clean circuit step is executed in every VPR run:

[Image: clean]

I believe that all four peaks came from the clean circuit step in VPR.
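
The per-step lines quoted above are easy to turn into more readable numbers. The following sketch parses one such line and converts seconds to minutes and MiB to GiB; the regular expression is written against the two log lines shown here, and parse_step is a hypothetical helper name.

```python
import re

# Matches lines such as:
# "# Clean circuit took 2521.69 seconds (max_rss 78260.4 MiB, delta_rss +77435.4 MiB)"
STEP_RE = re.compile(
    r"^# (?P<step>.+?) took (?P<seconds>[\d.]+) seconds "
    r"\(max_rss (?P<max_rss>[\d.]+) MiB, delta_rss (?P<delta_rss>[+-][\d.]+) MiB\)"
)


def parse_step(line):
    """Return step name, runtime in minutes and RSS figures in GiB, or None."""
    match = STEP_RE.match(line.strip())
    if match is None:
        return None
    return {
        "step": match["step"],
        "minutes": float(match["seconds"]) / 60,
        "max_rss_gib": float(match["max_rss"]) / 1024,
        "delta_rss_gib": float(match["delta_rss"]) / 1024,
    }


clean = parse_step(
    "# Clean circuit took 2521.69 seconds "
    "(max_rss 78260.4 MiB, delta_rss +77435.4 MiB)"
)
print(f"{clean['step']}: {clean['minutes']:.1f} min, "
      f"+{clean['delta_rss_gib']:.1f} GiB")  # Clean circuit: 42.0 min, +75.6 GiB
```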

rw1nkler commented 4 years ago

FYI @litghost

rw1nkler commented 4 years ago

The source code used for the bitstream generation is available in https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1590. Previous symbiflow-arch-defs versions also require:

and the CLK_HROW_BOT_R bits obtained in: https://github.com/SymbiFlow/prjxray/pull/1381

Once https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1419 is accepted, all the modifications required for the EarlGrey bitstream generation will be on the master branch. The only remaining problem is the routing issue described above.

litghost commented 4 years ago

I have an initial fix for the RAM explosion in https://github.com/verilog-to-routing/vtr-verilog-to-routing/pull/1454 , which will hopefully be merged upstream. However, the RAM explosion usually accompanies an enormous runtime. Still working on that.

litghost commented 4 years ago

@rw1nkler If you run with https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1612 rebased on master, does it complete?

rw1nkler commented 4 years ago

Yes, it works after applying the mentioned changes. Also, the routing is successful. I rebased your changes on (f0e7b4212544e1d55da776fb7a2ff79117e01454).

rw1nkler commented 4 years ago

The diff_fasm step is currently not passing. One issue related to PLLs has been resolved: https://github.com/SymbiFlow/symbiflow-xc-fasm2bels/pull/17

The second issue is related to incorrect handling of constant nets in BUFGMUX in Yosys. The error is caused by manual changes to the code and can be fixed. I’m waiting for the bitstream generation here.

HackerFoo commented 4 years ago

@rw1nkler Any update on this?

rw1nkler commented 4 years ago

I believe that the CARRY4 modeling needs to be merged before submitting the PR with the OpenTitan test. As I said before, the test worked after rebasing the changes on the (previous) master. Since @litghost is working on CARRY4 modeling, I believe that the example will soon be added in the same way as the current Ibex test.

litghost commented 4 years ago

For a concrete progress update on the CARRY4 fixes:

  1. https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1673 needs to be cleaned up, go green, and be merged
  2. https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1674 needs to go green
  3. https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1674 needs to be split into smaller more focused PRs around the various fixes.

Once https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1674 is green, I believe an OpenTitan PR could be added. I'm working on iterating #1673 and #1674 and fixing issues as they are found.

litghost commented 4 years ago

#1673 has been merged, and #1674 is now green. I've split off some changes from #1674 into #1687, #1688, and #1689. Once those are green and merged, I'll rebase #1674 and address some review comments, and it should be good to merge.