Here are the eqy files, Verilog files, etc. for the two designs mentioned above.
It looks like what's happening here is that when testing a design against an exact copy of itself, the default partitioning strategy that eqy currently uses generates many very small partitions, which is not optimal for performance.
It is possible to influence the partitioning via the configuration file, using eqy configuration options/commands and/or yosys commands.
From the design stats, it seems that the design uses a lot of instances of primitives containing very little logic. In that case such primitives should be flattened before partitioning, so that the partitioning actually sees the contained logic, making that logic available for larger partitions. (EDIT: This would happen automatically when those primitives are used only on the gate side but not on the gold side, which is the usual setup for pre- vs. post-synthesis equality checking.) This can be done by adding the `-flatten` option to `prep` or by running `flatten` after `prep`.
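For example, a minimal sketch of the relevant config sections with flattening enabled (the file and top-module names here are placeholders, not taken from the designs above):

```
[gold]
read_verilog gold.v
prep -flatten -top top

[gate]
read_verilog gate.v
prep -flatten -top top
```

Equivalently, a separate `flatten` line could be added after the `prep` line in each section.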
I also noticed that the design has a lot of auto-generated names of the form `_<number>_` that would not be matched in an actual pre- vs. post-synthesis check, as they are not part of the input RTL. After flattening we are left with signals whose names match `_*_.*`, and we can exclude them as match candidates by adding the following lines to the eqy config:

```
[match *]
gate-nomatch _*_.*
```
The signal matching can be inspected by looking at the `matched.ids` file eqy generates.
If I do these two changes, the runtime for `aes` with all partitions solved by the `simple` strategy is one minute on my machine, and it doesn't hit the "argument list too long" error (which we'll fix anyway though, in case a design actually requires that many partitions).
For `jpeg` with the same kind of changes, it finishes in half a minute.
It is expected that for very simple equality checks the simple `use sat` strategy is faster, as it has a much lower startup overhead per partition. For equality checks where the actual solving time dominates, `use sby` will often be faster. Additionally, `use sby` allows using engines (e.g. `abc pdr`) that are strictly more powerful than the method `sat` uses. Thus it's usually a good idea to configure `use sat` as the first strategy to try, followed by `use sby` as a fallback.
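As a sketch, assuming strategies are attempted in the order they appear in the config, that setup could look like this (the strategy names are arbitrary labels):

```
[strategy simple]
use sat

[strategy fallback]
use sby
engine abc pdr
```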
Thank you for the instantaneous analysis @jix. Let me try this out. I have changed the title of the bug to reflect the current state (which is to only address the "argument list too long" issue).
@jix, to give you some more context: we would like to use equivalence checking as part of our flow to verify our timing optimization engine (insert/upsize buffers, inverters, clone gates, swap pins, etc.). So our input and output will always be a synthesized Verilog netlist.
That should pose no problem with a suitable configuration, it's just not quite what EQY's defaults assume.
Note that if your optimizations can change the actual values of signals but keep their original names, it might be necessary to also exclude those names from getting matched, or you can end up with EQY reporting non-equivalent partitions (that issue can also occur for pre- vs. post-synthesis checks). The most robust way is to ensure that such a tool renames signals when it changes what they represent, e.g. by adding a suffix. Alternatively, if you can get a list of such signals from an optimization logfile, that could be used to automatically generate an EQY config that excludes those signals from matching (see the sketch below). In case such optimizations are only done to signals that do not have a user-given name, and there is a way to exclude all auto-generated names like in your example here, it also shouldn't be an issue.
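For instance, a generated exclusion list could be expressed with additional `gate-nomatch` lines (the `opt_*` pattern below is a hypothetical naming scheme for optimizer-modified signals, not something EQY defines):

```
[match *]
gate-nomatch opt_*
```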
EQY already generates some data to help manually identifying such mismatches and we also have plans for adding more automation here.
In any case, let me know if you run into any further issues with integrating EQY into your flow. Even if you run into performance issues that can't be solved by a config change, it's good to know about such cases to have benchmarks that show where improvements are needed.
Thank you so very much for all the helpful tips. This will help a lot with making our resizer/optimizer code better/more robust.
@jix, hopefully a very quick question. I am getting a failure to prove equivalence if I swap the input signals on a 2-input NAND gate. Here is the netlist
before:

```verilog
DFF_X1 drvr_1 (.D(data), .CK(clk1), .Q(clk_to_nand0));
DFF_X1 drvr_2 (.D(data), .CK(clk1), .Q(clk_to_nand1));
NAND2_X4 nand_inst_0 (.A1(clk_to_nand0), .A2(clk_to_nand1), .ZN(net0));
```
After:

```verilog
DFF_X2 drvr_1 (.D(data), .CK(clk1), .Q(clk_to_nand0));
DFF_X2 drvr_2 (.D(data), .CK(clk1), .Q(clk_to_nand1));
NAND2_X4 nand_inst_0 (.A1(clk_to_nand1), .A2(clk_to_nand0), .ZN(net0));
```
How can I get these to match within eqy?
Thx!
> I am getting a failure to prove equivalence if I swap the input signals on a 2-input NAND gate.
I can't really tell what's going on without more context like the cell definitions and the eqy config used, but after flattening the cells, EQY shouldn't have any problem with this. Without flattening the cells into the design (which only happens automatically if EQY doesn't find matching instances in gate/gold), EQY will treat the cell as a black box and can't tell that swapping inputs is equivalent. So if you're comparing two netlists, you have to explicitly flatten cells into the design.
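A sketch of such a netlist-vs-netlist setup, flattening the standard cells into both sides (all file and module names are placeholders; `cells.v` stands for behavioral Verilog models of the library cells):

```
[gold]
read_verilog cells.v before.v
prep -flatten -top top

[gate]
read_verilog cells.v after.v
prep -flatten -top top
```

With the cell contents flattened in, both NAND instantiations above reduce to the same logic function, so the pin swap becomes provable.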
Problem 1:
For the two provided designs the runtimes are as follows:
It is not clear whether these runtimes are expected or whether there is something incorrect in the way we are using eqy.
Problem 2:
At the end of all four runs (2 designs × 2 strategies) an error is encountered before the final report is created. The logfile for one run is available in aes/logfile.txt. Here are the last few lines of the logfile:
Testcase data is public. The designs are here, and the library data is here.