Here are the eqy files, Verilog files, etc. for the two designs mentioned above.
It looks like what's happening here is that when testing a design against an exact copy of itself, the default partitioning strategy that eqy currently uses generates many very small partitions, which is not optimal for performance.
It is possible to influence the partitioning via the configuration file, using eqy configuration options/commands and/or yosys commands.
From the design stats, it seems that the design uses a lot of instances of primitives containing very little logic. In that case such primitives should be flattened before partitioning, so that the partitioning actually sees the contained logic, making that logic available for larger partitions. (EDIT: This would happen automatically when those primitives are used only on the gate side but not on the gold side, which is the usual setup for pre- vs. post-synthesis equality checking.) This can be done by adding the `-flatten` option to `prep` or by running `flatten` after `prep`.
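For example, a minimal sketch of the relevant config sections with flattening enabled (the file and top-module names here are placeholders, not taken from the designs above):

```
[gold]
read_verilog gold.v
prep -flatten -top top

[gate]
read_verilog gate.v
prep -flatten -top top
```

Equivalently, a separate `flatten` line could be added after the `prep` line in each section.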
I also noticed that the design has a lot of auto-generated names of the form `_<number>_` that would not be matched in an actual pre- vs. post-synthesis check, as they are not part of the input RTL. After flattening we are left with signals whose names match `_*_.*`, and we can exclude them as match candidates by adding the following lines to the eqy config:

```
[match *]
gate-nomatch _*_.*
```
The signal matching can be inspected by looking at the `matched.ids` file eqy generates.
If I do these two changes, the runtime for `aes` with all partitions solved by the `simple` strategy is one minute on my machine, and it doesn't hit the "argument list too long" error (which we'll fix anyway though, in case a design actually requires that many partitions).
For `jpeg` with the same kind of changes, it finishes in half a minute.
It is expected that for very simple equality checks the simple `use sat` strategy is faster, as it has a much lower startup overhead per partition. For equality checks where the actual solving time dominates, `use sby` will often be faster. Additionally, `use sby` allows using engines (e.g. `abc pdr`) that are strictly more powerful than the method `sat` uses. Thus it's usually a good idea to configure `use sat` as the first strategy to try, followed by `use sby` as a fallback.
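As a sketch, assuming strategies are attempted in the order they appear in the config, that setup could look like this (the strategy names are arbitrary labels):

```
[strategy simple]
use sat

[strategy fallback]
use sby
engine abc pdr
```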
Thank you for the instantaneous analysis @jix. Let me try this out. I have changed the title of the bug to reflect the current state (which is to only address the "argument list too long" issue).
@jix, to give you some more context: we would like to use equivalence checking as part of our flow to verify our timing optimization engine (insert/upsize buffers, inverters, clone gates, swap pins, etc.). So our input and output will always be a synthesized Verilog netlist.
That should pose no problem with a suitable configuration, it's just not quite what EQY's defaults assume.
Note that if your optimizations can change the actual values of signals but keep their original names, it might be necessary to also exclude those names from getting matched, or you can end up with EQY reporting non-equivalent partitions (that issue can also occur for pre- vs. post-synthesis checks). The most robust way is to ensure that such a tool renames signals when it changes what they represent, e.g. by adding a suffix. Alternatively, if you can get a list of such signals from an optimization logfile, that could be used to automatically generate an EQY config that excludes those signals from matching (see the sketch below). In case such optimizations are only done to signals that do not have a user-given name, and there is a way to exclude all auto-generated names like in your example here, it also shouldn't be an issue.
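For instance, a generated exclusion list could be expressed with additional `gate-nomatch` lines (the `opt_*` pattern below is a hypothetical naming scheme for optimizer-modified signals, not something EQY defines):

```
[match *]
gate-nomatch opt_*
```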
EQY already generates some data to help manually identifying such mismatches and we also have plans for adding more automation here.
In any case, let me know if you run into any further issues with integrating EQY into your flow. Even if you run into performance issues that can't be solved by a config change, it's good to know about such cases to have benchmarks that show where improvements are needed.
Thank you so very much for all the helpful tips. This will help a lot with making our resizer/optimizer code better/more robust.
@jix, hopefully a very quick question. I am getting a failure to prove equivalence if I swap the input signals on a 2-input NAND gate. Here is the netlist
before:

```verilog
DFF_X1 drvr_1 (.D(data), .CK(clk1), .Q(clk_to_nand0));
DFF_X1 drvr_2 (.D(data), .CK(clk1), .Q(clk_to_nand1));
NAND2_X4 nand_inst_0 (.A1(clk_to_nand0), .A2(clk_to_nand1), .ZN(net0));
```
After:

```verilog
DFF_X2 drvr_1 (.D(data), .CK(clk1), .Q(clk_to_nand0));
DFF_X2 drvr_2 (.D(data), .CK(clk1), .Q(clk_to_nand1));
NAND2_X4 nand_inst_0 (.A1(clk_to_nand1), .A2(clk_to_nand0), .ZN(net0));
```
How can I get these to match within eqy?
Thx!
> I am getting a failure to prove equivalence if I swap the input signals on a 2-input NAND gate.
I can't really tell what's going on without more context like the cell definitions and the eqy config used, but after flattening the cells, EQY shouldn't have any problem with this. Without flattening the cells into the design (which only happens automatically if EQY doesn't find matching instances in gate/gold), EQY will treat the cell as a black box and can't tell that swapping inputs is equivalent. So if you're comparing two netlists, you have to explicitly flatten cells into the design.
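A sketch of such a netlist-vs-netlist setup, flattening the standard cells into both sides (all file and module names are placeholders; `cells.v` stands for behavioral Verilog models of the library cells):

```
[gold]
read_verilog cells.v before.v
prep -flatten -top top

[gate]
read_verilog cells.v after.v
prep -flatten -top top
```

With the cell contents flattened in, both NAND instantiations above reduce to the same logic function, so the pin swap becomes provable.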
Problem 1:
For the two provided designs the runtimes are as follows:
It is not clear whether these runtimes are expected or whether there is something incorrect in the way we are using eqy.
Problem 2:
At the end of all four runs (2 designs × 2 strategies) an error is encountered before the final report is created. The logfile for one run is available in aes/logfile.txt. Here are the last few lines of the logfile:
Testcase data is public. The designs are here, and the library data is here.