YosysHQ / yosys

Yosys Open SYnthesis Suite
https://yosyshq.net/yosys/
ISC License
3.44k stars 878 forks source link

Unsoundness of init value handling in opt_merge, opt_clean, and/or FfInitVals #2920

Closed mwkmwkmwk closed 3 years ago

mwkmwkmwk commented 3 years ago

Steps to reproduce the issue

Run opt_clean; opt_merge; opt_dff; opt_clean on the following:

module \mod
  wire input 1 \clk
  attribute \init 2'00
  wire width 2 $q1
  attribute \init 2'00
  wire width 2 $q2
  wire output 2 width 4 \q
  cell $dff $ff1
    parameter \CLK_POLARITY 1'1
    parameter \WIDTH 1
    connect \CLK \clk
    connect \D 1'0
    connect \Q $q1 [0]
  end
  cell $dff $ff2
    parameter \CLK_POLARITY 1'1
    parameter \WIDTH 1
    connect \CLK \clk
    connect \D 1'0
    connect \Q $q2 [0]
  end
  cell $dff $ff3
    parameter \CLK_POLARITY 1'1
    parameter \WIDTH 2
    connect \CLK \clk
    connect \D 2'00
    connect \Q { $q1 [1] $q2 [1] }
  end
  connect \q [0] $q1 [0]
  connect \q [1] $q2 [0]
  connect \q [2] $q1 [1]
  connect \q [3] $q2 [1]
end

Expected behavior

Removal of some useless FFs, no crash.

Actual behavior

ERROR: Conflicting init values for signal 1'0 ($q2 [1] = 1'x != 1'0).

What happens:

  1. opt_clean considers the public name q more important than the private names $q1,$q2`, and thus considers it canonical, reconnects Q outputs of the FFs to it, but does not move the init attributes.
  2. opt_merge merges $ff1 with $ff2, but fails to remove the duplicated init value bit, because it only looks at the directly-connected wire for those.
  3. opt_dff replaces the merged FF with a constant driver (because of const input matching init value), and calls FfInitVals in the process to remove the now-unneeded init value, but this only removes one of them.
  4. opt_clean (or whichever pass happens to construct a FfInitVals helper next) sees conflicting initial values (x and 0) on what is now a const-driven net, and crashes.

While I was unable to construct a Verilog testcase for this (probably impossible due to need for FF connected to a private-name wire), this has been triggered in the wild by nMigen, and is very tricky to reproduce due to the number of things that have to go wrong.

The question here is: which pass is in the wrong? What are the validity constraints on init values? Is FfInitVals in the wrong for failing to deal with a duplicated init val? Is it wrong to have multiple non-x init vals on a single net in the first place? Is opt_clean wrong for moving the Q output without moving the init bit? I really don't know.

However, I'm pretty sure that opt_merge is in the wrong for only removing the redundant init sometimes, and will change it to use FfInitVals to properly notice init values on faraway wires, thus fixing the above interaction. The above should still be discussed because it's not going to be the last init-related bug.

mwkmwkmwk commented 3 years ago

Also, extract from IRC #nmigen logs describing the full issue as triggered by nMigen:

  1. nMigen brings out unnamed submodule's port to parent module as submod_name$signal$123 (a public name) while it's $signal$123 (a private name) in the actual module
  2. flatten throws both of them into one namespace; it mangles the name but doesn't change private/public type
  3. opt_clean considers the parent module's public name more important than actual module's private name, and thus considers it canonical, reconnecting the DFF output from module's signal to parent's signal, but NOT moving the init attribute (arguably bug)
  4. opt_merge notices two FFs with the above arrangement have identical identical inputs and settings, and can thus be merged; however, it fails to remove init value bit because it's not on the same wire as FF's Q output anymore, and it stays there (bug here) [there are two different aliased wire bits having an init value at this point]
  5. opt_dff notices, for unrelated reason, that this FF's value can never actually change and converts it to a const driver, removing the init value bit in the process, but only notices one of the two (arguably bug here)
  6. and then whatever pass next constructs a FfInitVals helper blows up because there is now somehow an init value bit on a constant
thirtythreeforty commented 3 years ago

I understand from #2926 that this issue is believed to be mitigated?

I've hit this bug on my design:

3.35.4. Executing OPT_CLEAN pass (remove unused cells and wires).
Finding unused cells or wires in module \sonata_rev1..
ERROR: Conflicting init values for signal 1'0 (\capture_vrefchannelpostprocessor0_encoder0_inputs_sum11 [3] = 1'0 != 1'x).

using an oss-cad-suite nightly oss-cad-suite-linux-x64-20210820.tgz which has tip-of-tree (as of this writing):

$ yosys -V
Yosys 0.9+4276 (git sha1 75a4cdfc, clang 10.0.0-4ubuntu1 -fPIC -Os)
mwkmwkmwk commented 3 years ago

This particular issue should be mitigated now, but there may be more problems with init values — it's a rather delicate thing. It's entirely possible that you found another, unrelated bug — it's hard to tell without looking at the triggering example.

thirtythreeforty commented 3 years ago

I will see if I can narrow it down to an MVCE!

mwkmwkmwk commented 3 years ago

I'm fine with just getting a big raw testcase FWIW, as long as it's reprodicible with just yosys

thirtythreeforty commented 3 years ago

Alright, in that case... I've snipped out some of the logic and replaced it with keep=1 constants. But this still manages to trigger the bug just fine: sonata-yosys-repro.tar.gz

mwkmwkmwk commented 3 years ago

... turns out that #2921 is subtly wrong, since it mutates the SigMap underlying the FfInitVals, breaking some of its assumptions. I'll have another fix.

syed-ahmed commented 3 years ago

@mwkmwkmwk Thanks for working on this! I am also encountering the same error. #2977 unfortunately is not working on my design and errors out with:

ERROR: Conflicting init values for signal 1'0 (\main_litedramnativeportconverter0_wdata_converter_converter_source_payload_data [108] = 1'0 != 1'x).

I would really appreciate if you could take a look at it. Here's my testcase: repro.tar.gz

thirtythreeforty commented 3 years ago

2977 seems to help - it gets a little further. (But that could be changes in my design.) Still not making it through.

mwkmwkmwk commented 3 years ago

Yes, #2977 is only part of the solution; there'll be more patches

mwkmwkmwk commented 3 years ago

Alright, #2978 should hopefully fix this interaction for good.

tcal-x commented 3 years ago

Commit 49c0325 does fix my case (targeting ice40/fomu, #2926).

thirtythreeforty commented 3 years ago

I'll second that, this is fixed in my usage now too. Thanks!