YosysHQ / yosys

Yosys Open SYnthesis Suite
https://yosyshq.net/yosys/
ISC License
3.3k stars 860 forks source link

Performance Issue: Synthesis Takes Too Long to Complete #4445

Open LoSyTe opened 2 weeks ago

LoSyTe commented 2 weeks ago

Version

yosys 0.41+126

On which OS did this happen?

Linux

Reproduction Steps

Hello, I encountered a performance issue while using Yosys to synthesize a Verilog file. The specific details are as follows:

I used the Verilog file design.v, and when running the Yosys synthesis, it has been over 30 minutes without completion. My system configuration is as follows: 128G RAM, and a high-performance CPU. However, the synthesis process seems to be stuck in an infinite loop or some kind of performance bottleneck.

To rule out issues with the file itself, I have checked the Verilog file for syntax errors but did not find any obvious problems. Additionally, I am using the latest version of Yosys.

Attached is the Verilog file (design.v) that triggers this issue. I hope to get the community's help and attention.

After this error, the parsing process is interrupted. I have checked the Verilog file for syntax errors but couldn't find any obvious issues. I am using the latest version of Yosys.

Attached is the Verilog file (design.v) that triggers the error. Thank you in advance for your attention to this matter. I look forward to hearing from you regarding this issue. design_file.zip

Expected Behavior

Synthesis completes successfully within a reasonable time

Actual Behavior

Synthesis takes too long and does not complete

whitequark commented 2 weeks ago

You should minimize the reproducer. It is a waste of everyone's time to file non-minimized fuzzer-discovered issues and personally I am leaning towards closing such issues outright.

georgerennie commented 2 weeks ago

I would agree with the previous comment on minimization. If you haven't seen it before https://blog.regehr.org/archives/2037 provides good guidelines on reporting fuzzed bugs. Additionally it would have been helpful to include the synthesis script used. I have assumed it was just synth as that causes a hang for me too.

If you aren't familiar with it, Yosys includes a testcase minimizer in the form of the bugpoint command. Running yosys -p "read_verilog design.v; bugpoint -command \"synth\" -runner \"timeout 1\" -fast; write_verilog bug.v" results in the following testcase after only a few minutes (I've tidied the one shown here with clean -purge; write_verilog -noattr bug.v)

/* Generated by Yosys 0.40+45 (git sha1 dd2195543, g++ 13.2.1 -fPIC -Os) */

module module11(_0_);
  output [3:0] _0_;
  wire [3:0] _0_;
  wire _1_;
  wire [3:0] _2_;
  wire [39:0] _3_;
  wire [39:0] _4_;
  assign _3_ = _1_ >= _2_;
  assign _0_ = $signed(4'hx) >> _4_;
  assign _4_ = _3_ - 40'h734e475132;
endmodule

Running read_verilog bug.v; synth on this seems to go very slow at

2.10. Executing PEEPOPT pass (run peephole optimizers).
shiftadd pattern in module11: shift=$shr$bug.v:11$2, add/sub=$sub$bug.v:12$3, offset: -1313296690

2.11. Executing OPT_CLEAN pass (remove unused cells and wires).
Finding unused cells or wires in module \module11..

You can recreate this by just running wreduce; peepopt. peepopt's shiftadd converts A>>(B+D) to (A'>>D)>>(B) where D is const and A' is a zero padded version of A. The issue seems to be that where D is -1313296690, A is padded with a 1313296691 bit constant that is all zeros, and Yosys is understandably slow to handle such a large constant.

It seems to me like although this is a case where performance isn't ideal, no one would ever have a shift of -1313296690 in a non-fuzzing scenario so its not really a big deal.

Edit: This seems to be about the same as https://github.com/YosysHQ/yosys/issues/4056 and https://github.com/YosysHQ/yosys/pull/4057 but for shiftadd instead of shiftmul

whitequark commented 2 weeks ago

Thanks @georgerennie! I would have linked the same article if I had the URL on hand.

LoSyTe commented 2 weeks ago

@whitequark @georgerennie Thank you for your response. I encountered this issue during a stress test, and I also tried it on several machines, but faced the same problem where logic synthesis could not be completed within a reasonable time. I have checked #4056 and #4057, which also reported extended synthesis times due to certain padding operations. Although I discovered this performance issue during a stress test, I believe this fault is noteworthy. Thank you.

georgerennie commented 2 weeks ago

So I had a look into it a little more, and there is actually a bug happening here, but it isn't what causes the poor performance.

Although the shift is nominally by 40'h734e475132, shiftadd stores the offset in an int. This shift value overflows the int representation, becoming a shift value it shouldn't.

The poor performance is still repeatable on a testcase like the following that doesn't overflow the int. Perhaps this optimization should be disabled for large shifts to gate against this, although again I don't believe this to be a particularly noteworthy issue for real designs.

module module11(_0_);
  output [3:0] _0_;
  wire [3:0] _0_;
  wire _1_;
  wire [3:0] _2_;
  wire [31:0] _3_;
  wire [31:0] _4_;
  assign _3_ = _1_ >= _2_;
  assign _0_ = $signed(4'hx) >> _4_;
  assign _4_ = _3_ - 32'h4e475132;
endmodule
whitequark commented 2 weeks ago

In my view, Yosys should limit both the wire size and the shift size to some reasonable amount well under 4 billion. The Verilog specification explicitly allows this (provided it's above a minimum value), and it will take care of all similar issues, which otherwise will come at a small trickle indefinitely.

georgerennie commented 2 weeks ago

While I agree, it won't stop the int casting issue here (as we can't reasonably forbid constants being over 4 billion). Ideally RTLIL::Const could have a method like representable_as_int() to gate this. #4448 is a basic fix that just disables this optimization if the constant is too wide (I chose 24 bits for now) which prevents it being incorrectly cast to int and also puts an upper bound on the size of the circuit this optimization will generate.

phsauter commented 2 weeks ago

In my view, Yosys should limit both the wire size and the shift size to some reasonable amount well under 4 billion. The Verilog specification explicitly allows this (provided it's above a minimum value), and it will take care of all similar issues, which otherwise will come at a small trickle indefinitely.

It is specified in Verilog-2005 4.3.1 if anyone wants to look it up. The smallest allowed upper limit on the size of a vector (number of wires/bits) is 65536 ($2^{16}$). So something like 24 bits seems reasonable to me.

LoSyTe commented 2 weeks ago

@georgerennie Thank you so much for your help!