YosysHQ / yosys

Yosys Open SYnthesis Suite
https://yosyshq.net/yosys/
ISC License
3.52k stars 894 forks source link

-relut appears to cause performance problems with ice40 synthesis #1187

Closed zeusima closed 5 years ago

zeusima commented 5 years ago

Steps to reproduce the issue

./nextpnr/ice40/picorv32.sh

Using commit dd8d264b:

ICESTORMLC: 1540/ 7680 20% Info: Max frequency for clock 'clk$glb_clk': 70.88 MHz (PASS at 12.00 MHz) Total path delay: 14.28 ns (70.02 MHz)

Using commit 463f7100:

ICESTORMLC: 1952/ 7680 Info: Max frequency for clock 'clk$glb_clk': 64.07 MHz (PASS at 12.00 MHz) Total path delay: 21.73 ns (46.02 MHz)

Same behaviour is observed with dd8d264b when -relut is enabled. Changing the placer algorithm or ABC setting makes no difference. The forcing of -relut to be always enabled highlighted this issue.

Expected behavior

An improvement to the utilisation and/or performance of the design should've been observed.

Actual behavior

The design shows higher utilisation and lower performance when -relut is enabled. This is consistent across multiple designs.

smunaut commented 5 years ago

From IRC:

18:15 < daveshah> The LUT map ordering changed to accommodate abc9 18:16 < daveshah> This will cause the relut changes to no longer allow the LUT and carry to be packed together

smunaut commented 5 years ago

@zeusima You can give https://pastebin.com/Uxc9HM5s a shot

janrinze commented 5 years ago

pastebin from smunaut seems to give relief. The results without relut seem to be a slight bit faster but that might be specific for my design.

whitequark commented 5 years ago

@smunaut Can you please file this as a PR?

@janrinze Note that there's a lot of noise in post-routing timing results, unfortunately, so small changes are generally not meaningful if you look at individual and not statistical change.

smunaut commented 5 years ago

Yup working on submitting now.

dpiegdon commented 5 years ago

Please note that the current master of yosys does no longer produce correct outputs for combinatorial loops on ice40. A testcase can be found in https://github.com/dpiegdon/ringoscillator - here multiple ringoscillators are used to generate test output (metastable, and oscillators with different delay lines). The bistream generated with the current master leads to a stable output on all of these. (i.e. no clock, no metastable output). If I revert the commit https://github.com/smunaut/yosys/commit/f28e38de9994151ea4e22608441dbc9e116d7b8c they work again.

In https://github.com/DurandA/verilog-buildingblocks/issues/1 @smunaut argues about the details. What I take from that is that maybe the optimizer (relut?) should be disabled until it has been replaced with a new version?

whitequark commented 5 years ago

@dpiegdon What is "correct output" for a combinatorial loop? I don't think Yosys gives any guarantees on what they will synthesize into.

In any case, a ring oscillator must not be designed as a combinatorial loop but as a series of primitives marked with (*keep*) if you want it to work, and a ring oscillator in FPGA fabric is not in fact guaranteed to work well (since they are prone to mode locking and interference) or at all (since it does not pass timing).

I am entirely against disabling a useful optimization pass (relut) in order to support a useless pattern (fabric ring oscillators written as combinatorial loops).

eddiehung commented 5 years ago

Let's be sensible: I disagree that fabric ring oscillators are useless, but instead an atypical pattern used in some unique applications, and one we should have no problems supporting. Now I don't think we should revert an optimisation that gives an improvement for most use cases, but instead try and get to the bottom of why it's broken and fix it so we can have our cake and eat it.

Let's approach it as we would any other regression: can you minimise the breaking design into a small circuit? What is the result with and without the bisected commit?

whitequark commented 5 years ago

I disagree that fabric ring oscillators are useless, but instead an atypical pattern used in some unique applications

Sorry, my comment was badly written. I meant that inference of ring oscillators is, IMO, useless. Ring oscillators made from primitives with (*keep*) should obviously keep working, and if that does not work then it is a bug in some pass.

smunaut commented 5 years ago

It's a bit irrelvant anymay, #1290 removes relut and apparently works fine with his use case.

smunaut commented 5 years ago

Mmm, digging into the logs posted on https://github.com/DurandA/verilog-buildingblocks/issues/1

Seems to show there is indeed an issue with this commit. Although it does restore the I0-I3 to their proper place the INIT string stays mangled ... (and this is on ring fully made up of manually instanciated LUT4)

https://github.com/dpiegdon/verilog-buildingblocks/blob/91c1ca8a5eedd83f019517a90e06f6a495e53a81/lattice_ice40/ringoscillator.v

eddiehung commented 5 years ago

It's a bit irrelvant anymay, #1290 removes relut and apparently works fine with his use case.

1290 doesn't remove -relut... (it removes ice40_unlut) and should have no impact on this.

EDIT: Sorry! I see that https://github.com/smunaut/yosys/commit/f28e38de9994151ea4e22608441dbc9e116d7b8c did change ice40_unlut which would have corrupted unlut-ing of manually instantiated LUTs, which is the root cause of this, so very relevant in fact!

I will try and merge #1290 in the morning, but if someone wants to resolve the minor conflict and merge then please go ahead!

whitequark commented 5 years ago

@eddiehung done.

dpiegdon commented 5 years ago

I can confirm that with the current master all my testcases are working properly now. Thanks guys!!

eddiehung commented 5 years ago

Thanks everyone. For future more forgetful me and future generations, the post-mortem of this is thus: