Open Wren6991 opened 1 month ago
Oh wow, that hurts.
I think there is almost certainly an underlying LLVM/Clang regression. Can you profile it, please?
Here is the output from -ftime-report. First for clang++-17 (good):
Then for clang++-18 (bad):
So almost all of the time is spent in LoopRotatePass (2/3) and SROAPass (1/3). LoopRotatePass seems like it may not be that useful for the straight-line code generated by CXXRTL?
Yep! This seems like a straightforward bug. I should be able to fix it whenever I get the chance, but I'm not sure when exactly that will happen.
Version
Yosys 0.39+4 (git sha1 3231c1cd9, clang++ 16.0.6 -fPIC -Os)
On which OS did this happen?
Linux
Reproduction Steps
This zip file contains the full dut.cpp from write_cxxrtl, which shows a 24x compile time regression: dut-full.zip
Or, this zip file contains a reduced dut.cpp (900k instead of 9M), which shows a 6x compile time regression: dut-reduced.zip
Good:
Bad:
The full dut.cpp can be reproduced from Verilog source by:
The reduced DUT has the hierarchy trimmed to just the hazard3_decode module, and all optional instruction extensions disabled.
This reproduces at -Og and above, but not at -O0.
Expected Behavior
Compile takes 45 seconds, as with clang++-16 and clang++-17.
Actual Behavior
Compile takes 18 minutes with clang++-18.
I dithered on whether to report this here, but it seems like there is something about CXXRTL's code generation that hits some pathologically slow case in clang++-18, and maybe this could be improved from the CXXRTL side.
Alternatively, if there is some recommended set of Clang/LLVM flags to use with CXXRTL to disable problematic passes and stop the compile time from blowing up, documenting that would also be helpful.
clang++-18 is the default clang++ as of Ubuntu 24.04 LTS, so I imagine more people will start hitting this.
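For anyone who wants to check whether they are hitting the same regression, here is a minimal sketch of how the compile times above could be compared. The helper names `time_command` and `slowdown` are made up for illustration, and the compiler invocations in the comments are assumptions (in particular, the include path for the CXXRTL runtime headers is left out), not the exact commands used in this report.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises CalledProcessError if the command fails
    return time.perf_counter() - start


def slowdown(good_seconds, bad_seconds):
    """Return how many times slower the bad run is than the good run."""
    return bad_seconds / good_seconds


# Figures from this report: 45 seconds (clang++-17) vs 18 minutes (clang++-18).
print(f"{slowdown(45, 18 * 60):.0f}x")  # prints "24x"

# Hypothetical usage; the -I path to the CXXRTL runtime headers is omitted
# here and would need to be added for a real dut.cpp:
#   good = time_command(["clang++-17", "-Og", "-c", "dut.cpp", "-o", "dut.o"])
#   bad = time_command(["clang++-18", "-Og", "-c", "dut.cpp", "-o", "dut.o"])
#   print(f"{slowdown(good, bad):.1f}x")
```

Wall-clock time is used rather than CPU time because that is what the 45 second and 18 minute figures describe; adding -ftime-report to either command would give the per-pass breakdown discussed earlier in the thread.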