Open Quuxplusone opened 3 years ago
Thanks for the report Simon. Paul: I'm adding you to the CC on the off-chance you've done any profiling recently of tblgen or had any planned performance improvements in the works (just to avoid duplicated effort).
No, I haven't done any profiling and I have no performance enhancements on my to-do list. Don't forget the llvm-tblgen -time-phases option, which can give you a quick idea where the time is being spent.
Do you have some idea which backends are chewing up the most time?
Adding in some data I've generated from the RISC-V backend and intrinsics. The
other tablegen targets take less than two second a piece.
[8/42] Building RISCVGenGlobalISel.inc...
===-------------------------------------------------------------------------===
TableGen Phase Timing
===-------------------------------------------------------------------------===
Total Execution Time: 15.9415 seconds (17.0387 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
14.8536 ( 95.4%) 0.3255 ( 86.4%) 15.1791 ( 95.2%) 15.4707 ( 90.8%) Backend overall
0.7109 ( 4.6%) 0.0510 ( 13.5%) 0.7619 ( 4.8%) 1.5641 ( 9.2%) Parse, build records
0.0005 ( 0.0%) 0.0000 ( 0.0%) 0.0005 ( 0.0%) 0.0040 ( 0.0%) Write output
15.5649 (100.0%) 0.3766 (100.0%) 15.9415 (100.0%) 17.0387 (100.0%) Total
[9/42] Building RISCVGenInstrInfo.inc...
===-------------------------------------------------------------------------===
TableGen Phase Timing
===-------------------------------------------------------------------------===
Total Execution Time: 16.0652 seconds (17.1984 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
14.4727 ( 91.9%) 0.2624 ( 81.2%) 14.7351 ( 91.7%) 14.9221 ( 86.8%) Analyze DAG patterns
0.7123 ( 4.5%) 0.0441 ( 13.7%) 0.7564 ( 4.7%) 1.7012 ( 9.9%) Parse, build records
0.1840 ( 1.2%) 0.0166 ( 5.1%) 0.2006 ( 1.2%) 0.2011 ( 1.2%) Emit helper methods
15.7420 (100.0%) 0.3232 (100.0%) 16.0652 (100.0%) 17.1984 (100.0%) Total
[2/5] Building RISCVGenDAGISel.inc...
===-------------------------------------------------------------------------===
TableGen Phase Timing
===-------------------------------------------------------------------------===
Total Execution Time: 18.6654 seconds (18.8064 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
13.9062 ( 76.2%) 0.3322 ( 81.5%) 14.2384 ( 76.3%) 14.2747 ( 75.9%) Parse patterns
2.2140 ( 12.1%) 0.0000 ( 0.0%) 2.2141 ( 11.9%) 2.2195 ( 11.8%) Convert to matchers
0.6775 ( 3.7%) 0.0520 ( 12.8%) 0.7295 ( 3.9%) 0.7349 ( 3.9%) Parse, build records
0.5580 ( 3.1%) 0.0000 ( 0.0%) 0.5580 ( 3.0%) 0.5593 ( 3.0%) Sort patterns
0.5381 ( 2.9%) 0.0000 ( 0.0%) 0.5381 ( 2.9%) 0.5394 ( 2.9%) Optimize matchers
0.3553 ( 1.9%) 0.0199 ( 4.9%) 0.3753 ( 2.0%) 0.3762 ( 2.0%) Emit matcher table
0.0085 ( 0.0%) 0.0034 ( 0.8%) 0.0120 ( 0.1%) 0.1024 ( 0.5%) Write output
18.2577 (100.0%) 0.4076 (100.0%) 18.6654 (100.0%) 18.8064 (100.0%) Total
No surprises there, I suppose.
Yep, those are usually the three culprits. I should add more instrumentation to GlobalISel and InstrInfo so we can see which backend phases take the time. Then we can look at the longest phases and see if there is anything we can do.
In DagISel, the pattern parsing might be able to be improved.
I think a lot of time is spent in GenerateVariants. Do we need to do that for InstrInfo?
GenerateVariants() is a phase of building a CodeGenDAGPatterns instance. Instances a built by DAGIselEmitter, FastIselEmitter, GlobalIselEmitter. InstrDocsEmitter, InstrInfoEmitter.
GenerateVariants() has one double-nested loop and one triple-nested loop.The first loop calls GenerateVariantsOf(), which is pretty hairy.
I don't know enough about variants to get a feeling for how this might be improved. I can spend some time studying it. Do you have a sense of where it could be improved?
There were two patches in 2018 that were intended to improve the performance of GenerateVariants https://reviews.llvm.org/D51035 and https://reviews.llvm.org/D51839 on debug builds on MSVC. They might have slowed down Release builds or non-MSVC builds. Need to do some more investigation.
Or they are less helpful on RISCV than they were on X86.
I'm going to see if I can switch the predicates in PatternToMatch back to a
simple to compare type like ListInit* which is what it was prior to the
introduction of HwMode in 2017. That might allow us to remove the large
std::vector of BitVector from GenerateVariants.
Patches to try to improve things https://reviews.llvm.org/D100691 https://reviews.llvm.org/D100695
Simon, are you able to tell how much things have improved? My debug build on my
Linux server is currently about 29 seconds to generate RISCVGenInstrInfo.inc,
but I think MSVC time might be higher?
Summary of changes made so far
-Expand variants before expanding HwMode. Since expand variants is quadratic in
the number of patterns, it is better to do it before we double the patterns by
expanding hw mode.
-Patterns are stored in PatternToMatch now using ListInit * making them cheap
to compare in GenerateVariants where we used to have a std::vector compare.
-Removed BitVector based predicate filtering in GenerateVariants. This removed
a bunch of heap overhead for storing a nxn matrix of bits where n is the number
of patterns.
Minor things:
-Replaced std::set or DenseSet with SmallSet or MachineValueTypeSet which
should cut down on some heap allocation or hashing.
(In reply to Craig Topper from comment #12)
> Simon, are you able to tell how much things have improved? My debug build on
> my Linux server is currently about 29 seconds to generate
> RISCVGenInstrInfo.inc, but I think MSVC time might be higher?
Yeah, MSVC tends to be slower for debug builds.
I'm away from my main dev machine for the holiday weekend so haven't had chance
to compare full builds yet.
Fraser - any chance you can repeat those timing reports against trunk latest
please?
Certainly. As before, just using -time-phases in a release build:
[12/17] Building RISCVGenGlobalISel.inc...
===-------------------------------------------------------------------------===
TableGen Phase Timing
===-------------------------------------------------------------------------===
Total Execution Time: 5.4829 seconds (5.7914 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
4.5548 ( 86.4%) 0.1697 ( 81.0%) 4.7245 ( 86.2%) 4.8007 ( 82.9%) Backend overall
0.7185 ( 13.6%) 0.0398 ( 19.0%) 0.7583 ( 13.8%) 0.9906 ( 17.1%) Parse, build records
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Write output
5.2734 (100.0%) 0.2095 (100.0%) 5.4829 (100.0%) 5.7914 (100.0%) Total
[11/46] Building RISCVGenInstrInfo.inc...
===-------------------------------------------------------------------------===
TableGen Phase Timing
===-------------------------------------------------------------------------===
Total Execution Time: 5.3306 seconds (5.7432 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
4.0939 ( 79.8%) 0.1364 ( 68.3%) 4.2302 ( 79.4%) 4.3308 ( 75.4%) Analyze DAG patterns
0.6909 ( 13.5%) 0.0633 ( 31.7%) 0.7542 ( 14.1%) 1.0660 ( 18.6%) Parse, build records
0.1299 ( 2.5%) 0.0000 ( 0.0%) 0.1299 ( 2.4%) 0.1300 ( 2.3%) Emit helper methods
0.1100 ( 2.1%) 0.0000 ( 0.0%) 0.1100 ( 2.1%) 0.1101 ( 1.9%) Emit map table
0.0437 ( 0.9%) 0.0000 ( 0.0%) 0.0437 ( 0.8%) 0.0437 ( 0.8%) Emit InstrDesc records
0.0258 ( 0.5%) 0.0000 ( 0.0%) 0.0258 ( 0.5%) 0.0258 ( 0.4%) Emit operand info
0.0117 ( 0.2%) 0.0000 ( 0.0%) 0.0117 ( 0.2%) 0.0117 ( 0.2%) Emit operand type mappings
0.0104 ( 0.2%) 0.0000 ( 0.0%) 0.0104 ( 0.2%) 0.0104 ( 0.2%) Emit instruction names
0.0035 ( 0.1%) 0.0000 ( 0.0%) 0.0035 ( 0.1%) 0.0035 ( 0.1%) Emit logical operand size mappings
0.0035 ( 0.1%) 0.0000 ( 0.0%) 0.0035 ( 0.1%) 0.0035 ( 0.1%) Emit uses/defs
0.0034 ( 0.1%) 0.0000 ( 0.0%) 0.0034 ( 0.1%) 0.0034 ( 0.1%) Emit logical operand type mappings
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0033 ( 0.1%) Emit operand name mappings
0.0009 ( 0.0%) 0.0000 ( 0.0%) 0.0009 ( 0.0%) 0.0009 ( 0.0%) Write output
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Emit initialization routine
5.1310 (100.0%) 0.1997 (100.0%) 5.3306 (100.0%) 5.7432 (100.0%) Total
[13/16] Building RISCVGenDAGISel.inc...
===-------------------------------------------------------------------------===
TableGen Phase Timing
===-------------------------------------------------------------------------===
Total Execution Time: 6.6774 seconds (7.0148 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
3.6783 ( 56.4%) 0.1099 ( 71.8%) 3.7882 ( 56.7%) 3.8855 ( 55.4%) Parse patterns
1.2059 ( 18.5%) 0.0000 ( 0.0%) 1.2059 ( 18.1%) 1.2068 ( 17.2%) Convert to matchers
0.7158 ( 11.0%) 0.0365 ( 23.9%) 0.7524 ( 11.3%) 0.9905 ( 14.1%) Parse, build records
0.4000 ( 6.1%) 0.0000 ( 0.0%) 0.4000 ( 6.0%) 0.4006 ( 5.7%) Sort patterns
0.2748 ( 4.2%) 0.0000 ( 0.0%) 0.2748 ( 4.1%) 0.2752 ( 3.9%) Optimize matchers
0.2464 ( 3.8%) 0.0066 ( 4.3%) 0.2530 ( 3.8%) 0.2532 ( 3.6%) Emit matcher table
0.0031 ( 0.0%) 0.0000 ( 0.0%) 0.0031 ( 0.0%) 0.0031 ( 0.0%) Write output
6.5244 (100.0%) 0.1530 (100.0%) 6.6774 (100.0%) 7.0148 (100.0%) Total
One thing that may be worth mentioning now is that the various MC emitters
(RISCVGenAsmWriter.inc, RISCVGenMCCodeEmitter.inc,
RISCVGenDisassemblerTables.inc, RISCVGenAsmMatcher.inc) take 1.2 - 1.4 seconds
each. Now that the other emitters have been reduced well, these are taking up a
larger percentage of the time. It's 70-80% "Parse, build records" time for
those.
I'm going to spend some time in the next few weeks looking at TableGen parsing to see if there is anything that can be done relatively easily to speed it up. I'm not optimistic, but it's worth a good look.
Hang on, are those two sets of timings comparing apples with apples? If so, nice!
Thanks - I can confirm my build times have definitely improved!
Has anyone any ideas on what to do next on this?
This is a very general ticket, so if we have more specific improvements in mind, I'm happy for this to be resolved and those improvements to be handled in their own tickets.
If it helps, I just looked into the fairly nebulous "Analyze DAG patterns" and found that the call to CodeGenDAGPatterns::GenerateVariants takes about 50% of the -gen-instr-info time. I don't know how easy that would be to optimize.
I spent a couple of days fooling around with the TableGen lexer trying to speed it up. No joy there. I could knock off 1–2% of the time, which was probably just noise.
I think we can close this ticket.
(In reply to Paul C. Anagnostopoulos from comment #20)
> I spent a couple of days fooling around with the TableGen lexer trying to
> speed it up. No joy there. I could knock off 1–2% of the time, which was
> probably just noise.
>
> I think we can close this ticket.
I don't agree - until we've identified the next things to work on to reduce
tblgen runtimes, we shouldn't close this.
A few ideas
-Call InferInstructionFlags and VerifyInstructionFlags before GenerateVariants and ExpandHwModeBasedTypes and don't call GenerateVariants or ExpandHwModeBasedTypes for GenInstrInfo.inc. This assumes that GenInstrInfo.inc doesn't care about the expanded patterns and only cares about the mayLoad/mayStore/etc flags and that those flags don't depend on the expansion.
-Replace the std::map in InfoByHwMode by a "small" data structure of some sort. It might be possible to just maintain a sorted SmallVector
Okay, we can certainly keep it open.
The RISC-V tblgen builds have been getting slower and slower, and now appears to be the slowest of the big three (AMDGPU, X86 and RISC-V), especially if I end up having to use debug builds.....
I realise slower builds is almost inevitable as RISC-V develops and adds features, but it'd be great if there was some low hanging cleanup that could be done to get this under control - whether its improvements to tblgen itself or the RISC-V td files get more efficient.