Quuxplusone / LLVMBugzillaTest

functionattrs exhibits superlinear behaviour #16397

Open Quuxplusone opened 11 years ago

Quuxplusone commented 11 years ago

| Field | Value |
| --- | --- |
| Bugzilla Link | PR16398 |
| Status | NEW |
| Importance | P normal |
| Reported by | Nick Lewycky (nlewycky@google.com) |
| Reported on | 2013-06-21 00:07:36 -0700 |
| Last modified on | 2017-03-02 15:58:03 -0800 |
| Version | trunk |
| Hardware | PC Linux |
| CC | baldrick@free.fr, chandlerc@gmail.com, clattner@nondot.org, ditaliano@apple.com, llvm-bugs@lists.llvm.org, rafael@espindo.la, rnk@google.com, wendling@apple.com |
| Fixed by commit(s) | |
| Attachments | x.ll.gz (24229 bytes, application/x-gzip) |
| Blocks | |
| Blocked by | |
| See also | |

This is broken out from bug 13263. On the attached .ll file, we see this:

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 4.9600 seconds (4.9723 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   3.8400 ( 78.0%)   0.0100 ( 25.0%)   3.8500 ( 77.6%)   3.8668 ( 77.8%)  Deduce function attributes
   0.6800 ( 13.8%)   0.0000 (  0.0%)   0.6800 ( 13.7%)   0.6421 ( 12.9%)  SROA
   0.1800 (  3.7%)   0.0300 ( 75.0%)   0.2100 (  4.2%)   0.1741 (  3.5%)  Function Integration/Inlining
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0570 (  1.1%)  Module Verifier
[...]

The slowdown is happening inside AddNoCaptureAttrs. The capture tracking
analysis is trivial for this input, so that isn't the problem, and
SCCNodes.size() == 1, so the size of the SCC isn't the problem either. I'm
placing my bet on inefficient AttributeSet manipulation.
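
To make that bet concrete, here is a toy sketch of how per-attribute updates on an immutable container could turn superlinear. It is not LLVM's AttributeSet code: the tuple-of-frozensets layout and the helpers `add_attribute` and `mark_all_nocapture` are assumptions made up for illustration, and nothing here has been measured against the attached test case.

```python
# Toy model only: this is NOT LLVM's AttributeSet code.
# Assumption: attributes live in an immutable tuple with one slot per
# parameter, so every single-attribute update copies all slots.

def add_attribute(attrs, index, name):
    """Return (new_attrs, slots_copied) with `name` added at `index`."""
    slots = list(attrs)                      # full copy on every update
    slots[index] = slots[index] | {name}     # frozenset | set -> frozenset
    return tuple(slots), len(slots)


def mark_all_nocapture(num_params):
    """Add 'nocapture' to each parameter in turn; return total slots copied."""
    attrs = tuple(frozenset() for _ in range(num_params))
    total_copied = 0
    for i in range(num_params):
        attrs, copied = add_attribute(attrs, i, "nocapture")
        total_copied += copied
    return attrs, total_copied


if __name__ == "__main__":
    for n in (100, 200, 400):
        _, copied = mark_all_nocapture(n)
        print(n, copied)  # copied grows as n * n
```

In this model, doubling the number of parameters quadruples the copying work, which is the kind of growth the title describes. Whether the real pass behaves this way would need a profile to confirm.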
Quuxplusone commented 11 years ago

Attached x.ll.gz (24229 bytes, application/x-gzip): testcase

Quuxplusone commented 8 years ago
I took a look at this bug because I found it amusing.
First of all, the attached IR is no longer valid: the opaque pointer work
changed the load syntax so that it carries an explicit result type.
I converted the file using a script found on llvm-commits:

$ cat load.py
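import re
import sys

# Rewrite old-style loads ("load T* %p") to the explicit-type form
# ("load T, T* %p"). This is a single regex, split across adjacent raw
# string literals only to keep the lines short.
pat = re.compile(
    r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))"
    r"(| addrspace\(\d+\) *)\*"
    r"($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast"
    r"|inttoptr|\[\[[a-zA-Z]|\{\{).*$)"
)

# Filter stdin to stdout, one line at a time.
for line in sys.stdin:
  sys.stdout.write(pat.sub(r"\1, \2\3*\4", line))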
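
As a quick check of what the script does, the sketch below applies the load.py pattern to a few hand-written lines. The `upgrade` helper and the sample IR lines are made up for illustration; they are not taken from x.ll.

```python
import re

# The load.py pattern, written as adjacent raw string literals.
pat = re.compile(
    r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))"
    r"(| addrspace\(\d+\) *)\*"
    r"($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast"
    r"|inttoptr|\[\[[a-zA-Z]|\{\{).*$)"
)


def upgrade(line):
    """Insert the explicit result type into one old-style load line."""
    return pat.sub(r"\1, \2\3*\4", line)


# Hypothetical sample lines in the old syntax.
samples = [
    "  %v = load i32* %p, align 4",
    "  %b = load volatile i8* @g",
    "  %a = load i32 addrspace(1)* %q",
]

for old in samples:
    print(upgrade(old))
# prints:
#   %v = load i32, i32* %p, align 4
#   %b = load volatile i8, i8* @g
#   %a = load i32, i32 addrspace(1)* %q
```

Lines that contain no load pass through unchanged, so the whole file can be piped through the script.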

With that in place, I ran again through opt and this is what I got:

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 1.0269 seconds (1.0272 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.5289 ( 52.3%)   0.0012 (  7.8%)   0.5300 ( 51.6%)   0.5300 ( 51.6%)  SROA
   0.1025 ( 10.1%)   0.0004 (  3.0%)   0.1029 ( 10.0%)   0.1032 ( 10.0%)  Function Integration/Inlining
   0.0838 (  8.3%)   0.0002 (  1.2%)   0.0840 (  8.2%)   0.0840 (  8.2%)  Module Verifier
   0.0475 (  4.7%)   0.0001 (  0.9%)   0.0476 (  4.6%)   0.0476 (  4.6%)  Combine redundant instructions
   0.0227 (  2.2%)   0.0001 (  0.6%)   0.0227 (  2.2%)   0.0227 (  2.2%)  Interprocedural Sparse Conditional Constant Propagation
   0.0153 (  1.5%)   0.0001 (  0.9%)   0.0155 (  1.5%)   0.0155 (  1.5%)  Reassociate expressions
   0.0150 (  1.5%)   0.0001 (  0.8%)   0.0151 (  1.5%)   0.0151 (  1.5%)  Dead Store Elimination
   0.0133 (  1.3%)   0.0002 (  1.6%)   0.0135 (  1.3%)   0.0135 (  1.3%)  Simplify the CFG
   0.0087 (  0.9%)   0.0001 (  0.7%)   0.0088 (  0.9%)   0.0088 (  0.9%)  Tail Call Elimination
   0.0082 (  0.8%)   0.0001 (  0.8%)   0.0084 (  0.8%)   0.0084 (  0.8%)  Deduce function attributes
   0.0076 (  0.8%)   0.0001 (  0.9%)   0.0078 (  0.8%)   0.0077 (  0.8%)  Sparse Conditional Constant Propagation
   0.0058 (  0.6%)   0.0001 (  1.0%)   0.0059 (  0.6%)   0.0059 (  0.6%)  Value Propagation
   0.0056 (  0.6%)   0.0001 (  0.6%)   0.0057 (  0.6%)   0.0057 (  0.6%)  Value Propagation
   0.0052 (  0.5%)   0.0003 (  1.9%)   0.0054 (  0.5%)   0.0054 (  0.5%)  Early CSE

FunctionAttrs doesn't even show up in the top 5 anymore (it takes less than 1%
of the time), while about 52% of the time is spent in SROA. I think this PR
deserves a new title.
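
The ranking can be checked mechanically. The sketch below parses the first ten rows of the report above and sorts them by wall time; `parse_rows` and `rank_of` are hypothetical helper names, and the parsing assumes the four "seconds (percent%)" columns followed by the pass name, as in this report.

```python
import re

# Rows copied from the timing report (user, system, user+system, wall, name).
REPORT = """\
   0.5289 ( 52.3%)   0.0012 (  7.8%)   0.5300 ( 51.6%)   0.5300 ( 51.6%)  SROA
   0.1025 ( 10.1%)   0.0004 (  3.0%)   0.1029 ( 10.0%)   0.1032 ( 10.0%)  Function Integration/Inlining
   0.0838 (  8.3%)   0.0002 (  1.2%)   0.0840 (  8.2%)   0.0840 (  8.2%)  Module Verifier
   0.0475 (  4.7%)   0.0001 (  0.9%)   0.0476 (  4.6%)   0.0476 (  4.6%)  Combine redundant instructions
   0.0227 (  2.2%)   0.0001 (  0.6%)   0.0227 (  2.2%)   0.0227 (  2.2%)  Interprocedural Sparse Conditional Constant Propagation
   0.0153 (  1.5%)   0.0001 (  0.9%)   0.0155 (  1.5%)   0.0155 (  1.5%)  Reassociate expressions
   0.0150 (  1.5%)   0.0001 (  0.8%)   0.0151 (  1.5%)   0.0151 (  1.5%)  Dead Store Elimination
   0.0133 (  1.3%)   0.0002 (  1.6%)   0.0135 (  1.3%)   0.0135 (  1.3%)  Simplify the CFG
   0.0087 (  0.9%)   0.0001 (  0.7%)   0.0088 (  0.9%)   0.0088 (  0.9%)  Tail Call Elimination
   0.0082 (  0.8%)   0.0001 (  0.8%)   0.0084 (  0.8%)   0.0084 (  0.8%)  Deduce function attributes
"""

# One "seconds ( percent%)" column.
COLUMN = re.compile(r"(\d+\.\d+) \(\s*(\d+\.\d+)%\)")


def parse_rows(report):
    """Return (pass_name, wall_seconds, wall_percent) for each data row."""
    rows = []
    for line in report.splitlines():
        columns = COLUMN.findall(line)
        if len(columns) != 4:
            continue  # not a data row (header, separator, blank line)
        name = line.rsplit(")", 1)[1].strip()
        wall_seconds, wall_percent = columns[3]
        rows.append((name, float(wall_seconds), float(wall_percent)))
    return rows


def rank_of(rows, pass_name):
    """1-based position of `pass_name` when rows are sorted by wall time."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [name for name, _, _ in ordered].index(pass_name) + 1


rows = parse_rows(REPORT)
print(rank_of(rows, "SROA"))                        # 1
print(rank_of(rows, "Deduce function attributes"))  # 10
```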
Quuxplusone commented 8 years ago
SROA still ranks first in a non-debug build:

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 0.1241 seconds (0.1230 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0489 ( 41.1%)   0.0000 (  0.4%)   0.0489 ( 39.4%)   0.0469 ( 38.1%)  SROA
   0.0154 ( 12.9%)   0.0001 (  1.4%)   0.0155 ( 12.5%)   0.0153 ( 12.4%)  Function Integration/Inlining
   0.0088 (  7.4%)   0.0000 (  0.8%)   0.0088 (  7.1%)   0.0086 (  7.0%)  Combine redundant instructions
   0.0086 (  7.2%)   0.0000 (  0.2%)   0.0086 (  6.9%)   0.0078 (  6.3%)  Module Verifier
   0.0041 (  3.5%)   0.0000 (  0.1%)   0.0041 (  3.3%)   0.0035 (  2.9%)  Dead Store Elimination
   0.0023 (  2.0%)   0.0000 (  0.0%)   0.0023 (  1.9%)   0.0024 (  1.9%)  Interprocedural Sparse Conditional Constant Propagation
   0.0008 (  0.6%)   0.0000 (  0.1%)   0.0008 (  0.6%)   0.0022 (  1.8%)  Deduce function attributes
   0.0021 (  1.8%)   0.0001 (  1.2%)   0.0022 (  1.7%)   0.0021 (  1.7%)  Tail Call Elimination
   0.0018 (  1.6%)   0.0000 (  0.0%)   0.0018 (  1.5%)   0.0016 (  1.3%)  Remove unused exception handling info
   0.0014 (  1.2%)   0.0000 (  0.1%)   0.0014 (  1.1%)   0.0013 (  1.0%)  Value Propagation
   0.0007 (  0.6%)   0.0000 (  0.2%)   0.0007 (  0.6%)   0.0012 (  1.0%)  Simplify the CFG
   0.0012 (  1.0%)   0.0000 (  0.0%)   0.0012 (  1.0%)   0.0011 (  0.9%)  Value Propagation

Not sure it's worth worrying too much, but I'll try to investigate a little bit
more.