Closed Quuxplusone closed 14 years ago
When this happens, we can eliminate Type::canLosslesslyBitCastTo and probably
some other stuff only
used by -raise.
-Chris
Is there anything preventing this from being eliminated now?
Can we use the stats to determine how much it fires (if at all)?
What pieces does it implement that might still be useful to retain?
Nothing prevents it. We just need someone to do an llvm-test run with an
without it to see if there are
any perf differences.
-Chris
or even just run SPEC2K + SPEC2K6.
-Chris
I will set up overnight performance test run.
I want this gone. :)
Attached report.nightly.csv
(3870 bytes, text/plain): External Nightly Test Results With -raise
Attached report.nightly.txt
(5736 bytes, text/plain): External Nightly Test Results Without -raise
Summary Of Performance Results (See the Spreadsheet)
Four SPEC test suites were compared: CINT2000, CFP2000, CINT2006, and CFP2006.
In general it looks positive to remove the -raise pass. The overall results
(sum of all tests) are as follows:
+--Compile-+ +----------Run Times--------+
ITEM GCCAS Bytecode LLC JIT GCC CBE LLC JIT
Delta 151.05 818298 -9.07 16.2 14.66 20.81 46.40 57.15
Percent 6.49% 1.95% -1.09% 3.09% 2.18% 3.14% 7.12% 4.04%
The "Delta" values shown are the difference in aggregate time between "with
-raise" and "without -raise" runs across all test cases. Negative values mean
the "without -raise" value was larger. Positive values mean the "without -raise"
value was smaller. Because these values represent the subtraction of the
"without -raise" from the "with -raise" data sets, positive values favor removal
of -raise while negative values favor keeping -raise.
The "Percent" values are computed by dividing the "Delta" value by the "with
-raise" sum. This gives an approximation of the magnitude of change introduced
by removing the -raise pass.
To interpret this correctly, one would say: "On average, the -raise pass
increases bytecode volumes by nearly 2% and LLC runtimes by over 7%".
The only negative impact of removing -raise seems to be a 1.09% increase in the
LLC compilation time. This is rather surprising given that there is less work
done by llc without -raise.
Note that these results account for only one run of the nightly tests.
Additional samples would be welcome.
Attached PR1072.results.xls
(45056 bytes, application/octet-stream): Performance Data Spreadsheet
Attached PR1072.results.xls
(33280 bytes, application/octet-stream): Performance Data Spreadsheet
Chris,
Please let me know if this is sufficient evidence to support removal of the
-raise pass.
Reid
Yes, it looks like we are close to it. Things worth investigating:
* What does -raise do on 401.bzip2 that shrinks the bc file 18K? Once
understood, we can reimplement
this in some other pass.
* I have little confidence in the stabilities of these numbers, they seem very
noisy. Can you see if the
254.gap/LLC, 254.gap/CBE, 255.vortex/CBE, and 256.bzip2/cbe slowdowns
reproduce? If so, we should
investigate what is happening there too.
-Chris
> * What does -raise do on 401.bzip2 that shrinks the bc file 18K? Once
> understood, we can reimplement this in some other pass.
I ran 401.bzip2 using "head" from today and compared it against the "without
-raise" run I did last night. The bytecode sizes are identical. Consequently, I
don't think I was comparing apples with apples. I used the results of my nightly
run from a couple days ago for the "with -raise" data but my "without -raise"
also included the shift patch, which may have affected optimization
(positively).
So, guess I get to do this again.
Devang, did you run this comparison on Darwin?
> * I have little confidence in the stabilities of these numbers, they seem very
> noisy.
I eliminated as much interference workload as I could by shutting down email,
xchat, etc. and just let it run. I think the issue here is the difference in the
builds. I'll run the comparison again and report new numbers. Unfortunately,
it'll have to wait until this evening for "quiet machine".
> Can you see if the 254.gap/LLC, 254.gap/CBE, 255.vortex/CBE, and
> 256.bzip2/cbe slowdowns reproduce? If so, we should investigate what is
> happening there too.
I'll defer this until after I get new numbers. If they still look problematic,
I'll investigate then.
Makes sense. A lot has changed in a few days. I suspect the 'noise' is really measuring these changes :)
-Chris
Yes. I now have two environments that were updated at the same moment. The only
difference is that one has -raise and one doesn't. This should be good for
testing. I'm going to do Olden with GET_STABLE_NUMBERS=1 because its relatively
quick and a good test for stability. If that goes okay, I'll get to External
tests this weekend (7+ hours each).
sounds great, thanks reid!
Attached Olden.results.xls
(16896 bytes, application/octet-stream): Olden Performance Comparison
> there is 0 difference in bytecode sizes. Perhaps -raise
> doesn't do anything at all on Olden?
Cool. You could diff them to verify.
This is a good sign -raise isn't doing anything any more. :) SPEC will give
more certainty though.
[reid@bashful llvm-test-1]$ for f in
MultiSource/Benchmarks/Olden/*/Output/*.llvm.bc ; do
> cmp $f ../llvm-test-4/$f
> done
[reid@bashful llvm-test-1]$ echo $?
0
Looks pretty identical to me.
Attached External.results.xls
(31744 bytes, application/octet-stream): External Performance Comparison (Spreadsheet)
Looking at the External test comparison, it doesn't look like -raise does much.
However a few things could be investigated:
255.vortex - Why does raise eliminate 733 bytes of bytecode?
433.milc - What does raise do to make llc 1.9 seconds faster?
456.hmmer - What does raise do to make llc 1.54 seconds fasters?
447.dealII - Why is "gccas" (opt) 2.93 seconds faster with -raise?
400.perlbench - What does raise do to make jit 1.04 seconds faster, especially
with 1133 more bytecode?
We should generate comparisons on Darwin/PPC as well just to make sure that
-raise doesn't assist any backend opts on that platform. I don't see why it
would, but we should check.
For 255.vortex, here's what I found:
1. The bytecode size difference is due to the -raise pass removing or
altering the names of values. A diff of the llvm assembly corresponding to
the .llvm.bc files from both cases produced 6000 lines of diffs. But, the
only way they differed was in the names of values. I ran opt -strip on these
files and re-ran the diff. It was vastly shorter. I'll attach it.
2. The diff produces in #1 showed that -raise will fold a GEP instruction into
a store instruction using a GEP constant expression. Without -raise this
doesn't happen. There is also some changes in the # of uses probably related
to the GEP folding into the store.
Its not clear why -instcombine wouldn't do the same transform as #2.
Attached strip.diff
(2665 bytes, text/plain): 255.vortex difference on stripped files
Attached 433.milc.diff
(640 bytes, text/plain): 433.milc differences
Attached 456.hmmer.diff
(32462 bytes, text/plain): 456.hmmer differences
Attached 447.dealII.diff
(62181 bytes, text/plain): 447.dealII differences
For 400.perlbench:
Unfortunately, my machine doesn't have enough memory to do a diff of two 2.7GB
LLVM assembly files.
the 255.vortex differences aren't worth worrying about, we're good there.
ok, go ahead and nuke it.
The -raise pass was remvoed with these patches:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043817.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043818.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043819.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043820.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043821.html
Resolved.
report.nightly.csv
(3870 bytes, text/plain)report.nightly.txt
(5736 bytes, text/plain)PR1072.results.xls
(45056 bytes, application/octet-stream)PR1072.results.xls
(33280 bytes, application/octet-stream)Olden.results.xls
(16896 bytes, application/octet-stream)External.results.xls
(31744 bytes, application/octet-stream)strip.diff
(2665 bytes, text/plain)433.milc.diff
(640 bytes, text/plain)456.hmmer.diff
(32462 bytes, text/plain)447.dealII.diff
(62181 bytes, text/plain)