Investigate elimination of the -raise pass

Quuxplusone commented 17 years ago


Bugzilla Link	PR1072
Status	RESOLVED FIXED
Importance	P enhancement
Reported by	Chris Lattner (clattner@nondot.org)
Reported on	2007-01-03 19:39:38 -0800
Last modified on	2010-02-22 12:47:29 -0800
Version	trunk
Hardware	Macintosh MacOS X
CC	dpatel@apple.com, llvm-bugs@lists.llvm.org, rspencer@reidspencer.com
Fixed by commit(s)
Attachments	`report.nightly.csv` (3870 bytes, text/plain) `report.nightly.txt` (5736 bytes, text/plain) `PR1072.results.xls` (45056 bytes, application/octet-stream) `PR1072.results.xls` (33280 bytes, application/octet-stream) `Olden.results.xls` (16896 bytes, application/octet-stream) `External.results.xls` (31744 bytes, application/octet-stream) `strip.diff` (2665 bytes, text/plain) `433.milc.diff` (640 bytes, text/plain) `456.hmmer.diff` (32462 bytes, text/plain) `447.dealII.diff` (62181 bytes, text/plain)
Blocks
Blocked by
See also

The -raise pass was mainly around to eliminate casts from int -> uint etc
through some tricky
transformations.  With the recent signless work, this pass should be mostly
useless.  Eliminating it is good
as it reduces compile time (one fewer pass), eliminates a bunch of ugly code
(the raise pass is very nasty),
and potentially eliminating the LLVMTransforms library (ExprTypeConvert.cpp,
LevelRaise.cpp, and
TransformInternals.cpp).

We should do performance analysis (e.g. spec2k and 2k6) to see whether this is
completely dead, and if
not, reimplement the useful pieces in other passes.

-Chris

Quuxplusone commented 17 years ago

When this happens, we can eliminate Type::canLosslesslyBitCastTo and probably
some other stuff only
used by -raise.

-Chris

Quuxplusone commented 17 years ago

Is there anything preventing this from being eliminated now?

Can we use the stats to determine how much it fires (if at all)?

What pieces does it implement that might still be useful to retain?

Quuxplusone commented 17 years ago

Nothing prevents it.  We just need someone to do an llvm-test run with an
without it to see if there are
any perf differences.

-Chris

Quuxplusone commented 17 years ago

or even just run SPEC2K + SPEC2K6.

-Chris

Quuxplusone commented 17 years ago

I will set up overnight performance test run.

Quuxplusone commented 17 years ago

I want this gone. :)

Quuxplusone commented 17 years ago

Attached report.nightly.csv (3870 bytes, text/plain): External Nightly Test Results With -raise

Quuxplusone commented 17 years ago

Attached report.nightly.txt (5736 bytes, text/plain): External Nightly Test Results Without -raise

Quuxplusone commented 17 years ago

Summary Of Performance Results (See the Spreadsheet)

Four SPEC test suites were compared: CINT2000, CFP2000, CINT2006, and CFP2006.

In general it looks positive to remove the -raise pass. The overall results
(sum of all tests) are as follows:

                            +--Compile-+   +----------Run Times--------+
ITEM       GCCAS   Bytecode LLC    JIT     GCC     CBE     LLC     JIT
Delta      151.05  818298   -9.07  16.2    14.66   20.81   46.40   57.15
Percent     6.49%   1.95%    -1.09% 3.09%   2.18%   3.14%   7.12%   4.04%

The "Delta" values shown are the difference in aggregate time between "with
-raise" and "without -raise" runs across all test cases.  Negative values mean
the "without -raise" value was larger. Positive values mean the "without -raise"
value was smaller. Because these values represent the subtraction of the
"without -raise" from the "with -raise" data sets, positive values favor removal
of -raise while negative values favor keeping -raise.

The "Percent" values are computed by dividing the "Delta" value by the "with
-raise" sum. This gives an approximation of the magnitude of change introduced
by removing the -raise pass.

To interpret this correctly, one would say: "On average, the -raise pass
increases bytecode volumes by nearly 2% and LLC runtimes by over 7%".

The only negative impact of removing -raise seems to be a 1.09% increase in the
LLC compilation time. This is rather surprising given that there is less work
done by llc without -raise.

Note that these results account for only one run of the nightly tests.
Additional samples would be welcome.

Quuxplusone commented 17 years ago

Attached PR1072.results.xls (45056 bytes, application/octet-stream): Performance Data Spreadsheet

Quuxplusone commented 17 years ago

Attached PR1072.results.xls (33280 bytes, application/octet-stream): Performance Data Spreadsheet

Quuxplusone commented 17 years ago

Chris,

Please let me know if this is sufficient evidence to support removal of the
-raise pass.

Reid

Quuxplusone commented 17 years ago

Yes, it looks like we are close to it.  Things worth investigating:

* What does -raise do on 401.bzip2 that shrinks the bc file 18K?  Once
understood, we can reimplement
this in some other pass.

* I have little confidence in the stabilities of these numbers, they seem very
noisy.  Can you see if the
254.gap/LLC, 254.gap/CBE, 255.vortex/CBE, and 256.bzip2/cbe slowdowns
reproduce?  If so, we should
investigate what is happening there too.

-Chris

Quuxplusone commented 17 years ago

> * What does -raise do on 401.bzip2 that shrinks the bc file 18K?  Once
> understood, we can reimplement this in some other pass.

I ran 401.bzip2 using "head" from today and compared it against the "without
-raise" run I did last night. The bytecode sizes are identical. Consequently, I
don't think I was comparing apples with apples. I used the results of my nightly
run from a couple days ago for the "with -raise" data but my "without -raise"
also included the shift patch, which may have affected optimization
(positively).

So, guess I get to do this again.

Devang, did you run this comparison on Darwin?

> * I have little confidence in the stabilities of these numbers, they seem very
>  noisy.

I eliminated as much interference workload as I could by shutting down email,
xchat, etc. and just let it run. I think the issue here is the difference in the
builds. I'll run the comparison again and report new numbers. Unfortunately,
it'll have to wait until this evening for "quiet machine".

> Can you see if the 254.gap/LLC, 254.gap/CBE, 255.vortex/CBE, and
> 256.bzip2/cbe slowdowns reproduce?  If so, we should investigate what is
> happening there too.

I'll defer this until after I get new numbers. If they still look problematic,
I'll investigate then.

Quuxplusone commented 17 years ago

Makes sense. A lot has changed in a few days. I suspect the 'noise' is really measuring these changes :)

-Chris

Quuxplusone commented 17 years ago

Yes. I now have two environments that were updated at the same moment. The only
difference is that one has -raise and one doesn't. This should be good for
testing. I'm going to do Olden with GET_STABLE_NUMBERS=1 because its relatively
quick and a good test for stability. If that goes okay, I'll get to External
tests this weekend (7+ hours each).

Quuxplusone commented 17 years ago

sounds great, thanks reid!

Quuxplusone commented 17 years ago

Attached Olden.results.xls (16896 bytes, application/octet-stream): Olden Performance Comparison

Quuxplusone commented 17 years ago

> there is 0 difference in bytecode sizes. Perhaps -raise
> doesn't do anything at all on Olden?

Cool.  You could diff them to verify.

This is a good sign -raise isn't doing anything any more. :)  SPEC will give
more certainty though.

Quuxplusone commented 17 years ago

[reid@bashful llvm-test-1]$ for f in
MultiSource/Benchmarks/Olden/*/Output/*.llvm.bc ; do
> cmp $f ../llvm-test-4/$f
> done
[reid@bashful llvm-test-1]$ echo $?
0

Looks pretty identical to me.

Quuxplusone commented 17 years ago

Attached External.results.xls (31744 bytes, application/octet-stream): External Performance Comparison (Spreadsheet)

Quuxplusone commented 17 years ago

Looking at the External test comparison, it doesn't look like -raise does much.

However a few things could be investigated:

255.vortex - Why does raise eliminate 733 bytes of bytecode?
433.milc   - What does raise do to make llc 1.9 seconds faster?
456.hmmer  - What does raise do to make llc 1.54 seconds fasters?
447.dealII - Why is "gccas" (opt) 2.93 seconds faster with -raise?
400.perlbench - What does raise do to make jit 1.04 seconds faster, especially
                with 1133 more bytecode?

Quuxplusone commented 17 years ago

We should generate comparisons on Darwin/PPC as well just to make sure that
-raise doesn't assist any backend opts on that platform.  I don't see why it
would, but we should check.

Quuxplusone commented 17 years ago

For 255.vortex, here's what I found:

1. The bytecode size difference is due to the -raise pass removing or
   altering the names of values. A diff of the llvm assembly corresponding to
   the .llvm.bc files from both cases produced 6000 lines of diffs. But, the
   only way they differed was in the names of values. I ran opt -strip on these
   files and re-ran the diff. It was vastly shorter. I'll attach it.

2. The diff produces in #1 showed that -raise will fold a GEP instruction into
   a store instruction using a GEP constant expression. Without -raise this
   doesn't happen. There is also some changes in the # of uses probably related
   to the GEP folding into the store.

Its not clear why -instcombine wouldn't do the same transform as #2.

Quuxplusone commented 17 years ago

Attached strip.diff (2665 bytes, text/plain): 255.vortex difference on stripped files

Quuxplusone commented 17 years ago

Attached 433.milc.diff (640 bytes, text/plain): 433.milc differences

Quuxplusone commented 17 years ago

Attached 456.hmmer.diff (32462 bytes, text/plain): 456.hmmer differences

Quuxplusone commented 17 years ago

Attached 447.dealII.diff (62181 bytes, text/plain): 447.dealII differences

Quuxplusone commented 17 years ago

For 400.perlbench:

Unfortunately, my machine doesn't have enough memory to do a diff of two 2.7GB
LLVM assembly files.

Quuxplusone commented 17 years ago

the 255.vortex differences aren't worth worrying about, we're good there.

Quuxplusone commented 17 years ago

ok, go ahead and nuke it.

Quuxplusone commented 17 years ago

The -raise pass was remvoed with these patches:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043817.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043818.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043819.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043820.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043821.html

Resolved.

Quuxplusone / LLVMBugzillaTest

Investigate elimination of the -raise pass #1072