Major regression in startup time

GoogleCodeExporter commented 9 years ago

2009Q1 vs trunk@r669 (-j never):
perf.py options: --args ",-j never"

normal_startup:
Min: 0.377 -> 0.438: 13.94% slower
Avg: 0.396 -> 0.459: 13.65% slower
Significant (t=-22.757922, a=0.95)
Stddev: 0.007 -> 0.027: 72.96% larger

startup_nosite:
Min: 0.106 -> 0.203: 47.75% slower
Avg: 0.107 -> 0.205: 47.51% slower
Significant (t=-493.209565, a=0.95)
Stddev: 0.002 -> 0.002: 2.58% larger

This is probably due to having to load the LLVM libraries. Of course, start 
up time with "-j always" can be described as "crippling"; it would be nice to 
fix that, too, possibly by disabling LLVM-compilation under after startup 
completes.

Original issue reported on code.google.com by collinw on 24 Jun 2009 at 11:27

GoogleCodeExporter commented 9 years ago

Original comment by collinw on 25 Jun 2009 at 5:14

Changed state: Started

GoogleCodeExporter commented 9 years ago

r675 improves the situation for normal_startup, but doesn't address 
startup_nosite:

trunk vs patch (default options; perf.py -r)
normal_startup:
Min: 0.435643 -> 0.319087: 36.53% faster
Avg: 0.459272 -> 0.331541: 38.53% faster
Significant (t=51.009351, a=0.95)

2009Q1 vs patch (default options; perf.py -r)
normal_startup:
Min: 0.381866 -> 0.317074: 20.43% faster
Avg: 0.402331 -> 0.331774: 21.27% faster
Significant (t=20.677263, a=0.95)

trunk vs patch (-O2 -j always; n=5)
normal_startup:
Min: 15.735776 -> 11.112612: 41.60% faster
Avg: 15.795877 -> 11.144470: 41.74% faster
Significant (t=151.105242, a=0.95)

Original comment by collinw on 25 Jun 2009 at 6:49

GoogleCodeExporter commented 9 years ago

r676 improves the situation further for normal_startup:

trunk vs patch
normal_startup:
Min: 0.320101 -> 0.284638: 12.46% faster
Avg: 0.333535 -> 0.290565: 14.79% faster
Significant (t=13.574358, a=0.95)
Stddev: 0.03129 -> 0.00478: 553.99% smaller

2009Q1 vs patch
normal_startup:
Min: 0.380046 -> 0.284509: 33.58% faster
Avg: 0.402265 -> 0.289813: 38.80% faster
Significant (t=38.399910, a=0.95)
Stddev: 0.02344 -> 0.01756: 33.47% smaller

trunk vs patch (-j always -O2; n=5)
normal_startup:
Min: 11.127711 -> 5.712456: 94.80% faster
Avg: 11.146033 -> 5.722734: 94.77% faster
Significant (t=923.701375, a=0.95)
Stddev: 0.01110 -> 0.00701: 58.42% smaller

I'm moving the startup_nosite problem to Q3, now that normal_startup is faster.

Original comment by collinw on 27 Jun 2009 at 2:09

Changed state: Accepted
Added labels: Release-2009Q3
Removed labels: Release-2009Q2

GoogleCodeExporter commented 9 years ago

Original comment by collinw on 29 Sep 2009 at 5:57

Added labels: Release-2009Q4

GoogleCodeExporter commented 9 years ago

Original comment by collinw on 6 Jan 2010 at 11:44

Added labels: Release-Merger
Removed labels: Release-2009Q4

GoogleCodeExporter commented 9 years ago

Current status of startup time. These results are from installed versions of 
CPython 2.6.4 vs Unladen trunk 
(r988):

### normal_startup ###
Min: 0.219186 -> 0.352075: 1.6063x slower
Avg: 0.227228 -> 0.364384: 1.6036x slower
Significant (t=-51.879098, a=0.95)
Stddev: 0.00762 -> 0.02532: 3.3227x larger
Timeline: http://tinyurl.com/yfe8z3r

### startup_nosite ###
Min: 0.105949 -> 0.264912: 2.5004x slower
Avg: 0.107574 -> 0.267505: 2.4867x slower
Significant (t=-703.557403, a=0.95)
Stddev: 0.00214 -> 0.00240: 1.1209x larger
Timeline: http://tinyurl.com/yajn8fa

Note that if you run from the build directory, the results are different. This 
is due to the nature of the 
optimizations implemented in r676.

### normal_startup ###
Min: 0.321331 -> 0.357681: 1.1131x slower
Avg: 0.331351 -> 0.365976: 1.1045x slower
Significant (t=-14.231352, a=0.95)
Stddev: 0.00706 -> 0.02328: 3.2995x larger
Timeline: http://tinyurl.com/yz6hlj4

### startup_nosite ###
Min: 0.107814 -> 0.266355: 2.4705x slower
Avg: 0.109281 -> 0.267502: 2.4478x slower
Significant (t=-922.873066, a=0.95)
Stddev: 0.00168 -> 0.00175: 1.0446x larger
Timeline: http://tinyurl.com/yk9qj65

We need to optimize startup time again before final merger.

Original comment by collinw on 14 Jan 2010 at 10:24

GoogleCodeExporter commented 9 years ago

Original comment by collinw on 14 Jan 2010 at 11:59

Added labels: Priority-High
Removed labels: Priority-Medium

GoogleCodeExporter commented 9 years ago

r1019 added startup benchmarks based on Mercurial and Bazaar.

Original comment by collinw on 21 Jan 2010 at 12:24

GoogleCodeExporter commented 9 years ago

Further ideas that have come up recently (not mine):
- Disable JIT compilation during startup, restore the Py_JitControl setting 
before 
entering user code.
- Disable collection of runtime feedback data for cold functions using a 
multi-phase 
hotness model.

Original comment by collinw on 21 Jan 2010 at 10:03

GoogleCodeExporter commented 9 years ago

Comparing installed versions of CPython 2.6.4 vs Unladen Swallow r1024 using 
the new {hg,bzr}_startup benchmarks:

./perf.py -r -b hg_startup,bzr_startup /tmp/cpy26/bin/python 
/tmp/unladen/bin/python

### bzr_startup ###
Min: 0.067990 -> 0.097985: 1.4412x slower
Avg: 0.084322 -> 0.111348: 1.3205x slower
Significant (t=-37.432534, a=0.95)
Stddev: 0.00793 -> 0.00643: 1.2330x smaller
Timeline: http://tinyurl.com/ybdm537

### hg_startup ###
Min: 0.016997 -> 0.024997: 1.4707x slower
Avg: 0.026990 -> 0.036772: 1.3625x slower
Significant (t=-53.104502, a=0.95)
Stddev: 0.00406 -> 0.00417: 1.0273x larger
Timeline: http://tinyurl.com/ycout8m

Original comment by collinw on 21 Jan 2010 at 10:22

GoogleCodeExporter commented 9 years ago

Benchmarking r1027 with `-j never` against `-j whenhot`. This gauges the 
potential impact of the "disable JIT compilation 
during startup" idea:

./perf.py -r -b normal_startup --args ",-j never" /tmp/unladen/bin/python 
/tmp/unladen/bin/python

### normal_startup ###
Min: 0.355344 -> 0.346220: 1.0264x faster
Avg: 0.362940 -> 0.353413: 1.0270x faster
Significant (t=4.751362, a=0.95)
Stddev: 0.01925 -> 0.00560: 3.4367x smaller
Timeline: http://tinyurl.com/yc24xqh

Original comment by collinw on 22 Jan 2010 at 2:51

GoogleCodeExporter commented 9 years ago

r1045 improves startup time slightly by disabling JIT compilation and feedback 
gathering 
during startup. Any two-phase hotness model needs to include this same effect: 
during 
startup, no feedback is collected. This patch is intended as a naive baseline 
to be 
improved upon.

$ ./perf.py -r -b startup ../a/python ../b/python

### bzr_startup ###
Min: 0.091986 -> 0.089986: 1.0222x faster
Avg: 0.106514 -> 0.104694: 1.0174x faster
Significant (t=2.116822, a=0.95)
Stddev: 0.00829 -> 0.00889: 1.0730x larger
Timeline: http://tinyurl.com/yao44mw

### hg_startup ###
Min: 0.025996 -> 0.022997: 1.1304x faster
Avg: 0.037777 -> 0.036090: 1.0468x faster
Significant (t=9.174211, a=0.95)
Stddev: 0.00420 -> 0.00403: 1.0420x smaller
Timeline: http://tinyurl.com/yc7f6cc

### normal_startup ###
Min: 0.344597 -> 0.330059: 1.0440x faster
Avg: 0.358112 -> 0.343796: 1.0416x faster
Significant (t=3.619648, a=0.95)
Stddev: 0.01967 -> 0.03432: 1.7449x larger
Timeline: http://tinyurl.com/ykkkl2d

### startup_nosite ###
Min: 0.250642 -> 0.246434: 1.0171x faster
Avg: 0.252985 -> 0.248573: 1.0178x faster
Significant (t=20.164605, a=0.95)
Stddev: 0.00233 -> 0.00204: 1.1397x smaller
Timeline: http://tinyurl.com/ybyj8px

Original comment by collinw on 27 Jan 2010 at 11:14

Daetalus / unladen-swallow

Major regression in startup time #64