mdboom opened 1 year ago
Locally (Debian bullseye, with the llvm packages from here), I seem to be segfaulting llvm-bolt:
#0 0x000055c63a2ba081 (/usr/bin/llvm-bolt+0x1adc081)
#1 0x000055c63a2b7f1c (/usr/bin/llvm-bolt+0x1ad9f1c)
#2 0x000055c63a2ba596 (/usr/bin/llvm-bolt+0x1adc596)
#3 0x00007ff0ed3e7f90 (/lib/x86_64-linux-gnu/libc.so.6+0x3bf90)
#4 0x000055c63b001740 (/usr/bin/llvm-bolt+0x2823740)
#5 0x000055c63a365be0 (/usr/bin/llvm-bolt+0x1b87be0)
#6 0x000055c63a364413 (/usr/bin/llvm-bolt+0x1b86413)
#7 0x000055c63a35fcfd (/usr/bin/llvm-bolt+0x1b81cfd)
#8 0x000055c63a35f139 (/usr/bin/llvm-bolt+0x1b81139)
#9 0x000055c63a30634b (/usr/bin/llvm-bolt+0x1b2834b)
#10 0x000055c63a2fe33b (/usr/bin/llvm-bolt+0x1b2033b)
#11 0x000055c639021ba2 (/usr/bin/llvm-bolt+0x843ba2)
#12 0x00007ff0ed3d318a __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#13 0x00007ff0ed3d3245 call_init ./csu/../csu/libc-start.c:128:20
#14 0x00007ff0ed3d3245 __libc_start_main ./csu/../csu/libc-start.c:368:5
#15 0x000055c63901fcd1 (/usr/bin/llvm-bolt+0x841cd1)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: /usr/bin/llvm-bolt ./python -o python.bolt -data=python.fdata -update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=none -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot
Segmentation fault
Will try this on our benchmarking machines (with different versions of Ubuntu), to see if I have more luck.
Locally (Debian bullseye, with the llvm packages from here), I seem to be segfaulting llvm-bolt:
Which LLVM version are you using?
15.0.7. Should I try a more recent one?
Yes, please use at least 16.0.0. Here is my experimentation: https://docs.google.com/presentation/d/1YTZfgaS9yqUDoIg1wryJuEdtB0ZHaDBJ_j2CK7GK5aM (page 11 - Analysis Environment)
Thanks for the pointer. LLVM 16.0.3 produces something that works.
Nice, and please take a look at https://github.com/faster-cpython/ideas/issues/551#issuecomment-1536410741 for your experimentation.
IIUC, to stabilize the benchmark results we should train(?) the binary by running the pyperformance benchmarks (this is what the Pyston team did originally), and I expect that it will reduce the noise by reducing the L1 cache miss ratio.
Here's the results of an A/A test of a recent CPython commit (45a9e3)
| build | min | 10%-ile | mean | 90%-ile | max |
|---|---|---|---|---|---|
| no BOLT | 0.87 | 0.97 | 1.00 | 1.03 | 1.18 |
| BOLT | 0.91 | 0.97 | 1.00 | 1.02 | 1.11 |
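For reference, the summary statistics in the table above can be computed along these lines. This is a minimal sketch; the `summarize` helper and the sample values below are hypothetical, not the actual benchmark data:

```python
import statistics

def summarize(samples):
    """Summarize normalized benchmark timings as in the table above:
    min, 10th percentile, mean, 90th percentile, max."""
    # quantiles(..., n=10) returns the 9 cut points between deciles;
    # the first is the 10th percentile, the last is the 90th.
    q = statistics.quantiles(samples, n=10, method="inclusive")
    return {
        "min": min(samples),
        "p10": q[0],
        "mean": statistics.mean(samples),
        "p90": q[-1],
        "max": max(samples),
    }

# Hypothetical normalized timings (1.00 == the mean run time).
samples = [0.91, 0.95, 0.97, 0.99, 1.00, 1.01, 1.02, 1.03, 1.05, 1.11]
print(summarize(samples))
```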
So, there is in fact a little less variability in the "long tail" with BOLT than without it, which is a little surprising and counter-intuitive. However, the 10th and 90th percentiles are almost identical, so it's not an obvious, easy win.
We'd probably see less variability by re-using the BOLT profiling data between runs, but it's not clear how transferable that data would be in the general case between builds with significant changes to the source code. (For the same reason, we don't do that for PGO either.)
This is totally unrelated to the original purpose of this PR, but anticipating the question: these are the results on our benchmarking hardware of BOLT vs. non-BOLT, showing an approximately 2% speedup.
@itamaro suggested that using BOLT may reduce benchmarking variability. We should run the A/A tests in this mode to see if it helps.