Issue opened by kmod 3 years ago (status: open).
Thanks for reporting the issue.
The only thing known not to work with .so's is -split-eh, but that should be turned off automatically, and you will see a warning. -inline-all can mess up debug info, but likely only to a limited extent. I would start by disabling all optimizations except code ordering and checking whether the binary works. I would check it myself, but I need to set up a virtual machine first.
Oh, good idea. I removed all the command-line flags:
$ llvm-bolt libpython3.8-pyston2.2d.so.1.0.prebolt -o libpython3.8-pyston2.2d.so.1.0
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 0c14e20238604a4c05e174e71676857d45c60a0f
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x600000, offset 0x600000
BOLT-WARNING: debug info will be stripped from the binary. Use -update-debug-sections to keep it.
BOLT-INFO: enabling relocation mode
BOLT-INFO: enabling -align-macro-fusion=all since no profile was specified
BOLT-INFO: enabling lite mode
BOLT-INFO: forcing -jump-tables=move as PIC jump table was detected in function _PyEval_EvalFrameDefault
BOLT-INFO: 0 out of 7274 functions in the binary (0.0%) have non-empty execution profile
BOLT-INFO: the input contains 831 (dynamic count : 0) opportunities for macro-fusion optimization that are going to be fixed
BOLT-INFO: UCE removed 0 blocks and 0 bytes of code.
BOLT-INFO: SCTC: patched 2 tail calls (2 forward) tail calls (0 backward) from a total of 2 while removing 0 double jumps and removing 2 basic blocks totalling 10 bytes of code. CTCs total execution count is 0 and the number of times CTCs are taken is 0.
BOLT-INFO: patched build-id (flipped last bit)
And the result still crashes:
$ ./python3 -c '1'
python3: ../../../Objects/dictobject.c:883: lookdict_unicode_nodummy: Assertion `ix != DKIX_DUMMY' failed.
It also crashes if I still pass the profile file.
Thanks for trying that. I will take a look.
When I pass the --update-debug-sections flag and no other flags, the source locations are now correct, but there are still a couple of bad frames in the gdb backtrace. I believe one of the two functions in question is _PyEval_EvalFrameDefault, which BOLT flagged during the run as having a PIC jump table, in case that's helpful.
There is an issue with what looks like a computed goto in _PyEval_EvalFrameDefault. I suspect the effect is limited to just this function (the interpreter loop?), so you can try disabling its optimization with -skip-funcs=_PyEval_EvalFrameDefault while I think of a proper solution.
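A minimal sketch of that workaround as a shell wrapper, assuming the input/output file names from the earlier run. The function name bolt_skip_eval and the DRY_RUN switch are hypothetical helpers for illustration, not part of BOLT; -update-debug-sections is included only because it kept source locations correct in the run above.

```shell
# Hypothetical wrapper (illustrative name, not a BOLT tool): re-run llvm-bolt
# on the PIC shared object but leave _PyEval_EvalFrameDefault untouched.
# -update-debug-sections keeps debug info so gdb backtraces can be compared.
# Set DRY_RUN=1 to print the command instead of executing it.
bolt_skip_eval() {
  set -- llvm-bolt "$1" -o "$2" \
    -skip-funcs=_PyEval_EvalFrameDefault \
    -update-debug-sections
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # only show the command line
  else
    "$@"                 # actually run llvm-bolt
  fi
}
```

For example, DRY_RUN=1 bolt_skip_eval libpython3.8-pyston2.2d.so.1.0.prebolt libpython3.8-pyston2.2d.so.1.0 prints the command, and dropping DRY_RUN=1 runs it.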
That didn't quite do it, but after skipping every function named in a "BOLT-INFO: forcing -jump-tables=move as PIC jump table was detected in function XXX" message, I got things working.
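Collecting those function names by hand is error-prone, so here is a hedged sketch that pulls them out of a saved llvm-bolt log and prints a single -skip-funcs flag. It assumes the message text matches the BOLT-INFO line shown in the run above, with the function name at the end of the line, and that -skip-funcs accepts a comma-separated list. The helper name pic_jt_skip_flag is hypothetical.

```shell
# Hypothetical helper: print "-skip-funcs=f1,f2,..." for every function that
# BOLT reported as having a PIC jump table in the given log file.
# Prints nothing (and returns non-zero) if no such message is found.
pic_jt_skip_flag() {
  funcs=$(sed -n 's/.*PIC jump table was detected in function \(.*\)$/\1/p' "$1" \
    | LC_ALL=C sort -u \
    | paste -sd, -)          # join unique names with commas
  [ -n "$funcs" ] && printf '%s\n' "-skip-funcs=$funcs"
}
```

One way to use it: capture the output with llvm-bolt ... 2>&1 | tee bolt.log (redirecting stderr so warnings are kept too), then pass the result of pic_jt_skip_flag bolt.log to the next llvm-bolt run.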
Just in case it's relevant: we compile _PyEval_EvalFrameDefault with -Os.
That's good to know, although it's quite unexpected. You can also disable processing of functions with jump tables using the -jump-tables=none option.
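As a sketch of that alternative, the wrapper below differs from a -skip-funcs run only in the flag: rather than naming functions one at a time, it asks BOLT not to optimize any function containing a jump table. The function name and the DRY_RUN switch are hypothetical; whether this alone avoids the crash here is untested.

```shell
# Hypothetical wrapper (illustrative name): run llvm-bolt with
# -jump-tables=none so functions with jump tables are left unoptimized.
# Set DRY_RUN=1 to print the command instead of executing it.
bolt_no_jump_tables() {
  set -- llvm-bolt "$1" -o "$2" -jump-tables=none
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # only show the command line
  else
    "$@"                 # actually run llvm-bolt
  fi
}
```

The trade-off is coverage: this skips every function with a jump table, not just those BOLT flagged as PIC, so less code gets reordered.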
We've been using BOLT successfully on our binary, but when we compile our program with -fPIC, link it as a shared object, and apply BOLT to it, the result doesn't work correctly. I'm not exactly sure what's going on, but the two things I've noticed are:
I assume these are related and imply that we didn't get good output from BOLT, but I can't be sure.
Is there anything different we should be doing when optimizing a shared object or PIC code?
Here's how we produced the files:
Here are the files. Let me know if there's any other information I could provide that would be helpful.