Closed jay0lee closed 3 years ago
Thanks @jay0lee for opening this issue.
I'll also bring forward this useful comment from you on #188:
Also FWIW I did test taking the StaticX binary build on Ubuntu 20.04 and running it on Debian 9 and CentOS 7. Both worked which is good enough for me. For now at least I'll just generate my legacy Linux static build on 20.04.
Do you think you could update your project to capture `dist/helloworld` (PyInstaller output) and `dist/helloworld-staticx` (Staticx output)? It looks like this should be straightforward with `actions/upload-artifact`. I'd like to dig in to the 18.04 binary to see exactly what it's loading and why it's segfaulting.
Thanks Jonathon,
Here are the compiled binaries:
https://drive.google.com/file/d/1vDya8-HeUGfqYcMp9UM5U2GTeQ1Y49I8/view?usp=sharing
and if you need to see the run that generated them it's at:
https://github.com/jay0lee/actions-hello-world/runs/3826354929?check_suite_focus=true
FYI: No need to upload to drive; they're available at the bottom of the workflow summary page (https://github.com/jay0lee/actions-hello-world/actions/runs/1315924714).
Wow:
(gdb) r
Starting program: /wherever/helloworld-staticx
During startup program terminated with signal SIGSEGV, Segmentation fault.
I can't say I've ever seen that before. I'm not sure it will help, but can you run `staticx` with `--debug`? Not only does it enable debug output for the builder, but it also switches to a debug version of the bootloader, which might help in this case.
Here's a run with --debug
https://github.com/jay0lee/actions-hello-world/runs/3835505785?check_suite_focus=true
`gdb` provides no useful information whatsoever because the program is crashing immediately. This makes me suspect something is wrong with the ELF file itself.
The python-ptrace project (https://pypi.org/project/python-ptrace/) provides an alternate Python implementation of `gdb` called `gdb.py`, which is actually more useful.
$ alias noaslr='setarch $(uname -m) -R'
$ noaslr gdb.py ./1319221677/helloworld-staticx
------------------------------------------------------------
PID: 87691
Signal: SIGSEGV
Invalid memory access to NULL
- mapping: NULL is not mapped in memory
------------------------------------------------------------
(gdb) regs
r15 = 0x0000000000000000
r14 = 0x00007ffff735b300
r13 = 0x00007ffff735b318
r12 = 0x00007ffff73649c0
rbp = 0x00007ffff786ebf0
rbx = 0x0000000000963ef0
r11 = 0x0000000000000202
r10 = 0xfffffffffffff28e
r9 = 0x0000000000000360
r8 = 0x00000000009076e0
rax = 0xffffffffffffffff
rcx = 0x00007ffff7cf26c7
rdx = 0x00000000009fa130
rsi = 0x00007ffff786ebf0
rdi = 0x00007ffff7361cd0
orig_rax = 0x000000000000003b
rip = 0x00007ffff7cf26c7
cs = 0x0000000000000033
eflags = 0x0000000000000202
rsp = 0x00007fffffffc698
ss = 0x000000000000002b
fs_base = 0x00007ffff7c24740
gs_base = 0x0000000000000000
ds = 0x0000000000000000
es = 0x0000000000000000
fs = 0x0000000000000000
gs = 0x0000000000000000
(gdb) maps
MAPS: 0x00007ffffffde000-0x00007ffffffff000 => [stack] (rw-p)
MAPS: 0xffffffffff600000-0xffffffffff601000 => [vsyscall] (r-xp)
It looks like the kernel is handing control over to the bootloader with nothing from the executable actually mapped.
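At this point, I suspect that either:
- Your compiler toolchain built a terribly incorrect version of the bootloader
  - Assuming you installed staticx from source and not a wheel
- Your version of `patchelf` horribly mangled the bootloader while patching it
  - This is my suspicion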
I released v0.13.2 (https://github.com/JonathonReinhart/staticx/releases/tag/v0.13.2), which adds a bunch more logging at startup to identify the tools being used and their versions. I'd start by recommending that you do another run with staticx v0.13.2 and `--debug`.
Then, I'd like to see the output of `readelf -hlSW helloworld-staticx`. I can do that locally by grabbing your artifact. I'd also like to see the same for the bootloader embedded into your staticx package. If you're installing from a wheel, then I don't need that.
Sure, here's the latest build (still running as I post this). Notice pyinstaller and staticx are just pip installed normally so should be the wheel (see the "install pyinstaller and staticx" step)
Jay
Coming from https://github.com/JonathonReinhart/staticx/issues/203, which is the same issue: after https://github.com/JonathonReinhart/staticx/pull/200 was fixed by https://github.com/JonathonReinhart/staticx/pull/204, there are no more segmentation faults.
I built a CentOS image with the following Dockerfile:
Then I ran the following commands inside a container from that resulting image:
# pip install staticx==0.13.3
# staticx $(which date) date.sx
# ./date.sx
Segmentation fault
# sxpath=$(python3 -c 'import staticx; print(staticx.__path__[0])')
# ls -l date.sx $sxpath/assets/release/bootloader
-rwxr-xr-x 1 root root 127680 Oct 14 05:10 /usr/local/lib/python3.6/site-packages/staticx/assets/release/bootloader
-rwx------ 1 root root 932520 Oct 14 05:11 date.sx
# readelf -hlSW $sxpath/assets/release/bootloader > readelf_hlSW_bootloader
# readelf -hlSW date.sx > readelf_hlSW_date_sx
# diff -u readelf_hlSW_bootloader readelf_hlSW_date_sx
The resulting diff:
--- readelf_hlSW_bootloader 2021-10-14 05:16:25.484932981 +0000
+++ readelf_hlSW_date_sx 2021-10-14 05:16:43.277202489 +0000
@@ -10,14 +10,14 @@
Version: 0x1
Entry point address: 0x40157e
Start of program headers: 64 (bytes into file)
- Start of section headers: 126016 (bytes into file)
+ Start of section headers: 930792 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 7
Size of section headers: 64 (bytes)
- Number of section headers: 26
- Section header string table index: 25
+ Number of section headers: 27
+ Section header string table index: 26
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
@@ -44,9 +44,10 @@
[20] .debug_str PROGBITS 0000000000000000 0196d0 0001c2 01 MS 0 0 1
[21] .debug_loc PROGBITS 0000000000000000 019892 0000e6 00 0 0 1
[22] .debug_ranges PROGBITS 0000000000000000 019980 0000a0 00 0 0 16
- [23] .symtab SYMTAB 0000000000000000 019a20 003a08 18 24 252 8
- [24] .strtab STRTAB 0000000000000000 01d428 001721 00 0 0 1
- [25] .shstrtab STRTAB 0000000000000000 01eb49 0000f2 00 0 0 1
+ [23] .staticx.archive PROGBITS 0000000000000000 019a20 0c4780 00 0 0 1
+ [24] .symtab SYMTAB 0000000000000000 0de1a0 003a20 18 25 253 8
+ [25] .strtab STRTAB 0000000000000000 0e1bc0 001721 00 0 0 1
+ [26] .shstrtab STRTAB 0000000000000000 0e32e1 000103 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
@@ -55,7 +56,7 @@
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
- LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x0001c8 0x0001c8 R 0x1000
+ LOAD 0x000000 0x0000000000000000 0x0000000000400000 0x0001c8 0x0001c8 R 0x1000
LOAD 0x001000 0x0000000000401000 0x0000000000401000 0x014d36 0x014d36 R E 0x1000
LOAD 0x016000 0x0000000000416000 0x0000000000416000 0x002420 0x002420 R 0x1000
LOAD 0x018f50 0x0000000000419f50 0x0000000000419f50 0x000378 0x001450 RW 0x1000
The changes are mostly as expected:
- The `.staticx.archive` section was inserted

:thinking: But the final change is surprising. The `VirtAddr` of the first `LOAD` program header changed from `0x400000` to `0x0`.
That program header change looks very suspicious and doesn't seem to be correct, given the changes that `patchelf` needed to make.
I decided to try and "fix" the problem, by changing it back:
# cp date.sx date.sx.hack
# echo -en '\x00\x00\x40\x00' | dd of=date.sx.hack conv=notrunc bs=1 seek=$((0x50))
Verifying:
# diff <(readelf -l $sxpath/assets/release/bootloader) <(readelf -l date.sx.hack) >/dev/null && echo 'identical' || echo 'different'
identical
Testing:
# ./date.sx.hack
Thu Oct 14 05:58:08 UTC 2021
It worked! :tada:
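For anyone wondering where the `0x50` in the `dd` command comes from, here is a minimal sketch of the arithmetic, assuming the standard little-endian ELF64 layout. The helper names `vaddr_file_offset` and `patch_vaddr` are hypothetical and are not part of staticx; the zeroed buffer merely stands in for the real file.

```python
import struct

ELF64_PHDR_SIZE = 56   # "Size of program headers" in the readelf output
P_VADDR_OFFSET = 16    # p_type (4) + p_flags (4) + p_offset (8) precede p_vaddr


def vaddr_file_offset(e_phoff: int, index: int = 0) -> int:
    """File offset of p_vaddr in the index-th ELF64 program header."""
    return e_phoff + index * ELF64_PHDR_SIZE + P_VADDR_OFFSET


def patch_vaddr(image: bytearray, e_phoff: int, index: int, vaddr: int) -> None:
    """Overwrite p_vaddr in place as a little-endian 64-bit value."""
    struct.pack_into("<Q", image, vaddr_file_offset(e_phoff, index), vaddr)


# Program headers start at byte 64 in this file ("Start of program headers").
offset = vaddr_file_offset(e_phoff=64)
print(hex(offset))

# Hypothetical stand-in for date.sx.hack: a zeroed buffer, not a real ELF image.
image = bytearray(64 + ELF64_PHDR_SIZE)
patch_vaddr(image, e_phoff=64, index=0, vaddr=0x400000)
print(image[offset:offset + 4].hex())  # the same four bytes the dd command writes
```

The `dd` command writes only the low four bytes of the field, which is enough here because the upper four bytes of the mangled `VirtAddr` were already zero.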
But now to figure out why it's getting corrupted...
I was mistaken in blaming this on `patchelf`. I actually use `objcopy` for `elf_add_section`.
compare_proghdrs.sh
#!/bin/bash
diff <(readelf -Wl "$1") <(readelf -Wl "$2") \
&& echo "Program headers are identical"
# objcopy --version
GNU objcopy version 2.27-44.base.el7
# cp $sxpath/assets/release/bootloader .
# cp bootloader bootloader.test2
# dd if=/dev/urandom of=dummy_128k bs=128k count=1
# objcopy --add-section '.dummy=dummy_128k' bootloader.test2
# ./bootloader.test2
Segmentation fault
# ./compare_proghdrs.sh bootloader bootloader.test2
8c8
< LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x0001c8 0x0001c8 R 0x1000
---
> LOAD 0x000000 0x0000000000000000 0x0000000000400000 0x0001c8 0x0001c8 R 0x1000
Conclusion: Using `objcopy --add-section` with a 128k file on CentOS 7 will mangle the program headers. :bug:
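The same comparison can be sketched without `readelf`. The snippet below assumes a little-endian ELF64 file and flags `LOAD` segments whose `VirtAddr` and `PhysAddr` differ. That test is only a heuristic chosen to match the mangling seen in this thread, not a rule from the ELF specification. `read_phdrs`, `suspicious_loads`, and `make_image` are hypothetical helpers, and the two images are synthetic stand-ins built from the values in the diff above.

```python
import struct
from typing import List, NamedTuple

PT_LOAD = 1


class Phdr(NamedTuple):
    p_type: int
    p_flags: int
    p_offset: int
    p_vaddr: int
    p_paddr: int
    p_filesz: int
    p_memsz: int
    p_align: int


def read_phdrs(image: bytes) -> List[Phdr]:
    """Parse the program headers of a little-endian ELF64 image."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    (e_phoff,) = struct.unpack_from("<Q", image, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 0x36)
    return [
        Phdr(*struct.unpack_from("<IIQQQQQQ", image, e_phoff + i * e_phentsize))
        for i in range(e_phnum)
    ]


def suspicious_loads(phdrs: List[Phdr]) -> List[Phdr]:
    """Heuristic (an assumption): LOAD segments with p_vaddr != p_paddr."""
    return [p for p in phdrs if p.p_type == PT_LOAD and p.p_vaddr != p.p_paddr]


def make_image(phdrs: List[Phdr]) -> bytes:
    """Build a minimal synthetic ELF64 image holding only the given headers."""
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        b"\x7fELF\x02\x01\x01" + bytes(9),  # e_ident: ELF64, little-endian
        2, 0x3E, 1,                         # e_type, e_machine (x86-64), e_version
        0x40157E, 64, 0, 0,                 # e_entry, e_phoff, e_shoff, e_flags
        64, 56, len(phdrs), 64, 0, 0,       # header sizes and counts
    )
    return ehdr + b"".join(struct.pack("<IIQQQQQQ", *p) for p in phdrs)


# First LOAD header before and after objcopy, taken from the diff above.
good = Phdr(PT_LOAD, 4, 0, 0x400000, 0x400000, 0x1C8, 0x1C8, 0x1000)
bad = good._replace(p_vaddr=0)

for name, phdr in (("bootloader", good), ("bootloader.test2", bad)):
    flagged = suspicious_loads(read_phdrs(make_image([phdr])))
    print(name, "suspicious" if flagged else "ok")
```

To check a real file, pass its contents to `read_phdrs`, for example `read_phdrs(open(path, "rb").read())`.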
Same as Test 2, but on Debian bullseye:
$ objcopy --version
GNU objcopy (GNU Binutils for Debian) 2.35.2
$ cp bootloader bootloader.test3
$ objcopy --add-section '.dummy=dummy_128k' bootloader.test3
$ ./bootloader.test3
bootloader.test3: Failed to find .staticx.archive section (((this is success)))
$ ../compare_proghdrs.sh bootloader bootloader.test3
Program headers are identical
I tried to eliminate musl by building normally (`scons`) on my bullseye host.
$ rm -rf build dist scons_build staticx/assets/*
$ python3 setup.py bdist_wheel
...
$ cd dist/
$ docker run --rm -it -v $(pwd):/dist -w /dist centos:7-python3
[root@f80262881b53 dist]# pip3 install staticx-*.whl
...
[root@f80262881b53 dist]# staticx $(which date) date.sx
[root@f80262881b53 dist]# ./date.sx
Thu Oct 14 07:27:15 UTC 2021
Conclusion: It's old objcopy + musl.
@jay0lee I'm going to close this issue in favor of #205, where the issue is now better-defined. Thanks for your help so far.
Any binaries I build with PyInstaller and StaticX are segfaulting when built on Ubuntu 18.04 and GitHub Actions hosted runners. Here's a PoC project:
https://github.com/jay0lee/actions-hello-world/runs/3807920931
It seems to be some specific Ubuntu 18.04 issue, as the same steps build a valid binary on Ubuntu 20.04.
I suspect this is the same issue as when using `actions/setup-python` but can work to confirm that.