JonathonReinhart / staticx

Create static executable from dynamic executable
https://staticx.readthedocs.io/

staticx generated binary segfaults when built on Ubuntu 18.04 Github hosted actions runner #198

Closed jay0lee closed 3 years ago

jay0lee commented 3 years ago

Any binaries I build with PyInstaller and StaticX segfault when built on Ubuntu 18.04 GitHub Actions hosted runners. Here's a PoC project:

https://github.com/jay0lee/actions-hello-world/runs/3807920931

It seems to be an Ubuntu 18.04-specific issue, as the same steps build a valid binary on Ubuntu 20.04.

I suspect this is the same issue as when using actions/setup-python, but I can work to confirm that.

JonathonReinhart commented 3 years ago

Thanks @jay0lee for opening this issue.

I'll also bring forward this useful comment from you on #188:

Also FWIW I did test taking the StaticX binary built on Ubuntu 20.04 and running it on Debian 9 and CentOS 7. Both worked, which is good enough for me. For now at least I'll just generate my legacy Linux static build on 20.04.

Do you think you could update your project to capture dist/helloworld (PyInstaller output) and dist/helloworld-staticx (Staticx output)? It looks like this should be straightforward with actions/upload-artifact. I'd like to dig in to the 18.04 binary to see exactly what it's loading and why it's segfaulting.

jay0lee commented 3 years ago

Thanks Jonathon,

Here are the compiled binaries:

https://drive.google.com/file/d/1vDya8-HeUGfqYcMp9UM5U2GTeQ1Y49I8/view?usp=sharing

and if you need to see the run that generated them it's at:

https://github.com/jay0lee/actions-hello-world/runs/3826354929?check_suite_focus=true

JonathonReinhart commented 3 years ago

FYI: No need to upload to drive; they're available at the bottom of the workflow summary page (https://github.com/jay0lee/actions-hello-world/actions/runs/1315924714).

Wow:

(gdb) r
Starting program: /wherever/helloworld-staticx 
During startup program terminated with signal SIGSEGV, Segmentation fault.

I can't say I've ever seen that before. I'm not sure it will help but can you run staticx with --debug? Not only does it enable debug output for the builder, but it also switches to a debug version of the bootloader which might help in this case.

jay0lee commented 3 years ago

Here's a run with --debug

https://github.com/jay0lee/actions-hello-world/runs/3835505785?check_suite_focus=true

JonathonReinhart commented 3 years ago

gdb provides no useful information whatsoever because the program is crashing immediately. This makes me suspect something is wrong with the ELF file itself.


The python-ptrace project (https://pypi.org/project/python-ptrace/) provides an alternate Python implementation of gdb called gdb.py, which is actually more useful.

$ alias noaslr='setarch $(uname -m) -R'
$ noaslr gdb.py ./1319221677/helloworld-staticx
------------------------------------------------------------
PID: 87691
Signal: SIGSEGV
Invalid memory access to NULL
- mapping: NULL is not mapped in memory
------------------------------------------------------------
(gdb) regs
     r15 = 0x0000000000000000
     r14 = 0x00007ffff735b300
     r13 = 0x00007ffff735b318
     r12 = 0x00007ffff73649c0
     rbp = 0x00007ffff786ebf0
     rbx = 0x0000000000963ef0
     r11 = 0x0000000000000202
     r10 = 0xfffffffffffff28e
      r9 = 0x0000000000000360
      r8 = 0x00000000009076e0
     rax = 0xffffffffffffffff
     rcx = 0x00007ffff7cf26c7
     rdx = 0x00000000009fa130
     rsi = 0x00007ffff786ebf0
     rdi = 0x00007ffff7361cd0
orig_rax = 0x000000000000003b
     rip = 0x00007ffff7cf26c7
      cs = 0x0000000000000033
  eflags = 0x0000000000000202
     rsp = 0x00007fffffffc698
      ss = 0x000000000000002b
 fs_base = 0x00007ffff7c24740
 gs_base = 0x0000000000000000
      ds = 0x0000000000000000
      es = 0x0000000000000000
      fs = 0x0000000000000000
      gs = 0x0000000000000000
(gdb) maps
MAPS: 0x00007ffffffde000-0x00007ffffffff000 => [stack] (rw-p)
MAPS: 0xffffffffff600000-0xffffffffff601000 => [vsyscall] (r-xp)

It looks like the kernel is handing control over to the bootloader with nothing from the executable actually mapped.
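
To make that reading of the maps output concrete, here is a minimal Python sketch that parses lines in the format gdb.py printed above and keeps only the mappings that name a file. The function name `file_backed_mappings` and the regular expression are illustrative assumptions, not part of python-ptrace.

```python
import re

# Matches lines in the form printed by the `maps` command above, e.g.
#   MAPS: 0x00007ffffffde000-0x00007ffffffff000 => [stack] (rw-p)
MAPS_LINE = re.compile(
    r"MAPS: 0x([0-9a-f]+)-0x([0-9a-f]+) => (.+) \(([rwxps-]{4})\)"
)

def file_backed_mappings(maps_output):
    """Return (start, end, name, perms) for mappings that name a file.

    Pseudo-mappings such as [stack] or [vsyscall] are skipped.
    """
    result = []
    for line in maps_output.splitlines():
        match = MAPS_LINE.match(line.strip())
        if match is None:
            continue
        start, end, name, perms = match.groups()
        if not name.startswith("["):
            result.append((int(start, 16), int(end, 16), name, perms))
    return result

# The two lines reported for helloworld-staticx in this comment.
output = """\
MAPS: 0x00007ffffffde000-0x00007ffffffff000 => [stack] (rw-p)
MAPS: 0xffffffffff600000-0xffffffffff601000 => [vsyscall] (r-xp)
"""
print(file_backed_mappings(output))  # prints [] because nothing names a file
```

An empty result is what "nothing from the executable actually mapped" looks like in practice: only the stack and vsyscall pseudo-mappings are present.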


At this point, I suspect that either:

  • Your compiler toolchain built a terribly incorrect version of the bootloader
    • Assuming you installed staticx from source and not a wheel
  • Your version of patchelf horribly mangled the bootloader while patching it
    • This is my suspicion

I released v0.13.2 (https://github.com/JonathonReinhart/staticx/releases/tag/v0.13.2), which adds a bunch more logging at startup to identify the tools being used and their versions. I'd start by recommending that you do another run with staticx v0.13.2 and --debug.

Then, I'd like to see the output of readelf -hlSW helloworld-staticx. I can do that locally by grabbing your artifact. I'd also like to see the same for the bootloader embedded into your staticx package. If you're installing from a wheel, then I don't need that.

jay0lee commented 3 years ago

Sure, here's the latest build (still running as I post this). Notice that pyinstaller and staticx are just pip installed normally, so it should be the wheel (see the "install pyinstaller and staticx" step).

Jay

jay0lee commented 3 years ago

https://github.com/jay0lee/actions-hello-world/runs/3847285571?check_suite_focus=true

sansna commented 3 years ago

Same issue here, from https://github.com/JonathonReinhart/staticx/issues/203. After the fix to https://github.com/JonathonReinhart/staticx/pull/200 made by https://github.com/JonathonReinhart/staticx/pull/204, there are no more segmentation faults.

JonathonReinhart commented 3 years ago

CentOS 7 Test 1

I built a CentOS image with the following Dockerfile:

```Dockerfile
FROM centos:7

# Enable EPEL
RUN yum install -y epel-release && rm -rf /var/cache/yum

# Install main packages
RUN yum install -y \
    patchelf \
    python3 \
    python3-pip \
    python3-wheel \
    which \
    && rm -rf /var/cache/yum

# Upgrade pip
RUN pip3 install --upgrade pip

# Install our dependencies
ADD requirements.txt /tmp/requirements.txt
RUN pip3 install -r /tmp/requirements.txt
```

Then I ran the following commands inside a container from that resulting image:

# pip install staticx==0.13.3

# staticx $(which date) date.sx
# ./date.sx 
Segmentation fault

# sxpath=$(python3 -c 'import staticx; print(staticx.__path__[0])')

# ls -l date.sx $sxpath/assets/release/bootloader 
-rwxr-xr-x 1 root root 127680 Oct 14 05:10 /usr/local/lib/python3.6/site-packages/staticx/assets/release/bootloader
-rwx------ 1 root root 932520 Oct 14 05:11 date.sx

# readelf -hlSW $sxpath/assets/release/bootloader > readelf_hlSW_bootloader
# readelf -hlSW date.sx > readelf_hlSW_date_sx
# diff -u readelf_hlSW_bootloader readelf_hlSW_date_sx 

The resulting diff:

--- readelf_hlSW_bootloader 2021-10-14 05:16:25.484932981 +0000
+++ readelf_hlSW_date_sx    2021-10-14 05:16:43.277202489 +0000
@@ -10,14 +10,14 @@
   Version:                           0x1
   Entry point address:               0x40157e
   Start of program headers:          64 (bytes into file)
-  Start of section headers:          126016 (bytes into file)
+  Start of section headers:          930792 (bytes into file)
   Flags:                             0x0
   Size of this header:               64 (bytes)
   Size of program headers:           56 (bytes)
   Number of program headers:         7
   Size of section headers:           64 (bytes)
-  Number of section headers:         26
-  Section header string table index: 25
+  Number of section headers:         27
+  Section header string table index: 26

 Section Headers:
   [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
@@ -44,9 +44,10 @@
   [20] .debug_str        PROGBITS        0000000000000000 0196d0 0001c2 01  MS  0   0  1
   [21] .debug_loc        PROGBITS        0000000000000000 019892 0000e6 00      0   0  1
   [22] .debug_ranges     PROGBITS        0000000000000000 019980 0000a0 00      0   0 16
-  [23] .symtab           SYMTAB          0000000000000000 019a20 003a08 18     24 252  8
-  [24] .strtab           STRTAB          0000000000000000 01d428 001721 00      0   0  1
-  [25] .shstrtab         STRTAB          0000000000000000 01eb49 0000f2 00      0   0  1
+  [23] .staticx.archive  PROGBITS        0000000000000000 019a20 0c4780 00      0   0  1
+  [24] .symtab           SYMTAB          0000000000000000 0de1a0 003a20 18     25 253  8
+  [25] .strtab           STRTAB          0000000000000000 0e1bc0 001721 00      0   0  1
+  [26] .shstrtab         STRTAB          0000000000000000 0e32e1 000103 00      0   0  1
 Key to Flags:
   W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
   L (link order), O (extra OS processing required), G (group), T (TLS),
@@ -55,7 +56,7 @@

 Program Headers:
   Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
-  LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x0001c8 0x0001c8 R   0x1000
+  LOAD           0x000000 0x0000000000000000 0x0000000000400000 0x0001c8 0x0001c8 R   0x1000
   LOAD           0x001000 0x0000000000401000 0x0000000000401000 0x014d36 0x014d36 R E 0x1000
   LOAD           0x016000 0x0000000000416000 0x0000000000416000 0x002420 0x002420 R   0x1000
   LOAD           0x018f50 0x0000000000419f50 0x0000000000419f50 0x000378 0x001450 RW  0x1000

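The changes are mostly as expected: a new .staticx.archive section is added, so the section count and the string table index each increase by one, and the section headers move further into the file.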

:thinking: But the final change is surprising. The VirtAddr of the first LOAD program header changed from 0x400000 to 0x0.
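
For readers who want to check this field without readelf, the following is a minimal Python sketch that reads the ELF64 program header table with `struct` and reports `PT_LOAD` entries whose virtual and physical addresses differ, which is the pattern visible in the diff above. It assumes a little-endian ELF64 file. The helper names `load_segments` and `suspicious` are hypothetical, and the vaddr/paddr comparison is only a heuristic for this particular corruption, not a general validity test.

```python
import struct

PT_LOAD = 1

def load_segments(data):
    """Yield (offset, vaddr, paddr) for each PT_LOAD entry of an ELF64 image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    # e_phoff is at byte 0x20; e_phentsize and e_phnum are at 0x36 and 0x38.
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        # Elf64_Phdr starts with: p_type, p_flags, p_offset, p_vaddr, p_paddr
        p_type, _flags, p_offset, p_vaddr, p_paddr = struct.unpack_from(
            "<IIQQQ", data, e_phoff + i * e_phentsize
        )
        if p_type == PT_LOAD:
            yield p_offset, p_vaddr, p_paddr

def suspicious(data):
    """Return the PT_LOAD entries whose p_vaddr and p_paddr disagree."""
    return [seg for seg in load_segments(data) if seg[1] != seg[2]]
```

Going by the readelf output above, calling `suspicious()` on the contents of date.sx should return only the first LOAD entry (offset 0, vaddr 0, paddr 0x400000), while the pristine bootloader should return an empty list.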

JonathonReinhart commented 3 years ago

That program header change looks very suspicious and doesn't seem to be correct, given the changes that patchelf needed to make.

I decided to try and "fix" the problem, by changing it back:

# cp date.sx date.sx.hack
# echo -en '\x00\x00\x40\x00' | dd of=date.sx.hack conv=notrunc bs=1 seek=$((0x50))

Verifying:

# diff <(readelf -l $sxpath/assets/release/bootloader) <(readelf -l date.sx.hack) >/dev/null && echo 'identical' || echo 'different'
identical

Testing:

# ./date.sx.hack 
Thu Oct 14 05:58:08 UTC 2021

It worked! :tada:

But now to figure out why it's getting corrupted...
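
A note on where `seek=$((0x50))` comes from: readelf reports that the program headers start 64 bytes into the file, and in an ELF64 program header `p_vaddr` follows the 4-byte `p_type`, the 4-byte `p_flags` and the 8-byte `p_offset`, so it sits 16 bytes in. That gives 64 + 16 = 80 = 0x50. The four bytes written are the low half of the little-endian value 0x400000. The same patch can be sketched in Python as below; `patch_first_vaddr` is a hypothetical helper name, and like the dd command it assumes the first program header is the damaged LOAD entry.

```python
import struct

P_VADDR_OFFSET = 16  # p_type (4) + p_flags (4) + p_offset (8)

def patch_first_vaddr(image, vaddr):
    """Return a copy of an ELF64 image with the first p_vaddr overwritten."""
    patched = bytearray(image)
    (e_phoff,) = struct.unpack_from("<Q", patched, 0x20)  # start of program headers
    struct.pack_into("<Q", patched, e_phoff + P_VADDR_OFFSET, vaddr)
    return bytes(patched)

# With e_phoff = 64, as reported by readelf above, the field sits at 0x50.
assert 64 + P_VADDR_OFFSET == 0x50
# The bytes fed to dd are the low half of the little-endian value 0x400000.
assert struct.pack("<Q", 0x400000)[:4] == b"\x00\x00\x40\x00"
```

Unlike dd, this writes all eight bytes of the field. The result is the same here because the upper four bytes were already zero.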

JonathonReinhart commented 3 years ago

I was mistaken in blaming this on patchelf. I actually use objcopy for elf_add_section.

compare_proghdrs.sh

#!/bin/bash
diff <(readelf -Wl "$1") <(readelf -Wl "$2") \
    && echo "Program headers are identical"

CentOS 7 Test 2

# objcopy --version
GNU objcopy version 2.27-44.base.el7
# cp $sxpath/assets/release/bootloader .
# cp bootloader bootloader.test2
# dd if=/dev/urandom of=dummy_128k bs=128k count=1
# objcopy --add-section '.dummy=dummy_128k' bootloader.test2 
# ./bootloader.test2
Segmentation fault
# ./compare_proghdrs.sh bootloader bootloader.test2 
8c8
<   LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x0001c8 0x0001c8 R   0x1000
---
>   LOAD           0x000000 0x0000000000000000 0x0000000000400000 0x0001c8 0x0001c8 R   0x1000

Conclusion: Using objcopy --add-section with a 128k file on CentOS 7 will mangle the program headers. :bug:

JonathonReinhart commented 3 years ago

Test 3

Same as Test 2, but on Debian bullseye:

$ objcopy --version
GNU objcopy (GNU Binutils for Debian) 2.35.2
$ cp bootloader bootloader.test3
$ objcopy --add-section '.dummy=dummy_128k' bootloader.test3
$ ./bootloader.test3 
bootloader.test3: Failed to find .staticx.archive section      (((this is success)))
$ ../compare_proghdrs.sh bootloader bootloader.test3 
Program headers are identical

JonathonReinhart commented 3 years ago

Test 4

I tried to eliminate musl by building normally (scons) on my bullseye host.

$ rm -rf build dist scons_build staticx/assets/*
$ python3 setup.py bdist_wheel
...
$ cd dist/
$ docker run --rm -it -v $(pwd):/dist -w /dist centos:7-python3
[root@f80262881b53 dist]# pip3 install staticx-*.whl 
...
[root@f80262881b53 dist]# staticx $(which date) date.sx
[root@f80262881b53 dist]# ./date.sx 
Thu Oct 14 07:27:15 UTC 2021

Conclusion: It's old objcopy + musl.
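
Since the failing and working tests differ in objcopy version (2.27 on CentOS 7, 2.35.2 on Debian bullseye), it may help to read that version programmatically when triaging a report. The following is a minimal sketch, not staticx's own logging code; `parse_objcopy_version` and `installed_objcopy_version` are hypothetical names. It deliberately does not hard-code a "safe" threshold, because this thread only establishes those two data points.

```python
import re
import subprocess

def parse_objcopy_version(first_line):
    """Extract the dotted version number from the first line of `objcopy --version`."""
    match = re.search(r"\d+(?:\.\d+)+", first_line)
    if match is None:
        raise ValueError(f"no version number in {first_line!r}")
    return tuple(int(part) for part in match.group().split("."))

def installed_objcopy_version(objcopy="objcopy"):
    """Run objcopy and parse its reported version (requires binutils on PATH)."""
    out = subprocess.run(
        [objcopy, "--version"], capture_output=True, text=True, check=True
    ).stdout
    return parse_objcopy_version(out.splitlines()[0])

# The two version strings quoted in this thread:
print(parse_objcopy_version("GNU objcopy version 2.27-44.base.el7"))          # (2, 27)
print(parse_objcopy_version("GNU objcopy (GNU Binutils for Debian) 2.35.2"))  # (2, 35, 2)
```

Tuples compare element by element, so `(2, 27) < (2, 35, 2)` holds and the two environments can be ordered without string comparison.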

JonathonReinhart commented 3 years ago

@jay0lee I'm going to close this issue in favor of #205, where the issue is now better-defined. Thanks for your help so far.