NixOS / patchelf

A small utility to modify the dynamic linker and RPATH of ELF executables
GNU General Public License v3.0

0.17.0: Segmentation fault after modifying RPATH #446

Open bastimeyer opened 1 year ago

bastimeyer commented 1 year ago

Describe the bug

Just encountered an issue with patchelf 0.17.0...

I build AppImages for my Python application, and in order to do that I'm using Python's official manylinux docker images where I copy one of the pre-built python environments and modify the RPATHs of all binaries and their dependencies using patchelf. The RPATHs get set to $ORIGIN (and other relative paths), so that when the AppImage's squashfs gets mounted on the user's system upon execution, Python can properly be run on unknown/arbitrary mount points. So far, this has all been working flawlessly.
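A minimal sketch of how such an $ORIGIN-relative RPATH could be derived, assuming GNU coreutils `realpath` is available. The helper name `origin_rpath` and the example paths are hypothetical and are not taken from the build scripts described above:

```shell
# Hypothetical helper: print an $ORIGIN-relative RPATH that points from the
# directory containing BINARY to LIBDIR.
# Usage: origin_rpath BINARY LIBDIR
origin_rpath() {
  local bin_dir rel
  bin_dir=$(dirname -- "$1")
  # -m: the paths need not exist; --relative-to is a GNU coreutils option
  rel=$(realpath -m --relative-to="$bin_dir" -- "$2") || return 1
  if [ "$rel" = "." ]; then
    # The library directory is the binary's own directory
    printf '%s\n' '$ORIGIN'
  else
    printf '%s\n' "\$ORIGIN/$rel"
  fi
}

# Example (made-up paths):
#   patchelf --set-rpath "$(origin_rpath /app/bin/python /app/lib)" /app/bin/python
```

Because the result is relative to the binary's own location, it stays valid wherever the squashfs ends up being mounted.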

However, the upgrade to patchelf 0.17.0 has introduced segmentation faults in executables after their RPATH is modified. Patchelf 0.16.1 was working fine.

When looking at the recent git commit history, there was a big change in 2cb863fb756d5005f7276ad7956e01079bffce46 to the ELF header file, with lots of changed constants. I have zero knowledge of the internals here or of how ELF files and dynamic linking work, but that's what stood out to me.

Steps To Reproduce

Here's a short Bash script that reproduces the issue in two manylinux docker containers: one with patchelf 0.16.1 and one from right after patchelf was upgraded to 0.17.0. Other changes are included between those two image versions, but they are unrelated, and the issue can also be reproduced by simply building patchelf 0.17.0 on the older image.

#!/usr/bin/env bash

IMAGES=(
  # 2022-11-14 - patchelf 0.16.1
  quay.io/pypa/manylinux2014_x86_64@sha256:005826a6fa94c97bd31fccf637a0f10621304da447ca2ab3963c13991dffa013
  # 2022-11-19 - patchelf 0.17.0
  quay.io/pypa/manylinux2014_x86_64@sha256:383c6016156c94d7dbd10696c15f2444288b99a25927239b7b024e1cc6ca6a81
)

SCRIPT=$(cat <<'EOF'
PYTHON=/opt/python/cp310-cp310/bin/python

patchelf --version
$PYTHON --version

rpath=$(patchelf --print-rpath $PYTHON)
echo "RPATH: $rpath"

# Modify the python executable:
# set a different value, so the file actually gets written
patchelf --debug --set-rpath "\$ORIGIN" $PYTHON
echo "TEMP RPATH: $(patchelf --print-rpath $PYTHON)"

# and revert it again
patchelf --debug --set-rpath "$rpath" $PYTHON
echo "RPATH: $(patchelf --print-rpath $PYTHON)"

$PYTHON --version
EOF
)

for image in "${IMAGES[@]}"; do
  echo "Running ${image}"
  docker run -i --rm "${image}" <<< "${SCRIPT}"
  echo $'\n\n\n'
done

Log output

$ ./patchelf-bug.sh
Running quay.io/pypa/manylinux2014_x86_64@sha256:005826a6fa94c97bd31fccf637a0f10621304da447ca2ab3963c13991dffa013
patchelf 0.16.1
Python 3.10.8
RPATH: 
patching ELF file '/opt/python/cp310-cp310/bin/python'
new rpath is '$ORIGIN'
rpath is too long, resizing...
DT_NULL index is 30
replacing section '.dynamic' with size 592
replacing section '.dynstr' with size 36674
this is an executable
using replaced section '.dynstr'
using replaced section '.dynamic'
last replaced is 20
looking at section '.interp'
replacing section '.interp' which is in the way
looking at section '.note.gnu.build-id'
replacing section '.note.gnu.build-id' which is in the way
looking at section '.note.ABI-tag'
replacing section '.note.ABI-tag' which is in the way
looking at section '.gnu.hash'
replacing section '.gnu.hash' which is in the way
looking at section '.dynsym'
replacing section '.dynsym' which is in the way
looking at section '.dynstr'
looking at section '.gnu.version'
first reserved offset/addr is 0x17fea/0x417fea
first page is 0x400000
needed space is 98952
needed space is 99008
needed pages is 1
clearing first 101586 bytes
rewriting section '.dynamic' from offset 0x2ee288 (size 576) to offset 0x318 (size 592)
rewriting section '.dynstr' from offset 0xf0b0 (size 36666) to offset 0x568 (size 36674)
rewriting section '.dynsym' from offset 0x3500 (size 48048) to offset 0x94b0 (size 48048)
rewriting section '.gnu.hash' from offset 0x308 (size 12792) to offset 0x15060 (size 12792)
rewriting section '.interp' from offset 0x2a8 (size 28) to offset 0x18258 (size 28)
rewriting section '.note.ABI-tag' from offset 0x2e8 (size 32) to offset 0x18278 (size 32)
rewriting section '.note.gnu.build-id' from offset 0x2c4 (size 36) to offset 0x18298 (size 36)
rewriting symbol table section 3
writing /opt/python/cp310-cp310/bin/python
TEMP RPATH: $ORIGIN
patching ELF file '/opt/python/cp310-cp310/bin/python'
new rpath is ''
writing /opt/python/cp310-cp310/bin/python
RPATH: 
Python 3.10.8

Running quay.io/pypa/manylinux2014_x86_64@sha256:383c6016156c94d7dbd10696c15f2444288b99a25927239b7b024e1cc6ca6a81
patchelf 0.17.0
Python 3.10.8
RPATH: 
patching ELF file '/opt/python/cp310-cp310/bin/python'
new rpath is '$ORIGIN'
rpath is too long, resizing...
DT_NULL index is 30
replacing section '.dynamic' with size 592
replacing section '.dynstr' with size 36674
this is an executable
using replaced section '.dynstr'
using replaced section '.dynamic'
last replaced is 20
looking at section '.interp'
replacing section '.interp' which is in the way
looking at section '.note.gnu.build-id'
replacing section '.note.gnu.build-id' which is in the way
looking at section '.note.ABI-tag'
replacing section '.note.ABI-tag' which is in the way
looking at section '.gnu.hash'
replacing section '.gnu.hash' which is in the way
looking at section '.dynsym'
replacing section '.dynsym' which is in the way
looking at section '.dynstr'
looking at section '.gnu.version'
first reserved offset/addr is 0x17fea/0x417fea
first page is 0x400000
needed space is 98952
needed space is 99008
needed pages is 1
clearing first 101586 bytes
rewriting section '.interp' from offset 0x2a8 (size 28) to offset 0x318 (size 28)
rewriting section '.note.gnu.build-id' from offset 0x2c4 (size 36) to offset 0x338 (size 36)
rewriting section '.note.ABI-tag' from offset 0x2e8 (size 32) to offset 0x360 (size 32)
rewriting section '.gnu.hash' from offset 0x308 (size 12792) to offset 0x380 (size 12792)
rewriting section '.dynsym' from offset 0x3500 (size 48048) to offset 0x3578 (size 48048)
rewriting section '.dynstr' from offset 0xf0b0 (size 36666) to offset 0xf128 (size 36674)
rewriting section '.dynamic' from offset 0x2ee288 (size 576) to offset 0x18070 (size 592)
rewriting symbol table section 5
writing /opt/python/cp310-cp310/bin/python
TEMP RPATH: $ORIGIN
patching ELF file '/opt/python/cp310-cp310/bin/python'
new rpath is ''
writing /opt/python/cp310-cp310/bin/python
RPATH: 
/bin/bash: line 18:    14 Segmentation fault      (core dumped) $PYTHON --version

otherjason commented 1 year ago

I ran into what I think is the same problem, also with a Python 3.10 executable. PR #447 seems to fix it for me.

pramodk commented 1 year ago

Hello All,

We are having the same issue with >=v0.17.0 (i.e. including latest release 0.17.2).

Quoting @bastimeyer: "I build AppImages for my Python application, and in order to do that I'm using Python's official manylinux docker images where I copy one of the pre-built python environments and modify the RPATHs of all binaries and their dependencies using patchelf. The RPATHs get set to $ORIGIN (and other relative paths), so that when the AppImage's squashfs gets mounted on the user's system upon execution, Python can properly be run on unknown/arbitrary mount points. So far, this has all been working flawlessly."

We have a similar use case: as part of the NEURON project (a simulator used in the computational neuroscience community), we distribute Python wheels. These wheels contain standalone binaries that are updated by patchelf (via auditwheel) for reasons similar to those mentioned above.

We are trying to update our wheel-building pipeline to the latest quay.io/pypa/manylinux2014_x86_64 (which contains patchelf >= 0.17.0). With the newer patchelf, one of the binaries (modlunit in this case) segfaults after its RPATHs are updated. ldd also crashes on it:

[root@0998f73e5778]# ./modlunit
Segmentation fault (core dumped)

[root@0998f73e5778]# ldd ./modlunit
/usr/bin/ldd: line 116: 15156 Segmentation fault      (core dumped) LD_TRACE_LOADED_OBJECTS=1 LD_WARN= LD_BIND_NOW= LD_LIBRARY_VERSION=$verify_out LD_VERBOSE= "$@"

This issue doesn't appear with older releases such as 0.16.1. I also tried the latest release, 0.17.2 (which contains #447), but that doesn't help in our case.

I built patchelf locally and used git bisect to find the first "bad" commit. It points to 42394e880bc5524122234fe2c2eaa043063ac581 (#430). The previous commit, 7c18779e852e102faebcfdf63ffd250dccdcf4a3, works fine! Also, if I just revert #430 on the latest master, the issue disappears. I have no knowledge of ELF/patchelf internals and am just trying to provide additional information.
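For anyone wanting to repeat the bisection, here is a rough sketch of a per-commit check for `git bisect run`. It relies on git's documented exit-code convention: 0 means good, 125 means skip, other codes from 1 to 127 mean bad, and anything above 127 aborts the bisection. The helper name `bisect_step`, the build command, and the `check.sh` script are assumptions for illustration, not what was actually run here:

```shell
# Hypothetical helper for `git bisect run`.
# Usage: bisect_step BUILD_CMD CHECK_CMD
#   BUILD_CMD builds patchelf at the currently checked-out commit.
#   CHECK_CMD patches a fresh copy of the test binary and runs it.
bisect_step() {
  # A commit that cannot be built tells us nothing: ask git to skip it.
  sh -c "$1" >/dev/null 2>&1 || return 125
  # The patched binary still runs: mark the commit as good.
  sh -c "$2" >/dev/null 2>&1 && return 0
  # Everything else is bad. A segfault shows up as status 139, which
  # `git bisect run` would treat as "abort", so it is mapped to 1.
  return 1
}

# Example invocation (build command and check.sh are assumptions):
#   git bisect start <bad-commit> <good-commit>
#   git bisect run sh -c '. ./bisect_step.sh; bisect_step "./bootstrap.sh && ./configure && make" "./check.sh"'
```

The check script should always work on a fresh copy of the binary, since a binary already rewritten by a "bad" patchelf would poison later steps.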

As a reproducer, you can run the following script (thank you, @bastimeyer!):

#!/usr/bin/env bash

IMAGES=(
  # 2022-11-14 - patchelf 0.16.1 : using quay.io/pypa/manylinux2014_x86_64@sha256:005826a6fa94c97bd31fccf637a0f10621304da447ca2ab3963c13991dffa013
  neuronsimulator/reprod_patchelf_0160
  # 2022-11-19 - patchelf 0.17.0 using quay.io/pypa/manylinux2014_x86_64@sha256:383c6016156c94d7dbd10696c15f2444288b99a25927239b7b024e1cc6ca6a81
  neuronsimulator/reprod_patchelf_0170
)

SCRIPT=$(cat <<'EOF'
patchelf --version

patchelf --print-rpath /tmp/modlunit
patchelf --remove-rpath /tmp/modlunit
# just add some additional rpaths
patchelf --force-rpath --set-rpath \$ORIGIN/123456:\$ORIGIN/11_22_33_44_555 /tmp/modlunit
patchelf --print-rpath /tmp/modlunit
ldd /tmp/modlunit
EOF
)

for image in "${IMAGES[@]}"; do
  echo "Running ${image}"
  docker run -i --rm "${image}" <<< "${SCRIPT}"
  echo $'\n\n\n'
done

It produces the following output:

./run.sh
Running neuronsimulator/reprod_patchelf_0160
patchelf 0.16.1
$ORIGIN/../lib:/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/compilers/lib:/opt/rh/devtoolset-11/root/usr/lib/gcc/x86_64-redhat-linux/11/../../../../lib64
$ORIGIN/123456:$ORIGIN/11_22_33_44_555
    linux-vdso.so.1 =>  (0x00007ffc0cad3000)
    libnvhpcatm.so => not found
    libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fa75aa0a000)
    libnvomp.so => not found
    libdl.so.2 => /lib64/libdl.so.2 (0x00007fa75a806000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa75a5ea000)
    libnvcpumath-avx2.so => not found
    libnvc.so => not found
    libc.so.6 => /lib64/libc.so.6 (0x00007fa75a21c000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fa75a006000)
    libm.so.6 => /lib64/libm.so.6 (0x00007fa759d04000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fa75ad12000)

Running neuronsimulator/reprod_patchelf_0170
patchelf 0.17.0
$ORIGIN/../lib:/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/compilers/lib:/opt/rh/devtoolset-11/root/usr/lib/gcc/x86_64-redhat-linux/11/../../../../lib64
$ORIGIN/123456:$ORIGIN/11_22_33_44_555
/usr/bin/ldd: line 116:    16 Segmentation fault      (core dumped) LD_TRACE_LOADED_OBJECTS=1 LD_WARN= LD_BIND_NOW= LD_LIBRARY_VERSION=$verify_out LD_VERBOSE= "$@"

The docker images are created from a simple Dockerfile such as:

# 0.16.1
#FROM quay.io/pypa/manylinux2014_x86_64@sha256:005826a6fa94c97bd31fccf637a0f10621304da447ca2ab3963c13991dffa013

# 0.17.0
FROM quay.io/pypa/manylinux2014_x86_64@sha256:383c6016156c94d7dbd10696c15f2444288b99a25927239b7b024e1cc6ca6a81
COPY modlunit /tmp/

(i.e. on top of the standard pypa manylinux image, I have added the modlunit binary that segfaults in our case).

I hope this will help to find the root cause. If anything else is needed to debug the issue, I will be more than happy to help!

Thank you!

brenoguim commented 1 year ago

Looking at the program header table of the program that crashes, we have the following two loads:

    2   0x3ff000 + 0x1118 RW align:0x1000
    7   0x400e22 + 0xf6e  R  align:0x1000

This is a bit odd, because the start and end addresses are not aligned. If you round the start address down and the end address up to the alignment, you get:

0x3ff000 -> 0x401000 RW
0x400000 -> 0x402000 R
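The rounding above can be reproduced with plain shell arithmetic. This is only a sketch: the function names are hypothetical, and the 0x1000 page size is taken from the `align` column of the two loads.

```shell
PAGE_SIZE=$((0x1000))   # alignment taken from the program headers above

# Round an address down / up to a page boundary (results printed in decimal).
page_floor() { echo $(( $1 & ~(PAGE_SIZE - 1) )); }
page_ceil()  { echo $(( ($1 + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1) )); }

# Print the page-rounded range of a segment as "0xSTART 0xEND".
# Usage: page_range VADDR SIZE
page_range() {
  printf '0x%x 0x%x\n' "$(page_floor "$1")" "$(page_ceil $(( $1 + $2 )))"
}

# Succeed if two segments share at least one page after rounding.
# Usage: segments_overlap VADDR1 SIZE1 VADDR2 SIZE2
segments_overlap() {
  local s1 e1 s2 e2
  s1=$(page_floor "$1"); e1=$(page_ceil $(( $1 + $2 )))
  s2=$(page_floor "$3"); e2=$(page_ceil $(( $3 + $4 )))
  [ "$s1" -lt "$e2" ] && [ "$s2" -lt "$e1" ]
}

page_range 0x3ff000 0x1118    # prints: 0x3ff000 0x401000
page_range 0x400e22 0xf6e     # prints: 0x400000 0x402000
segments_overlap 0x3ff000 0x1118 0x400e22 0xf6e && echo "segments share a page"
```

Both rounded ranges contain the page starting at 0x400000, which is the page the two loads compete for.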

So there is an overlap, with mixed access rights. Running the program under GDB tells you two things:

1 - The crash happens when the loader attempts to write to address 0x400000.
2 - Printing /proc/pid/maps gives:

003ff000-00400000 rw-p
00400000-00402000 r--p

That shows what the kernel decided to do: it chose the safest access right for the clashing addresses.
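To repeat this check on another binary, the permissions of the mapping that covers a given address can be looked up in the maps output. A small sketch follows; the helper name `perms_at` is hypothetical, and it assumes addresses that fit in a signed 64-bit integer (so entries such as the vsyscall page are not handled):

```shell
# Hypothetical helper: read /proc/<pid>/maps-style lines on stdin and print
# the permission field of the mapping that contains ADDR.
# Usage: perms_at ADDR < /proc/<pid>/maps
perms_at() {
  local addr range perms rest start end
  addr=$(( $1 ))
  while read -r range perms rest; do
    # The first field looks like "00400000-00402000" (hex, no 0x prefix)
    start=$(( 0x${range%-*} ))
    end=$(( 0x${range#*-} ))
    if [ "$addr" -ge "$start" ] && [ "$addr" -lt "$end" ]; then
      printf '%s\n' "$perms"
      return 0
    fi
  done
  return 1   # no mapping covers ADDR
}

# With the two lines shown above, the faulting address is in the read-only mapping:
printf '%s\n' '003ff000-00400000 rw-p' '00400000-00402000 r--p' | perms_at 0x400000
```

For the two mappings listed above this prints `r--p`, consistent with the loader's write to 0x400000 faulting.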

This also explains why commit https://github.com/NixOS/patchelf/commit/42394e880bc5524122234fe2c2eaa043063ac581 introduced the issue. After that patch, the .dynamic section is placed last in the segment, in the portion that became read-only. Before that commit, the .dynamic section was placed at the beginning, with plenty of RW space to use.

The original working binary already had an unaligned segment entry:

0x420cc0 + 0x5218 -> 0x425ed8 RW

but when rounded up and down, it didn't clash with any other segment.

So apparently patchelf is doing its thing by reordering/inserting/moving segments, but in doing so it's creating a segment clash.

brenoguim commented 1 year ago

With the PR mentioned above, ldd on the binary works correctly. But it needs discussion.

adonis0147 commented 1 year ago

I ran into a similar issue. I tried the latest version (0.18.0), but it didn't help. After I downgraded patchelf to 0.16.1, everything worked well.

Platform: CentOS 6 / Ubuntu 22.04
Architecture: x86_64

fda77 commented 1 year ago

Freetz downgraded to v15: https://github.com/Freetz-NG/freetz-ng/commit/eb0f8b6bfc0d9de5b16679f9c02b7867f25fc89e (see https://github.com/Freetz-NG/freetz-ng/issues/740)