Open bastimeyer opened 1 year ago
I ran into what I think is the same problem, also with a Python 3.10 executable. PR #447 seems to fix it for me.
Hello All,
We are having the same issue with >=v0.17.0 (i.e. including latest release 0.17.2).
I build AppImages for my Python application, and in order to do that I'm using Python's official manylinux docker images where I copy one of the pre-built python environments and modify the RPATHs of all binaries and their dependencies using patchelf. The RPATHs get set to $ORIGIN (and other relative paths), so that when the AppImage's squashfs gets mounted on the user's system upon execution, Python can properly be run on unknown/arbitrary mount points. So far, this has all been working flawlessly.
We have a similar use case: as part of the NEURON project (a simulator used in the computational neuroscience community), we distribute Python wheels. These wheels contain standalone binaries that are modified by patchelf (via auditwheel) for reasons similar to those mentioned above.
We are trying to update our wheel-building pipeline to the latest quay.io/pypa/manylinux2014_x86_64 image (which contains patchelf >= 0.17.0). With the newer patchelf, one of the binaries (modlunit in this case) segfaults after its RPATHs are updated. ldd also crashes on it:
[root@0998f73e5778]# ./modlunit
Segmentation fault (core dumped)
[root@0998f73e5778]# ldd ./modlunit
/usr/bin/ldd: line 116: 15156 Segmentation fault (core dumped) LD_TRACE_LOADED_OBJECTS=1 LD_WARN= LD_BIND_NOW= LD_LIBRARY_VERSION=$verify_out LD_VERBOSE= "$@"
This issue doesn't appear with older releases such as 0.16.1. I also tried the latest release, 0.17.2 (containing #447), but that doesn't help in our case.
I built patchelf locally and used git bisect to find the first "bad" commit. It points to 42394e880bc5524122234fe2c2eaa043063ac581 (#430). The previous commit, 7c18779e852e102faebcfdf63ffd250dccdcf4a3, works fine! Also, if I revert just #430 on the latest master, the issue disappears. I have no knowledge of ELF/patchelf internals; I'm just trying to provide additional information.
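For anyone who wants to repeat the bisection, here is a minimal sketch of a test script that could be handed to `git bisect run`. It is not the script actually used above: the function names, the `./src/patchelf` build path, the `--run` guard and the choice to treat only a segfault (exit status 139, i.e. 128 + SIGSEGV) as "bad" are all assumptions made for illustration. The RPATH values are the ones from the reproducer in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical bisect helper; names and paths are assumptions, not part of patchelf.

# verdict_for_status STATUS
# Map the patched binary's exit status to a `git bisect run` verdict:
#   139 (killed by SIGSEGV) -> 1 (bad); any other status -> 0 (good).
verdict_for_status() {
  if [ "$1" -eq 139 ]; then
    return 1
  fi
  return 0
}

# bisect_step PATCHELF_BIN TARGET
# Patch a scratch copy of TARGET with the given patchelf build and run it.
# Returns 125 (tells git bisect to skip the commit) if the copy or patch step fails.
bisect_step() {
  patchelf_bin=$1
  target=$2
  work=$(mktemp) || return 125
  if ! cp "$target" "$work" || ! chmod +x "$work"; then
    rm -f "$work"
    return 125
  fi
  if ! "$patchelf_bin" --remove-rpath "$work" ||
     ! "$patchelf_bin" --force-rpath --set-rpath \
         '$ORIGIN/123456:$ORIGIN/11_22_33_44_555' "$work"; then
    rm -f "$work"
    return 125
  fi
  "$work" </dev/null >/dev/null 2>&1
  status=$?
  rm -f "$work"
  verdict_for_status "$status"
}

# Only act when invoked as `./bisect-test.sh --run` (assumed file name).
if [ "${1:-}" = "--run" ]; then
  bisect_step ./src/patchelf /tmp/modlunit
  exit $?
fi
```

With patchelf rebuilt at each step (the build commands are left out here), the session would look roughly like `git bisect start`, `git bisect bad <bad-ref>`, `git bisect good <good-ref>`, then `git bisect run ./bisect-test.sh --run`.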
As a reproducer, you can run the following script (thank you, @bastimeyer!):
#!/usr/bin/env bash
IMAGES=(
# 2022-11-14 - patchelf 0.16.1 : using quay.io/pypa/manylinux2014_x86_64@sha256:005826a6fa94c97bd31fccf637a0f10621304da447ca2ab3963c13991dffa013
neuronsimulator/reprod_patchelf_0160
# 2022-11-19 - patchelf 0.17.0 using quay.io/pypa/manylinux2014_x86_64@sha256:383c6016156c94d7dbd10696c15f2444288b99a25927239b7b024e1cc6ca6a81
neuronsimulator/reprod_patchelf_0170
)
SCRIPT=$(cat <<'EOF'
patchelf --version
patchelf --print-rpath /tmp/modlunit
patchelf --remove-rpath /tmp/modlunit
# just add some additional rpaths
patchelf --force-rpath --set-rpath \$ORIGIN/123456:\$ORIGIN/11_22_33_44_555 /tmp/modlunit
patchelf --print-rpath /tmp/modlunit
ldd /tmp/modlunit
EOF
)
for image in "${IMAGES[@]}"; do
echo "Running ${image}"
docker run -i --rm "${image}" <<< "${SCRIPT}"
echo $'\n\n\n'
done
It produces the following output:
./run.sh
Running neuronsimulator/reprod_patchelf_0160
patchelf 0.16.1
$ORIGIN/../lib:/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/compilers/lib:/opt/rh/devtoolset-11/root/usr/lib/gcc/x86_64-redhat-linux/11/../../../../lib64
$ORIGIN/123456:$ORIGIN/11_22_33_44_555
linux-vdso.so.1 => (0x00007ffc0cad3000)
libnvhpcatm.so => not found
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fa75aa0a000)
libnvomp.so => not found
libdl.so.2 => /lib64/libdl.so.2 (0x00007fa75a806000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa75a5ea000)
libnvcpumath-avx2.so => not found
libnvc.so => not found
libc.so.6 => /lib64/libc.so.6 (0x00007fa75a21c000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fa75a006000)
libm.so.6 => /lib64/libm.so.6 (0x00007fa759d04000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa75ad12000)
Running neuronsimulator/reprod_patchelf_0170
patchelf 0.17.0
$ORIGIN/../lib:/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/compilers/lib:/opt/rh/devtoolset-11/root/usr/lib/gcc/x86_64-redhat-linux/11/../../../../lib64
$ORIGIN/123456:$ORIGIN/11_22_33_44_555
/usr/bin/ldd: line 116: 16 Segmentation fault (core dumped) LD_TRACE_LOADED_OBJECTS=1 LD_WARN= LD_BIND_NOW= LD_LIBRARY_VERSION=$verify_out LD_VERBOSE= "$@"
The docker images are created from a simple Dockerfile such as:
# 0.16.1
#FROM quay.io/pypa/manylinux2014_x86_64@sha256:005826a6fa94c97bd31fccf637a0f10621304da447ca2ab3963c13991dffa013
# 0.17.0
FROM quay.io/pypa/manylinux2014_x86_64@sha256:383c6016156c94d7dbd10696c15f2444288b99a25927239b7b024e1cc6ca6a81
COPY modlunit /tmp/
(i.e. on top of the standard manylinux pypa image, I have included the modlunit binary that segfaults in our case).
I hope this will help to find the root cause. If anything else is needed to debug the issue, I will be more than happy to help!
Thank you!
Looking at the program header table of the program that crashes, we have the following two LOAD entries:
2 0x3ff000 + 0x1118 RW align:0x1000
7 0x400e22 + 0xf6e R align:0x1000
This is a bit weird because the start/end addresses do not respect alignment. And if you round down the start address and round up the end address, we get:
0x3ff000 -> 0x401000 RW
0x400000 -> 0x402000 R
Well, there is an overlap there, with mixed access rights. Running the program under GDB tells you two things:
1 - The crash happens when the loader attempts to write to address 0x400000.
2 - Printing /proc/pid/maps gives:
003ff000-00400000 rw-p
00400000-00402000 r--p
Which gives the answer as to what the kernel decided to do. It chose the safest access right for the clashing addresses.
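The rounding argument above can be checked with a few lines of shell arithmetic. This is only an illustrative sketch: the helper names (`page_floor`, `page_ceil`, `segment_pages`, `ranges_overlap`) are made up here, and the 0x1000 page size is taken from the `align` field of the two entries.

```shell
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names); page size assumed from align:0x1000.
PAGE=0x1000

# Round an address down to the start of its page.
page_floor() { printf '0x%x\n' "$(( $1 & ~($PAGE - 1) ))"; }

# Round an address up to the next page boundary.
page_ceil() { printf '0x%x\n' "$(( ($1 + $PAGE - 1) & ~($PAGE - 1) ))"; }

# segment_pages VADDR MEMSZ
# Print the page-rounded "START END" range covered by a segment.
segment_pages() {
  echo "$(page_floor "$1") $(page_ceil "$(( $1 + $2 ))")"
}

# ranges_overlap START1 END1 START2 END2
# Succeed if the half-open ranges [START1,END1) and [START2,END2) intersect.
ranges_overlap() {
  [ "$(( $1 < $4 && $3 < $2 ))" -eq 1 ]
}

rw=$(segment_pages 0x3ff000 0x1118)   # the RW entry
ro=$(segment_pages 0x400e22 0xf6e)    # the R entry
echo "RW: $rw"
echo "R:  $ro"
# Word splitting of $rw and $ro is intentional: each holds "START END".
if ranges_overlap $rw $ro; then
  echo "page-rounded ranges overlap"
fi
```

For the two entries above this reports the ranges 0x3ff000 to 0x401000 and 0x400000 to 0x402000, and flags them as overlapping, which matches the page at 0x400000 being claimed by both segments.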
This also explains why the commit https://github.com/NixOS/patchelf/commit/42394e880bc5524122234fe2c2eaa043063ac581 introduced the issue. After that patch, the .dynamic section is placed last in the segment, which is the portion that became read-only. Before that commit, the .dynamic section was placed at the beginning, with a lot of RW space to use.
The original working binary already had an unaligned segment entry:
0x420cc0 + 0x5218 -> 0x425ed8 RW
but when rounded up and down, it didn't clash with any other segment.
So apparently patchelf is doing its thing by reordering/inserting/moving segments, but in doing so it's creating a segment clash.
With the PR mentioned above, ldd on the binary works correctly. But it needs discussion.
I ran into a similar issue and tried the latest version (0.18.0), but it didn't help. After I downgraded patchelf to 0.16.1, everything worked well.
Platform: CentOS 6/Ubuntu 22.04 Architecture: x86_64
Describe the bug
Just encountered an issue with patchelf 0.17.0...
I build AppImages for my Python application, and in order to do that I'm using Python's official manylinux docker images where I copy one of the pre-built python environments and modify the RPATHs of all binaries and their dependencies using patchelf. The RPATHs get set to $ORIGIN (and other relative paths), so that when the AppImage's squashfs gets mounted on the user's system upon execution, Python can properly be run on unknown/arbitrary mount points. So far, this has all been working flawlessly.
The patchelf 0.17.0 upgrade, however, has introduced segmentation faults in executables after their RPATH is modified. Patchelf 0.16.1 was working fine.
When looking at the recent git commit history, there was a big change in 2cb863fb756d5005f7276ad7956e01079bffce46 to the ELF header file, with lots of changed constants. I have 0% knowledge of any internals here or of how ELF files and dynamic linking work, but that's what stood out to me.
Steps To Reproduce
Here's a short Bash script for reproducing the issue in two manylinux docker containers: one with patchelf 0.16.1 and the next one right after patchelf was upgraded to 0.17.0. There are other changes included between those two image versions, but those are unrelated, and the issue can also be reproduced by simply building patchelf 0.17.0 on the older image.
Log output