NixOS / patchelf

A small utility to modify the dynamic linker and RPATH of ELF executables
GNU General Public License v3.0
3.57k stars 486 forks source link

Test regression in 0.18.0: repeated-updates.sh, replace-add-needed.sh #515

Open matoro opened 1 year ago

matoro commented 1 year ago

Between 0.17.2 and 0.18.0, we have one test regression and one new test that does not work on sparc64. I bisected this to https://github.com/NixOS/patchelf/pull/475 and confirmed that applying a revert on top of 0.18.0 successfully passes all tests - both the existing replace-add-needed.sh and the new repeated-updates.sh.

As part of investigating this, I looked into https://github.com/NixOS/patchelf/issues/492 and https://github.com/NixOS/patchelf/pull/510. Applying the latter PR causes replace-add-needed.sh to pass, but repeated-updates.sh to get stuck in some sort of I/O loop, pegging the disk.

I now have a proper onboarding flow for my exotic architecture testing environment. Details are available here - if you want access, just mention and I can provide free shell access on real hardware. @Mic92 still has access, but username has been changed to g-mic92.

Downstream Gentoo bug

chewi commented 1 year ago

I should probably double check it tomorrow when I'm not half asleep, but I wondered whether reverting #475 would also help fix the tests for m68k, and it does! In that case, it was just replace-add-needed.sh that was failing though.

yuta-hayama commented 1 year ago

As part of investigating this, I looked into #492 and #510. Applying the latter PR causes replace-add-needed.sh to pass, but repeated-updates.sh to get stuck in some sort of I/O loop, pegging the disk.

Certainly, applying #510 and running repeated-updates.sh, the script should appear to be stuck with a massive amount of disk I/O. But this is not completely stuck in an infinite loop. If you have enough disk space and wait a dozen minutes or so, the test should PASS.

Please let me know the result of the following command. How many is the p_align of the LOAD segment?

readelf -l tests/scratch/repeated-updates/libbar.so

As noted in the rewriteSectionsLibrary() comment, patchelf places the updated section at "the end of the file". https://github.com/NixOS/patchelf/blob/0.18.0/src/patchelf.cc#L797

Where "the end of the file" means a position after any section that existed before patchelf was applied. In repeated-updates.sh, only the .dynstr section is rewritten, but even though the rewriting deletes the old .dynstr section, the new .dynstr section is placed after the old .dynstr section. As a result, each time patchelf is applied, a "gap" appears where no section is placed, increasing the output file size.

If #510 is not applied, patchelf aligns the in-file offset of the new section (and the new LOAD segment) with pagesize (0x1000). Thus, if the updated section is short, each time patchelf is run, the size of the output file will increased by 0x1000.

However, as explained in #510, glibc before 2.35 has a bug that requires aligning in-file offset with p_align to work properly. When #510 is applied, each time patchelf is run, the size of the output file is increased by p_align.

Since repeated-updates.sh executes patchelf about 200 times, for example, if the p_align of the LOAD segment is 0x200000, the size of libbar.so when the test is completed will be about 400 MB. This is the reason for the huge amount of disk I/O when repeated-updates.sh is run.

To prevent the file from becoming huge, it may be possible to take measures such as placing the new section where the old (overwritten) section was when the section is rewritten. But to do so, we will have to overturn the premise of rewriteSectionsLibrary() (... For dynamic libraries, we just place the replacement sections at the end of the file.) This is expected to be a complex task and I have not started it yet.

matoro commented 1 year ago

Since this seems to affect multiple architectures, I ran this on some of my test machines.

Affected: m68k (per Chewi above), alpha, sparc64, sparc 32-bit

Unaffected: arm64, arm 32-bit, ppc64le, ppc64, ppc 32-bit, ia64, mips64, mips 32-bit, riscv64

chewi commented 1 year ago

In m68k's case, you have to align to 16-bit, although we are strongly considering building the whole system with -malign-int in future to align to 32-bit instead, which fixes various things. I thought I'd tried patchelf with this already, but I'm possibly misremembering.

paulgevers commented 8 months ago

Debian has applied the following very simple patch to solve/work around the issue on mips64el:

--- a/tests/repeated-updates.sh
+++ b/tests/repeated-updates.sh
@@ -33,7 +33,7 @@
 # To be even more strict, check that we don't add too many extra LOAD entries
 ###############################################################################
 echo "Segments before: ${load_segments_before} and after: ${load_segments_after}"
-if [ "${load_segments_after}" -gt $((load_segments_before + 2)) ]
+if [ "${load_segments_after}" -gt $((load_segments_before + 3)) ]
 then
     exit 1
 fi