JonathonReinhart / staticx

Create static executable from dynamic executable
https://staticx.readthedocs.io/

Fix broken nssfix build on newer glibc #255

Closed JonathonReinhart closed 1 year ago

JonathonReinhart commented 1 year ago

Previously, libnssfix.so would unconditionally link against libnss_files and libnss_dns, as those are the only two libraries ("services" in NSS parlance) specified in our nsswitch.conf.

As of GLIBC 2.34, the files and dns NSS "modules" are built into libc (https://github.com/bminor/glibc/commit/6212bb67f4695962748a5981e1b9fea105af74f6 and https://github.com/bminor/glibc/commit/e1fcf21474c5b522fdad4ac0191d5dcc3271dba6). While libnss_files.so.2 and libnss_dns.so.2 still exist, they are empty stubs (bminor/glibc@9ed48feed8c268e98baf00f3608d85dafb8215f3), and there are no unversioned symlinks (without the .2 suffix), which causes a link failure for us.

This PR adds a configure step to determine which NSS modules are built-in, by simply trying to link against them. Modules which are built-in are excluded from the link when building libnssfix.so. Because SCons/scons#4373 silently causes a false negative, we add another basic conftest called BasicCheckLib().
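The link probe described above can be sketched roughly as follows. This is a minimal Python illustration, not the PR's actual SCons conftest: the helper names `can_link` and `split_nss_modules`, the use of `cc` as the compiler, and the rule that a module which fails to link is treated as built in are all assumptions made for the example.

```python
import os
import subprocess
import tempfile

# Trivial program; the probe only cares whether the link step succeeds.
CONFTEST_SOURCE = "int main(void) { return 0; }\n"


def can_link(lib, cc="cc"):
    """Return True if a trivial program links against -l<lib>.

    Hypothetical helper. Note that a missing compiler is also reported
    as False, so a real check would want to detect that case separately.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "conftest.c")
        out = os.path.join(tmp, "conftest")
        with open(src, "w") as f:
            f.write(CONFTEST_SOURCE)
        try:
            result = subprocess.run(
                [cc, src, "-o", out, "-l" + lib],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            # Compiler not found or not executable.
            return False
        return result.returncode == 0


def split_nss_modules(modules, probe=can_link):
    """Partition modules into (to_link, assumed_builtin) using the probe."""
    to_link = [m for m in modules if probe(m)]
    # Assumption: anything that cannot be linked is built into libc.
    assumed_builtin = [m for m in modules if m not in to_link]
    return to_link, assumed_builtin
```

Calling `split_nss_modules(["nss_files", "nss_dns"])` on a glibc >= 2.34 system without the unversioned symlinks would be expected to put both names in the second list, so neither is passed to the linker when building libnssfix.so.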

Finally, we update the CI configuration to build under both Ubuntu 20.04 and 22.04. This required a creative update to the build job matrix, to exclude the right Python version from the deadsnakes/action setup.

Fixes #245

JonathonReinhart commented 1 year ago

It dawned on me that this probably doesn't really fix the problem. Yes, it fixes building staticx against GLIBC >= 2.34, but I think it breaks things in the opposite direction when that staticx (with libnssfix built against GLIBC >= 2.34) is used on a system with an older GLIBC (< 2.34):

This is from a process I was stepping through (test/nss-isolated/build/app.sx) with GDB:

(gdb) where
#0  _nss_files_getpwuid_r (uid=1000, result=0x7ffff7fc2840 <resbuf>, buffer=0x55555555ba70 "", buflen=1024,
    errnop=0x7ffff7dee6c0) at nss_files/files-XXX.c:77
#1  0x00007ffff7eb7e23 in __getpwuid_r (uid=uid@entry=1000, resbuf=resbuf@entry=0x7ffff7fc2840 <resbuf>,
    buffer=0x55555555ba70 "", buflen=buflen@entry=1024, result=result@entry=0x7fffffffe200) at ../nss/getXXbyYY_r.c:315
#2  0x00007ffff7eb7513 in getpwuid (uid=1000) at ../nss/getXXbyYY.c:134
#3  0x0000555555555384 in test_passwd ()
#4  0x00005555555555fb in run_tests ()
#5  0x00005555555559b1 in main ()

$ awk '{ print $6 }' /proc/19393/maps | sort -u

[heap]
[stack]
/tmp/staticx-gLhGjp/app
/tmp/staticx-gLhGjp/ld-2.31.so
/tmp/staticx-gLhGjp/libc-2.31.so
/tmp/staticx-gLhGjp/libnssfix.so
/usr/lib/x86_64-linux-gnu/libnss_files-2.31.so
[vdso]
[vvar]
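
The same inspection of the maps file can be scripted to flag libraries loaded from outside the bundle directory, such as the host's libnss_files-2.31.so in the listing above. A minimal Python sketch, assuming the usual six-column /proc/<pid>/maps layout; the function names `mapped_files` and `outside_bundle` are hypothetical and not part of staticx:

```python
def mapped_files(maps_text):
    """Return the sorted set of file paths named in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        # The sixth field is the pathname; anonymous mappings have none,
        # and pseudo-entries such as [heap] or [stack] are skipped here.
        if len(fields) == 6 and fields[5].startswith("/"):
            paths.add(fields[5])
    return sorted(paths)


def outside_bundle(maps_text, bundle_dir):
    """List mapped files that do not live under bundle_dir."""
    prefix = bundle_dir.rstrip("/") + "/"
    return [p for p in mapped_files(maps_text) if not p.startswith(prefix)]
```

Run against the contents of /proc/19393/maps with "/tmp/staticx-gLhGjp" as the bundle directory, this should report only the libnss_files path under /usr/lib, which is the library that was pulled in from the host system.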

JonathonReinhart commented 1 year ago

I was wondering why none of my tests caught this, particularly when testing locally (Debian 11, glibc 2.31) with the staticx 0.14.0 preview.

The problem is that I didn't have TEST_DOCKER_IMAGE set, so the test that is designed to catch this did not run.

$ TEST_DOCKER_IMAGE=centos:5  ./run_test.sh
...
Running test: parent
Error: Failed to look up user for uid=0
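
One way to keep a gap like this from going unnoticed is to have the test runner report the skip explicitly rather than silently passing over the Docker-based test. A minimal Python sketch of that idea; only the TEST_DOCKER_IMAGE variable name comes from this thread, while the helper `docker_test_status` and its messages are hypothetical and do not reflect how run_test.sh is actually written:

```python
import os


def docker_test_status(environ=None):
    """Describe whether the Docker-based tests would run.

    Hypothetical helper: it reads TEST_DOCKER_IMAGE from the given
    mapping (defaulting to the process environment) and returns a
    message that makes a skipped test visible in the output.
    """
    env = os.environ if environ is None else environ
    image = env.get("TEST_DOCKER_IMAGE", "").strip()
    if not image:
        return "SKIPPED: TEST_DOCKER_IMAGE is not set"
    return "RUNNING against " + image
```

Printing this status at the start of a test run would have shown that the check designed to catch the regression was not being exercised.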