JonathonReinhart / staticx

Create static executable from dynamic executable
https://staticx.readthedocs.io/

/tmp/libnssfix_XXX.so require execution permission. #203

Closed: sansna closed this issue 3 years ago

sansna commented 3 years ago

OS: CentOS 7, Python 3.9.2. After successfully installing staticx 0.13.2, running

 staticx bin bin.static

generates an error:

ldd: warning: you do not have execution permission for `/tmp/libnssfix-jtdtzi6p.so'

The static binary is generated after the above warning, but running it causes a segmentation fault.

Detailed log around that error:

DEBUG:root:libcrypto.so.1.1 already in pyinstaller archive
DEBUG:root:Running ['ldd', '/tmp/staticx-pyi-xstkb5xq/libstdc++.so.6']
DEBUG:root:Ignoring synthetic library: linux-vdso.so.1
DEBUG:root:libgcc_s.so.1 already in pyinstaller archive
DEBUG:root:Running ['ldd', '/tmp/staticx-pyi-xstkb5xq/libtinfow.so.6']
DEBUG:root:Ignoring synthetic library: linux-vdso.so.1
DEBUG:root:Running ['ldd', '/tmp/staticx-pyi-xstkb5xq/libz.so.1']
DEBUG:root:Ignoring synthetic library: linux-vdso.so.1
DEBUG:root:Program linked with GLIBC: Found libc.so.6 GLIBC_2.7
DEBUG:root:Running ['patchelf', '--add-needed', 'libnssfix.so', '/tmp/staticx-prog-4i5zi67h']
INFO:root:Adding libnssfix.so
DEBUG:root:Running ['ldd', '/tmp/libnssfix-jtdtzi6p.so']
ldd: warning: you do not have execution permission for `/tmp/libnssfix-jtdtzi6p.so'
DEBUG:root:Ignoring synthetic library: linux-vdso.so.1
INFO:root:Processing library libnss_dns.so.2 (/lib64/libnss_dns.so.2)
INFO:root:Adding Symlink libnss_dns.so.2 => libnss_dns-2.17.so
INFO:root:Adding /lib64/libnss_dns-2.17.so as libnss_dns-2.17.so
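
For context, the log shows staticx running ldd on each library it pulls in and skipping synthetic entries such as linux-vdso.so.1. A rough sketch of that dependency-discovery step (simplified for illustration, not staticx's actual code) might look like this:

import subprocess

def list_dependencies(path):
    """Run ldd on `path` and return {soname: resolved_path} for real dependencies."""
    out = subprocess.run(["ldd", path], capture_output=True, text=True, check=True).stdout
    deps = {}
    for line in out.splitlines():
        line = line.strip()
        if line.startswith("linux-vdso"):
            continue  # synthetic library, nothing on disk to copy
        if "=>" in line:
            name, _, rest = line.partition("=>")
            parts = rest.split()
            if parts and parts[0] != "not":  # skip "=> not found" entries
                deps[name.strip()] = parts[0]
    return deps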

sansna commented 3 years ago

Related PR

JonathonReinhart commented 3 years ago

The message from ldd was only a warning, not an error, and it was harmless. However, I fixed it in #204, and I released v0.13.3 with that fix. Thanks for pointing it out.

sansna commented 3 years ago

I am confused, then: why the segmentation fault when running the static binary?

The message from ldd was only a warning, not an error, and it was harmless. However, I fixed it in #204, and I released v0.13.3 with that fix. Thanks for pointing it out.

JonathonReinhart commented 3 years ago

The static binary is generated after the above warning, but running it causes a segmentation fault.

This segfault has nothing to do with the warning from ldd. That warning was a red herring. (Didn't you still get a segfault after you implemented #200?)

I suspect you are hitting the same problem as #198, which I suspect is caused by patchelf.

CentOS 7 does not include patchelf in the standard repository, so I will assume that you (like most) also are using EPEL. EPEL has patchelf 0.12-1.el7. Can you confirm that is the patchelf executable that staticx is using? If so, please add a comment on #198, and let's track your segfault issue there, thanks.
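
One quick way to check which patchelf staticx would use (assuming it simply invokes whatever patchelf is first on PATH, as the debug log above suggests):

import shutil, subprocess

patchelf = shutil.which("patchelf")   # what a bare "patchelf" invocation resolves to
print("patchelf found at:", patchelf)
if patchelf:
    print(subprocess.run([patchelf, "--version"], capture_output=True, text=True).stdout.strip())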

sansna commented 3 years ago

The static binary is generated after the above warning, but running it causes a segmentation fault.

This segfault has nothing to do with the warning from ldd. That warning was a red herring. (Didn't you still get a segfault after you implemented #200?)

I suspect you are hitting the same problem as #198, which I suspect is caused by patchelf.

CentOS 7 does not include patchelf in the standard repository, so I will assume that you (like most) also are using EPEL. EPEL has patchelf 0.12-1.el7. Can you confirm that is the patchelf executable that staticx is using? If so, please add a comment on #198, and let's track your segfault issue there, thanks.

patchelf 0.11.20200609.d6b2a72

Note: after fixing this warning, there is no more segmentation fault.

JonathonReinhart commented 3 years ago

Note: after fixing this warning, there is no more segmentation fault.

Sorry, but I do not see any way that #204 actually fixed your segfault issues.

In the version of ldd that Debian ships, they've actually removed this stupid warning: local-ldd.diff

And, staticx already marks the file executable when it adds it to the tar archive: https://github.com/JonathonReinhart/staticx/blob/v0.13.2/staticx/archive.py#L98
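
For reference, the general idea there (forcing the exec bits on an entry as it goes into the tar archive) looks roughly like this; a simplified sketch, not the exact archive.py code:

import tarfile

def add_as_executable(tar, path, arcname):
    # Set the exec bits on the archived entry, regardless of the
    # permissions of the source file on disk.
    def make_exec(tarinfo):
        tarinfo.mode |= 0o111
        return tarinfo
    tar.add(path, arcname=arcname, filter=make_exec)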


You should be able to prove your case very easily by running these commands and showing the output here:

# Install old version of staticx
virtualenv -p python3 venv-sx132
source venv-sx132/bin/activate
pip install staticx==0.13.2
staticx --debug $(which date) date.sx132
./date.sx132
deactivate

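# Install new version of staticx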
virtualenv -p python3 venv-sx133
source venv-sx133/bin/activate
pip install staticx==0.13.3
staticx --debug $(which date) date.sx133
./date.sx133
deactivate

sansna commented 3 years ago

I am not going to prove it on my own; you do the proving. I am not happy because you rejected my PR without even discussing it with me, so it is not my duty now.

JonathonReinhart commented 3 years ago

First things first:

I have absolutely no "duty" to you whatsoever. This is an open-source project that I maintain in my "spare" time and give away freely to the world because I want to. Not because I have to.

Because it is mine, I am free to do whatever I want with it! Including rejecting your PR because it had multiple problems. I don't have to discuss it with you. I didn't like how you wrote it, and I wanted to fix it my way. (The way that I already fixed it in another place.)

If you want your PRs to be accepted by open-source maintainers, you should seek to appease them by submitting quality code that conforms to the project's standards. I tried to be kind and provide feedback for you to improve your Python and Git skills, but I now see that was a waste of my energy and time.

If you don't like these terms, then the solution is simple: Stop using staticx.

Second:

I already did prove (to myself -- you know, the one who owns the project) that your analysis is incorrect. I am getting segfaults in CentOS 7 with and without the benign fix in 0.13.3.

I am suggesting that you prove it to yourself because I already know there's a problem (#198) that has nothing to do with #204, and frankly, your input probably adds nothing. I even spoon-fed you the exact commands to run!

Next:

I am actually currently working on #198 and narrowing in on the problem. Guess what? It looks like patchelf is corrupting the ELF headers -- which, as I indicated, has absolutely nothing to do with this stupid ldd warning.
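
For anyone who wants to check this themselves: one way to see whether a tool rewrote an ELF header is to dump its fields before and after. A minimal sketch, assuming a 64-bit little-endian ELF (the file names in the comment are hypothetical):

import struct

def elf_header(path):
    with open(path, "rb") as f:
        ident = f.read(16)
        assert ident[:4] == b"\x7fELF", "not an ELF file"
        fields = struct.unpack("<HHIQQQIHHHHHH", f.read(48))
    names = ["type", "machine", "version", "entry", "phoff", "shoff", "flags",
             "ehsize", "phentsize", "phnum", "shentsize", "shnum", "shstrndx"]
    return dict(zip(names, fields))

# Compare e.g. elf_header("prog.orig") vs elf_header("prog.patched")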

But instead of working on that issue, I'm spending my time arguing with an ignorant, ungrateful user.

If you don't want to help staticx, then please move along.

sansna commented 3 years ago

First things first:

I have absolutely no "duty" to you whatsoever. This is an open-source project that I maintain in my "spare" time and give away freely to the world because I want to. Not because I have to.

Because it is mine, I am free to do whatever I want with it! Including rejecting your PR because it had multiple problems. I don't have to discuss it with you. I didn't like how you wrote it, and I wanted to fix it my way. (The way that I already fixed it in another place.)

If you want your PRs to be accepted by open-source maintainers, you should seek to appease them by submitting quality code that conforms to the project's standards. I tried to be kind and provide feedback for you to improve your Python and Git skills, but I now see that was a waste of my energy and time.

If you don't like these terms, then the solution is simple: Stop using staticx.

Second:

I already did prove (to myself -- you know, the one who owns the project) that your analysis is incorrect. I am getting segfaults in CentOS 7 with and without the benign fix in 0.13.3.

I am suggesting that you prove it to yourself because I already know there's a problem (#198) that has nothing to do with #204, and frankly, your input probably adds nothing. I even spoon-fed you the exact commands to run!

Next:

I am actually currently working on #198 and narrowing in on the problem. Guess what? It looks like patchelf is corrupting the ELF headers -- which, as I indicated, has absolutely nothing to do with this stupid ldd warning.

But instead of working on that issue, I'm spending my time arguing with an ignorant, ungrateful user.

If you don't want to help staticx, then please move along.

Your way of fixing it does result in a segmentation fault; mine does not.

[root@334e028e8654 /]# staticx $(echo "/bin/ls") ls.sx
[root@334e028e8654 /]# ./ls.sx
anaconda-post.log  bin  dev  etc  home  lib  lib64  ls.sx  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
[root@334e028e8654 /]# staticx -V
staticx 0.13.3

This is my modified version of 0.13.3. And here is your published version of 0.13.3:

[root@334e028e8654 /]# staticx $(echo "/bin/ls") ls.sx
[root@334e028e8654 /]# ./ls.sx
Segmentation fault
[root@334e028e8654 /]# staticx -v
usage: staticx [-h] [-l LIBS] [--strip] [--no-compress] [-V] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL}] prog output
staticx: error: the following arguments are required: prog, output
[root@334e028e8654 /]# staticx -V
staticx 0.13.3

Another thing to say: I am helping because the project is public and open to PRs; if it were private, nobody would help you. If that is your attitude, please keep it private, or close your PR page.

I think we are here together to keep the project working, right?

sansna commented 3 years ago

You can start a Docker centos:7 container to verify what I said.

sansna commented 3 years ago

Note: I am not saying that my modification is absolutely correct, but we can discuss it to find a better way to solve it.

JonathonReinhart commented 3 years ago

This is probably due to language differences, but I misunderstood your comment here -- I thought you meant that 0.13.3 (#204) did fix your segfault. I now understand that you're saying #204 did not fix your segfault, but that your PR #200 did fix it.

First, showing that 0.13.3 does not fix the segfault (which we both agree):

$ docker run --rm -it centos:7-python3
[root@ed21ba403167 /]# pip3 install staticx==0.13.3
...
[root@ed21ba403167 /]# staticx --version
staticx 0.13.3
[root@ed21ba403167 /]# staticx $(which date) date.sx
[root@ed21ba403167 /]# ./date.sx 
Segmentation fault

Now, testing your PR #200 (which I am speculating will not fix the issue):

[root@d7ff59649c12 /]# yum install gcc glibc-static
...
[root@d7ff59649c12 /]# pip install scons
...
[root@d7ff59649c12 /]# pip install https://github.com/JonathonReinhart/staticx/archive/refs/heads/sansna-fix-libnssfix-permission.tar.gz
...
[root@d7ff59649c12 /]# staticx --version
staticx 0.13.1+g1f772ea
[root@d7ff59649c12 /]# staticx $(which date) date.sx
[root@d7ff59649c12 /]# ./date.sx 
Thu Oct 14 06:32:11 UTC 2021

I am extremely surprised by this outcome, and do not yet understand it.

I apologize for dismissing this. I still think there is something else at play here. I will continue to investigate #198 and compare it to this finding.

JonathonReinhart commented 3 years ago

Okay, I figured it out. As I indicated originally, #200, #203, and #204 have absolutely nothing to do with the segfault issue.

  • When you installed staticx 0.13.2 or 0.13.3 via pip install (like I did above), you downloaded the wheel (.whl), which includes a pre-built bootloader that was statically linked using musl libc.
  • When you tested your "fix" #200 (and when I tested it above), you built the bootloader locally, which statically linked against glibc (not musl) -- note that I had to install glibc-static above.

Well, it seems that some versions of objcopy (from binutils) malfunction when given a binary linked with musl-libc -- which is where I almost got to with https://github.com/JonathonReinhart/staticx/issues/198#issuecomment-942999650.

I proved this by transplanting the musl-libc bootloader into the docker container from the previous test where your #200 was installed:

$ docker cp bootloader-musl d7ff:/usr/local/lib/python3.6/site-packages/staticx/assets/release/bootloader

And now, even with your #200 in place, it still segfaults:

[root@d7ff59649c12 /]# grep chmod /usr/local/lib/python3.6/site-packages/staticx/hooks/glibc.py 
        os.chmod(nssfix.name, 0o755&~umask)
[root@d7ff59649c12 /]# staticx $(which date) date.sx
[root@d7ff59649c12 /]# ./date.sx 
Segmentation fault
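
Side note on the 0o755&~umask expression in that grep output: it combines a fixed permission mode with the current process umask. As an illustration only, the usual way to read the umask in Python is:

import os

umask = os.umask(0)    # os.umask sets a new value and returns the old one,
os.umask(umask)        # so set-then-restore is the standard way to just read it
mode = 0o755 & ~umask  # with the common umask 0o022 this is still 0o755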

sansna commented 3 years ago

Okay, I figured it out. As I indicated originally, #200, #203, and #204 have absolutely nothing to do with the segfault issue.

  • When you installed staticx 0.13.2 or 0.13.3 via pip install (like I did above), you downloaded the wheel (.whl), which includes a pre-built bootloader that was statically linked using musl libc.
  • When you tested your "fix" #200 (and when I tested it above), you built the bootloader locally, which statically linked against glibc (not musl) -- note that I had to install glibc-static above.

Well, it seems that some versions of objcopy (from binutils) malfunction when given a binary linked with musl-libc -- which is where I almost got to with #198 (comment).

I proved this by transplanting the musl-libc bootloader into the docker container from the previous test where your #200 was installed:

$ docker cp bootloader-musl d7ff:/usr/local/lib/python3.6/site-packages/staticx/assets/release/bootloader

And now, even with your #200 in place, it still segfaults:

[root@d7ff59649c12 /]# grep chmod /usr/local/lib/python3.6/site-packages/staticx/hooks/glibc.py 
        os.chmod(nssfix.name, 0o755&~umask)
[root@d7ff59649c12 /]# staticx $(which date) date.sx
[root@d7ff59649c12 /]# ./date.sx 
Segmentation fault

I cannot follow your statement, but to contribute to finally solving the problem, I installed both versions of staticx from pip: one from pypi.org, and one from a self-hosted PyPI repo to which I pushed my modified release.

JonathonReinhart commented 3 years ago

It doesn't matter that it's coming from PyPI. The problem is this:

You aren't seeing this when you are developing locally, because you aren't using musl libc. Therefore, you assume that #200 made an impact on the segfault when it did not.

The problem is that the bootloader is getting corrupted. It makes absolutely zero sense that libnssfix.so being executable would have any impact on that.

Watch what happens when I install staticx 0.13.3 without using the wheel:

$ docker run --rm -it centos:7-python3 
[root@7c7f4cdf3120 /]# yum install gcc glibc-static
...
[root@7c7f4cdf3120 /]# pip install scons
...
[root@7c7f4cdf3120 /]# pip install --no-binary=:all: staticx==0.13.3            
... (note this will take much longer because you have to compile the bootloader) ...
[root@7c7f4cdf3120 /]# staticx --version
staticx 0.13.3
[root@7c7f4cdf3120 /]# staticx $(which date) date.sx
[root@7c7f4cdf3120 /]# ./date.sx 
Thu Oct 14 07:11:53 UTC 2021

See, it worked without your changes.