clearlinux / distribution

Placeholder repository to allow filing of general bugs/issues/etc against the Clear Linux OS for Intel Architecture linux distribution

Last two versions of R (4.0.5 and 4.1.0) won't run under Docker #2361

Open scotstan opened 3 years ago

scotstan commented 3 years ago

I installed the last 5 clearlinux/r-base versions to bisect when the R command stopped working. Seems as if something changed between 4.0.2 and 4.0.5 that prevents R from running.

The last two versions (4.0.5 and 4.1.0) respond with ERROR: R_HOME ('/usr/lib64/R') not found

image

My host is Ubuntu 20.04.2.

Here's the short script to reproduce:

#!/usr/bin/env bash

# Clear Linux: R version tags from:
# https://hub.docker.com/r/clearlinux/r-base/tags?page=1&ordering=-name

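# Read a whole heredoc into the variable named by $1.
# An empty IFS keeps whitespace intact (IFS='\n' would set IFS to the two characters '\' and 'n').
define(){ IFS= read -r -d '' "${1}" || true; }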

define CMD <<'EOF'
    Rscript --version
    Rscript -e "version[['nickname']]"
EOF

for ver in 3.6.3 4.0.0 4.0.2 4.0.5 4.1.0; do
    docker run --rm -it clearlinux/r-base:$ver bash -i -c "$CMD"
done

R 4.1.0 works fine with rocker/r-base

image

lebensterben commented 3 years ago

It works fine on native Clear Linux.

phmccarty commented 3 years ago

Thanks for the detailed report. I will investigate.

scotstan commented 3 years ago

I can test any time. Thanks @phmccarty

phmccarty commented 3 years ago

@scotstan I just ran your test script, and I cannot reproduce the issue. Maybe the snapshots of clearlinux/r-base:4.0.5 and clearlinux/r-base:4.1.0 that you tested were broken... That's possible, but I'm not yet sure what the underlying issue might have been. Can you test again after pulling the latest builds of those images?

Here's the output I see from the script on first run:

$ ./test.sh
Unable to find image 'clearlinux/r-base:3.6.3' locally
3.6.3: Pulling from clearlinux/r-base
3f3a3a3cd1fe: Pull complete
3b4719fcbf77: Pull complete
Digest: sha256:8f6fa391e33c6eafd88557be6402c07abe1a2114c22030f2fd77669c2a508f4f
Status: Downloaded newer image for clearlinux/r-base:3.6.3
R scripting front-end version 3.6.3 (2020-02-29)
[1] "Holding the Windsock"
Unable to find image 'clearlinux/r-base:4.0.0' locally
4.0.0: Pulling from clearlinux/r-base
a850526e45ae: Pull complete
ea3403d97db9: Pull complete
Digest: sha256:79d48f4a6efb29b1107276cb5650fe255d33f09ecb0a066f946d078c4d82683f
Status: Downloaded newer image for clearlinux/r-base:4.0.0
R scripting front-end version 4.0.0 (2020-04-24)
[1] "Arbor Day"
Unable to find image 'clearlinux/r-base:4.0.2' locally
4.0.2: Pulling from clearlinux/r-base
511d020684c4: Pull complete
92aa84514374: Pull complete
Digest: sha256:59439af4940e9f7b2b358ad54d494dd8e84c774005058a088dd550628cb1af82
Status: Downloaded newer image for clearlinux/r-base:4.0.2
R scripting front-end version 4.0.2 (2020-06-22)
[1] "Taking Off Again"
Unable to find image 'clearlinux/r-base:4.0.5' locally
4.0.5: Pulling from clearlinux/r-base
6d8a9c1757d0: Pull complete
3c576b2fb216: Pull complete
Digest: sha256:01a4ddb96e98e977cd02b11e12979d033cbdc1633d16eaf8d3ed6e838e21586f
Status: Downloaded newer image for clearlinux/r-base:4.0.5
R scripting front-end version 4.0.5 (2021-03-31)
[1] "Shake and Throw"
Unable to find image 'clearlinux/r-base:4.1.0' locally
4.1.0: Pulling from clearlinux/r-base
fa6f55ca1b2a: Pull complete
268dc30d08c7: Pull complete
Digest: sha256:33367ffe624b73cd1076605da71e0319fd2e366eec4f60f3701516ae84801eba
Status: Downloaded newer image for clearlinux/r-base:4.1.0
R scripting front-end version 4.1.0 (2021-05-18)
[1] "Camp Pontanezen"

And the second run:

$ ./test.sh
R scripting front-end version 3.6.3 (2020-02-29)
[1] "Holding the Windsock"
R scripting front-end version 4.0.0 (2020-04-24)
[1] "Arbor Day"
R scripting front-end version 4.0.2 (2020-06-22)
[1] "Taking Off Again"
R scripting front-end version 4.0.5 (2021-03-31)
[1] "Shake and Throw"
R scripting front-end version 4.1.0 (2021-05-18)
[1] "Camp Pontanezen"

phmccarty commented 2 years ago

I re-tested this several times in the past few months, and I still could not reproduce. If it's still a problem on your end, please reopen.

scotstan commented 2 years ago

Thanks. I'll try to retest when I get a chance.

phmccarty commented 2 years ago

Great, thanks. And just to clarify, I only tested running the script on a Clear Linux host. But the underlying host OS should not matter in this case, I would think.

scotstan commented 2 years ago

Testing it now from Ubuntu 20.04.3 LTS with the 5.4 kernel. I don't think the host docker should matter, but I don't have easy access to a ClearLinux native.

Likely something broke or changed in the script at /usr/bin/R

Results are the same today (latest clearlinux/r-lang image).

scotstan commented 2 years ago

Good idea on cleaning out the old images. I did that so they would pull down fresh. However, still the same problem.

I think the solution is somewhere in the bash script /usr/bin/R that sets up paths or something. I'll keep poking around.

I get great performance with R using ClearLinux. That's why this was important. I'm not blocked, but leaving this here for others.

image

phmccarty commented 2 years ago

Thanks for testing. Reopening.

phmccarty commented 2 years ago

Revisiting this issue...

Neither the /usr/bin/R script nor the /usr/bin/Rscript binary's source file changed between R versions 4.0.2 and 4.0.5, so we can rule out obvious source changes affecting the behavior. I haven't yet spent much time analyzing the full source tree diff for any other clues.

Nothing stands out to me in the 4.0.3, 4.0.4, or 4.0.5 release notes as candidate breakage.

Looking at the Clear Linux package history between 4.0.2 and 4.0.5, we modified config.site to change default variable assignments for AR, NM, RANLIB, and LTO, but I doubt those changes had an impact here, because nothing is being built. I also added a patch to fix the package build after we updated autoconf to 2.70, but again, this change is unlikely to be to blame.

I will proceed by testing this out on Ubuntu 20.04.3 and hopefully reproduce it :-)

scotstan commented 2 years ago

I have some time now to poke at it too. I think the answer is in the R script, in that it's using an environment variable or path no longer valid or something.

scotstan commented 2 years ago

As an FYI, R version 4.1.2 has been released since, so I added it to the simple test script, with the same results. Looking into the delta from 4.0.2 to 4.0.5 where something changed...

image

scotstan commented 2 years ago

So strange! I found the line in /usr/bin/R (the same script, with matching SHAs, across v4.0.2 and v4.0.5). Line 19 has a simple bash test for the executable bit on /usr/lib64/R/bin/exec, which is successful on 4.0.2 but not on 4.0.5(!).

v4.0.2 on the left. v4.0.5 on the right.

Turns out v4.0.2 has bash version 5.0.18, and v4.0.5 has bash version 5.1.16.

Did something change with test -x from bash v5.0 to v5.1? Seems unlikely, but checking.

image

scotstan commented 2 years ago

Might be related (from an open issue on bash 5.1 on Alpine).

https://unix.stackexchange.com/questions/671246/why-is-bash-not-evaluating-the-executable-bit-correctly-in-alpine-3-14-2

and

https://github.com/alpinelinux/docker-alpine/issues/156

scotstan commented 2 years ago

Workaround!

Delete these lines in /usr/bin/R that check the executable bit for the R binary:

image

With one command: sed -i 262,267d /usr/bin/R

image

bash 5.1

Not sure what's happening here, because none of the -x tests in bash seem to work now. This will likely have big ramifications for bash scripts that test for the validity of folders, execute bits, read bits, etc. The issue below has more details, but it's above my pay grade to attempt more fixes. For now, I'm good with a workaround.

https://github.com/alpinelinux/docker-alpine/issues/156
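A minimal sketch for checking, inside a container, whether the executable-bit test disagrees with the file's mode bits. The function name check_exec_test is hypothetical, and the sketch assumes GNU stat (for -c '%A') is available in the image:

```shell
# Hypothetical helper: compare the result of `test -x` with the mode bits
# reported by GNU stat. A mismatch suggests the access check itself is failing.
check_exec_test() {
    path=$1
    # -L follows symlinks so the mode of the real file is inspected.
    mode=$(stat -L -c '%A' "$path") || return 2
    if [ -x "$path" ]; then
        echo "OK: test -x succeeds for $path (mode $mode)"
    else
        case $mode in
            *x*)
                echo "SUSPECT: mode $mode has an execute bit, but test -x fails for $path"
                return 1
                ;;
            *)
                echo "NOT EXECUTABLE: $path (mode $mode)"
                ;;
        esac
    fi
}

check_exec_test "$(command -v bash)"
```

On an unaffected system the last line should report OK. A SUSPECT result would match the symptom described in this thread, where the mode bits look right but the -x test still fails.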

This should not fail

which bash
[ -x /usr/bin/bash ]

scotstan commented 2 years ago

The above just allows one to run R in interactive mode. I'm now trying to install some basic R packages (data.table, magrittr) and those now fail with "/usr/lib64/R/bin/Rcmd: line 64: exec: INSTALL: not found". So the little hack above is not truly enough.

Workaround attempt: going back to last-known-stable version 4.0.2 that used bash 5.0.

phmccarty commented 2 years ago

Thanks for the detailed debug findings :-)

I still have not tested Ubuntu 20.04.3, but my suspicion is that the patch to include/seccomp-syscalls.h from https://github.com/seccomp/libseccomp/pull/322 (also see report https://github.com/seccomp/libseccomp/issues/314) should be backported. The current Ubuntu package version is 2.5.1-1ubuntu1~20.04.2, and I don't see any backports for this specific bug.

There are a bunch of interlinked packages involved here. We have docker, runc, libseccomp, and the running kernel on the Ubuntu host, and glibc plus whatever userspace programs are running within a container based on any of the clearlinux/r-base images.

The clearlinux/r-base:4.0.2 image contains glibc 2.31, but clearlinux/r-base:4.0.5 has glibc 2.33. Support for faccessat2 was added for glibc 2.33 (see https://github.com/bminor/glibc/commit/3d3ab573a5f3071992cbc4f57d50d1d2), so I think this is the primary reason you are seeing the image with 4.0.2 work correctly. Bash itself likely calls faccessat, resulting in glibc < 2.33 always wrapping that system call, and glibc >= 2.33 trying to use faccessat2 first, and falling back to faccessat if it's unsupported. I suspect the failure here is caused by the libseccomp bug linked above; glibc will see the wrong error code within the container environment and propagate it to bash, R, etc.
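To make the version boundary concrete, here is a small sketch that classifies a glibc version according to the explanation above. The helper names version_ge and glibc_access_path are hypothetical, and the comparison relies on GNU sort -V:

```shell
# Succeeds if version $1 is greater than or equal to version $2 (GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: describe which access syscall path a glibc version takes,
# based on the 2.33 boundary mentioned in this comment.
glibc_access_path() {
    if version_ge "$1" 2.33; then
        echo "glibc $1: tries faccessat2 first, falls back to faccessat on ENOSYS"
    else
        echo "glibc $1: uses faccessat only"
    fi
}

glibc_access_path 2.31   # version reported for clearlinux/r-base:4.0.2
glibc_access_path 2.33   # version reported for clearlinux/r-base:4.0.5
```

The version itself has to be read from inside each image, for example with ldd --version, assuming that tool is present there.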

fenrus75 commented 2 years ago

Is this the seccomp on the host side or inside the container?

scotstan commented 2 years ago

Testing on macOS (Intel) just to throw another host at this. Working as expected on macOS but not Ubuntu!

image

thiagomacieira commented 2 years ago

The issue is your host Docker. It's applying a seccomp filter that blocks the faccessat2 system call but makes it return an errno code that isn't ENOSYS. If it were, glibc would fall back to an earlier system call which would likely be allowed.
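The fallback behaviour described above can be modelled with a toy sketch. This is not glibc's actual code. The function simulate_access is hypothetical, and EPERM is used only as an example of an errno that is not ENOSYS:

```shell
# Toy model of the fallback logic: $1 is the result that the (hypothetical)
# seccomp filter produces for the faccessat2 system call.
simulate_access() {
    filter_result=$1
    case $filter_result in
        OK)
            echo "faccessat2 succeeded"
            ;;
        ENOSYS)
            # Only ENOSYS tells the C library that the syscall is unsupported.
            echo "faccessat2 unsupported; fell back to faccessat"
            ;;
        *)
            # Any other errno is treated as a real failure, so no fallback happens.
            echo "faccessat2 failed with $filter_result; no fallback"
            return 1
            ;;
    esac
}

simulate_access ENOSYS
simulate_access EPERM || true
```

In this model the second call is the failing case from the thread: the caller sees an error and concludes the file is not accessible, even though the older system call would have succeeded.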

This is a design flaw in libseccomp. It needs to use three categories, not two:

scotstan commented 2 years ago

Very interesting. Is there something in the host Docker configuration that controls this? I can also test on Debian Raspberry Pi. Maybe a few other hosts.

thiagomacieira commented 2 years ago

You can disable seccomp completely ("disable security"): --security-opt seccomp=unconfined
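As a sketch of how that option fits into the test from the original report, the function below builds the docker command for one image tag. The function name run_r_unconfined and the DRY_RUN variable are hypothetical, and calling Rscript directly (rather than through bash -i -c as in the original script) is an assumption about the image:

```shell
# Hypothetical wrapper: run Rscript in a clearlinux/r-base image with the
# seccomp filter disabled. By default it only prints the command;
# set DRY_RUN=0 to actually execute it.
run_r_unconfined() {
    ver=$1
    set -- docker run --rm --security-opt seccomp=unconfined \
        "clearlinux/r-base:$ver" Rscript --version
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

run_r_unconfined 4.1.0
```

Since this turns off the seccomp filter for the container entirely, it is better suited to confirming the diagnosis than to routine use.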

scotstan commented 2 years ago

Thanks: will try and report back.
