JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

ERROR: IOError: stat: permission denied (EACCES) on udocker #34918

Closed dr-br closed 4 years ago

dr-br commented 4 years ago

Intro+Relevance

udocker is a basic user tool to execute simple docker containers in user space without requiring root privileges.
It is the only means to deploy containerized jupyter+X on our supercomputers.
podman, docker and singularity cannot be used on the server nodes (subuid/subgid issues…).

Problem

Current Julia versions (including the nightly build) fail to run within a udocker container. The container OSes tested are ubuntu (18.04) and centos:latest.
Older Julia versions, such as 1.0.5, work perfectly.
Tests with podman or docker are successful for all versions of Julia.

When I execute julia inside a udocker container I get the following error message (the same under ubuntu and centos):

./julia-1.3.1/bin/julia 
ERROR: IOError: stat: permission denied (EACCES) for file "/root/julia-1.3.1/bin/../etc/julia/startup.jl"
Stacktrace:
 [1] stat(::String) at ./stat.jl:69
 [2] isfile at ./stat.jl:311 [inlined]
 [3] load_julia_startup() at ./client.jl:314
 [4] exec_options(::Base.JLOptions) at ./client.jl:258
 [5] _start() at ./client.jl:460

Steps to reproduce

Install udocker

curl https://raw.githubusercontent.com/indigo-dc/udocker/devel/udocker.py > udocker
chmod u+rx ./udocker
./udocker install

Start ubuntu container

export PROOT_NO_SECCOMP=1
udocker pull ubuntu
udocker create --name=ubuntu ubuntu
udocker run --user=root --env="HOME=/root" --workdir="/root" ubuntu

Download and run Julia

Within the ubuntu container run:

apt update && apt install wget
wget https://julialang-s3.julialang.org/bin/linux/x64/1.3/julia-1.3.1-linux-x86_64.tar.gz
tar xvzf julia-1.3.1-linux-x86_64.tar.gz
./julia-1.3.1/bin/julia

DilumAluthge commented 4 years ago

> However, it is strange that julia 1.2.0 and older work.

In Julia 1.2.0 and older, the version of libuv bundled with Julia did not use statx. They used stat and/or lstat instead.

Starting with Julia 1.3.0, the version of libuv bundled with Julia uses statx.

DilumAluthge commented 4 years ago

@dr-br So CentOS host and Ubuntu container is fine. But Ubuntu host and Ubuntu container has the problem.

I wonder: could you try Ubuntu host and CentOS container, and CentOS host and CentOS container?

I am curious: does this bug happen whenever the host Linux distro is the same as the container Linux distro? Or does it only happen for the very specific Ubuntu-Ubuntu combination?

dr-br commented 4 years ago

So I just built 1.5.0-DEV on our RHEL 7.7 Xeon Gold 6230 cluster inside an Ubuntu 18.04 udocker container. Works as expected, no errors. Also the binary packages (1.3.1, ...) work.

@DilumAluthge: I already tried Ubuntu host and CentOS container, it gave the same errors.

Keno commented 4 years ago

Might be a kernel bug in the Ubuntu host kernel

dr-br commented 4 years ago

@Keno: I would not dare to blame the Ubuntu host kernel. I mean, udocker does so many magic tricks. I would rather suspect that udocker is not tested well enough on "consumer OSes".

Keno commented 4 years ago

Meh, we find kernel bugs about once or twice a month around these parts ;), but yes at this point this might be something to take up with the udocker developers.

DilumAluthge commented 4 years ago

It would be nice to have an MWE. Maybe a simple C program that makes both of the syscalls (stat versus statx).
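A rough sketch of such a probe, written in Python with ctypes rather than C so that it can be pasted into a container without a compiler. It is not the pair of programs used later in this thread. Everything in it is an assumption made for illustration: the helper names (probe_stat, probe_statx, describe) are hypothetical, the statx syscall numbers cover only x86_64 and aarch64, and it assumes that os.stat ends up in the classic stat-family syscall rather than statx, which I believe is the case for CPython on x86_64 glibc but have not checked elsewhere.

```python
import ctypes
import errno
import os
import platform
import sys

# Assumption: statx syscall numbers for these two architectures only;
# any other architecture is reported as "not probed".
SYS_STATX = {"x86_64": 332, "aarch64": 291}
AT_FDCWD = -100             # resolve relative paths against the current directory
STATX_BASIC_STATS = 0x07FF  # ask for the same fields that plain stat() fills in


def probe_stat(path):
    """Return 0 if os.stat() succeeds, otherwise the errno it failed with."""
    try:
        os.stat(path)
        return 0
    except OSError as exc:
        return exc.errno


def probe_statx(path):
    """Return 0 if the raw statx syscall succeeds, its errno if it fails,
    or None when this platform is not covered by the sketch."""
    nr = SYS_STATX.get(platform.machine())
    if not sys.platform.startswith("linux") or nr is None:
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    buf = ctypes.create_string_buffer(256)  # sizeof(struct statx) is 256 bytes
    rc = libc.syscall(
        ctypes.c_long(nr),
        ctypes.c_int(AT_FDCWD),
        ctypes.c_char_p(os.fsencode(path)),
        ctypes.c_int(0),                     # flags
        ctypes.c_uint(STATX_BASIC_STATS),    # mask
        buf,
    )
    return 0 if rc == 0 else ctypes.get_errno()


def describe(code):
    """Turn a probe result into a short human-readable string."""
    if code is None:
        return "not probed on this platform"
    if code == 0:
        return "ok"
    return f"{errno.errorcode.get(code, code)} ({os.strerror(code)})"


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print("stat :", describe(probe_stat(target)))
    print("statx:", describe(probe_statx(target)))
```

Running it once on the host and once inside the container, with the same path, should show whether the two calls are treated differently and which errno the failing one reports.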

DilumAluthge commented 4 years ago

@dr-br I notice that you have this in your workflow:

export PROOT_NO_SECCOMP=1

I wonder if that is relevant.

DilumAluthge commented 4 years ago

I also wonder if this issue is relevant: https://bugs.launchpad.net/ubuntu/+source/docker.io/+bug/1755250

dr-br commented 4 years ago

> @dr-br I notice that you have this in your workflow:
>
> export PROOT_NO_SECCOMP=1
>
> I wonder if that is relevant.

Very likely. On the RHEL machines this is not necessary, only on the Ubuntu hosts.

DilumAluthge commented 4 years ago

So it seems that Ubuntu hosts are doing funny business with the statx syscall, as seen in this issue: https://bugs.launchpad.net/ubuntu/+source/docker.io/+bug/1755250

This issue may also be relevant: https://github.com/proot-me/proot/issues/106

Other potentially related discussions:

And even more potentially related discussions. Apparently the statx syscall inside containers (e.g. Docker containers) is a real pain.

DilumAluthge commented 4 years ago

What is the latest version of Ubuntu that you have access to?

dr-br commented 4 years ago

It seems to be statx inside the container. I compiled two example programs: one uses stat, the other statx. On the host, both run fine; inside the container, only stat succeeds. Even on the RHEL system, statx fails:

./statx-example . 
statx(.) = -1
.: Function not implemented

The Ubuntu versions on which I ran all of the above tests are 19.10 and 20.04. They behave the same.

DilumAluthge commented 4 years ago

@Keno Is there any chance that in the Julia-specific fork of libuv, we could stop using statx?

Alternatively, we could ask upstream libuv to stop using statx, i.e. revert https://github.com/libuv/libuv/pull/2184

DilumAluthge commented 4 years ago

This is not unique to udocker. For example, this issue: https://github.com/docker/for-linux/issues/208

statx syscalls are only allowed in privileged containers

dr-br commented 4 years ago

Funny thing: I can't even compile the statx example on the RHEL host, as it is currently running a 3.10 kernel ;)

DilumAluthge commented 4 years ago

I think that https://github.com/JuliaLang/libuv/pull/7 will fix this.

dr-br commented 4 years ago

> I think that JuliaLang/libuv#7 will fix this.

After a discussion with my colleague: is it possible that libuv already has a fallback if statx is not available? How else could julia run on a RHEL 7 host with a 3.10 kernel? Could it be that on an Ubuntu host, julia/libuv inside the container detects that statx is available, but udocker does not support it?

DilumAluthge commented 4 years ago

> Is it possible that libuv already has a fallback if statx is not available?

If your kernel is old and does not have statx, then the call will fail with ENOSYS. If libuv detects that statx failed with the ENOSYS errno, it will fall back to stat. You can see this code here:

As you can see in the code, the fallback only applies when the errno returned by statx is the ENOSYS errno.

If seccomp blocks your call to statx, then the call will fail with some errno. The specific value of that errno is user-defined. If it is ENOSYS, then libuv will fall back to stat, as I described above. However, if it is anything other than ENOSYS, then libuv will return that errno, i.e. there is no fallback to stat in that case.
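The rule described above can be modelled in a few lines of Python. This is only a sketch of the behaviour as stated in this comment, not the libuv source: stat_with_fallback, statx_missing and statx_denied are hypothetical names, and the statx call is injected as a plain function so that both outcomes can be simulated without a container.

```python
import errno
import os


def stat_with_fallback(path, statx_call, stat_call=os.stat):
    """Try statx first; fall back to stat only if statx reports ENOSYS.

    statx_call is a hypothetical stand-in for the real syscall, so the
    two failure modes can be exercised on any machine.
    """
    try:
        return "statx", statx_call(path)
    except OSError as exc:
        if exc.errno != errno.ENOSYS:
            raise  # any other errno goes straight back to the caller
    return "stat", stat_call(path)


def statx_missing(path):
    # Simulates an old kernel, or a filter configured to answer ENOSYS.
    raise OSError(errno.ENOSYS, os.strerror(errno.ENOSYS), path)


def statx_denied(path):
    # Simulates a filter that rejects statx with a different errno.
    raise OSError(errno.EACCES, os.strerror(errno.EACCES), path)


if __name__ == "__main__":
    # ENOSYS: the fallback kicks in and plain stat answers the request.
    print(stat_with_fallback(".", statx_missing)[0])
    # EACCES: no fallback, the error surfaces to the caller.
    try:
        stat_with_fallback(".", statx_denied)
    except OSError as exc:
        print("no fallback:", errno.errorcode[exc.errno])
```

In this model the second case is the one that matches the error at the top of this issue: the caller sees a permission error from stat-like code even though a plain stat on the same path would have succeeded.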

Keno commented 4 years ago

The errno returned by seccomp is user-defined. If it is not ENOSYS or something similarly sensible, that is a bug in udocker or one of its dependencies.

DilumAluthge commented 4 years ago

> The errno returned by seccomp is user-defined. If it is not ENOSYS or something similarly sensible, that is a bug in udocker or one of its dependencies.

I’ve corrected my answer.

DilumAluthge commented 4 years ago

> It seems to be statx inside the container. I compiled two example programs: one uses stat, the other statx. On the host, both run fine; inside the container, only stat succeeds. Even on the RHEL system, statx fails:
>
> ./statx-example .
> statx(.) = -1
> .: Function not implemented
>
> The Ubuntu versions on which I ran all of the above tests are 19.10 and 20.04. They behave the same.

@dr-br Can you post an issue on the udocker repo (https://github.com/indigo-dc/udocker) and include the code for your example programs? And cc me in the issue? Hopefully that will help us get things moving on the udocker end.

Keno commented 4 years ago

Since this turned out not to be a julia issue, I'm gonna go ahead and close this. Discussion can continue here of course.