Closed dr-br closed 4 years ago
However, it is strange that julia 1.2.0 and older work.
In Julia 1.2.0 and older, the version of libuv bundled with Julia did not use statx. It used stat and/or lstat instead.
Starting with Julia 1.3.0, the version of libuv bundled with Julia uses statx.
@dr-br So CentOS host and Ubuntu container is fine. But Ubuntu host and Ubuntu container has the problem.
I wonder: could you try Ubuntu host and CentOS container, and CentOS host and CentOS container?
I am curious: does this bug happen whenever the host Linux distro is the same as the container Linux distro? Or does it only happen for the very specific Ubuntu-Ubuntu combination?
So I just built 1.5.0-DEV on our RHEL 7.7 Xeon Gold 6230 cluster inside an Ubuntu 18.04 udocker container. Works as expected, no errors. Also the binary packages (1.3.1, ...) work.
@DilumAluthge: I already tried Ubuntu host and CentOS container, it gave the same errors.
Might be a kernel bug in the Ubuntu host kernel
@Keno: I would not dare to blame the Ubuntu host kernel. I mean, udocker does so many magic tricks. I would rather suspect that udocker is not tested well enough on "consumer OSes".
Meh, we find kernel bugs about once or twice a month around these parts ;), but yes at this point this might be something to take up with the udocker developers.
It would be nice to have an MWE. Maybe a simple C program that makes both of the syscalls (stat versus statx).
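A minimal sketch of such a probe, assuming a Linux host with glibc. The function names (try_stat, try_statx, report) are made up for illustration and are not the programs used elsewhere in this thread. It issues the raw statx syscall through syscall(2), so it does not need the glibc statx wrapper, and it passes an opaque 256-byte buffer instead of struct statx to avoid depending on newer headers. If the build headers do not define SYS_statx (for example on a 3.10 kernel), the statx probe simply reports ENOSYS.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/syscall.h>  /* SYS_statx, if the headers know about it */
#include <unistd.h>       /* syscall() */

#ifndef STATX_BASIC_STATS
#define STATX_BASIC_STATS 0x000007ffU  /* value from the kernel UAPI headers */
#endif

/* Returns 0 on success, otherwise the errno set by stat(2). */
int try_stat(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? 0 : errno;
}

/* Returns 0 on success, otherwise the errno set by the raw statx syscall. */
int try_statx(const char *path)
{
#ifdef SYS_statx
    /* The kernel writes a 256-byte struct statx; an aligned opaque buffer
     * is enough here because only the return code is inspected. */
    union { unsigned char bytes[256]; unsigned long long align; } buf;
    long rc = syscall(SYS_statx, AT_FDCWD, path, 0, STATX_BASIC_STATS, &buf);
    return rc == 0 ? 0 : errno;
#else
    (void)path;
    return ENOSYS;  /* headers too old to know the syscall number */
#endif
}

/* Prints the outcome of both probes for one path. */
void report(const char *path)
{
    int stat_err = try_stat(path);
    int statx_err = try_statx(path);
    printf("stat(%s)  = %s\n", path, stat_err ? strerror(stat_err) : "ok");
    printf("statx(%s) = %s\n", path, statx_err ? strerror(statx_err) : "ok");
}
```

Calling report(".") from a small main on the host and again inside the container would show whether the two syscalls behave differently, and which errno statx produces when it fails.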
@dr-br I notice that you have this in your workflow:
export PROOT_NO_SECCOMP=1
I wonder if that is relevant.
I also wonder if this issue is relevant: https://bugs.launchpad.net/ubuntu/+source/docker.io/+bug/1755250
@dr-br I notice that you have this in your workflow:
export PROOT_NO_SECCOMP=1
I wonder if that is relevant.
Very likely. On the RHEL machines this is not necessary, only on the Ubuntu hosts.
So it seems that Ubuntu hosts are doing funny business with the statx syscall, as seen in this issue: https://bugs.launchpad.net/ubuntu/+source/docker.io/+bug/1755250
This issue may also be relevant: https://github.com/proot-me/proot/issues/106
Other potentially related discussions:
And even more potentially related discussions. Apparently the statx syscall inside containers (e.g. Docker containers) is a real pain.
What is the latest version of Ubuntu that you have access to?
It seems to be statx inside the container. I compiled two example programs: one uses stat, the other statx. On the host, both run fine; inside the container, only stat succeeds. Even on the RHEL system, statx fails:
./statx-example .
statx(.) = -1
.: Function not implemented
The Ubuntu versions on which I ran all the above tests are 19.10 and 20.04. They behave the same.
@Keno Is there any chance that in the Julia-specific fork of libuv, we could stop using statx?
Alternatively, we could ask upstream libuv to stop using statx, i.e. revert https://github.com/libuv/libuv/pull/2184
This is not unique to udocker. For example, this issue: https://github.com/docker/for-linux/issues/208
statx syscalls are only allowed in privileged containers
Funny thing: I don't even get the statx example compiled on the RHEL host, as there is currently a 3.10 kernel running ;)
I think that https://github.com/JuliaLang/libuv/pull/7 will fix this.
After a discussion with my colleague: is it possible that libuv already has a fallback if statx is not available? How else could Julia run on a RHEL 7 host with a 3.10 kernel? On an Ubuntu host, does julia/libuv inside the container detect that statx is available, while udocker does not support it?
Is it possible that libuv already has a fallback if statx is not available?
If your kernel is old and does not have statx, then it will return ENOSYS. If libuv detects that statx returned the ENOSYS errno, it will fall back to stat. You can see this code here:
As you can see in the code, the fallback only applies when the errno returned by statx is the ENOSYS errno.
If seccomp blocks your call to statx, then it will return some errno. The specific value of that errno is user-defined. If the errno is ENOSYS, then libuv will fall back to stat, as I described above. However, if the errno is not ENOSYS, then libuv will return the errno, i.e. there is no fallback to stat in that case.
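The fallback rule described above can be sketched roughly as follows. This is not libuv's actual code: stat_with_fallback, statx_attempt_fn and the two fake_statx_* helpers are hypothetical names chosen for illustration. The statx attempt is passed in as a function pointer so that both outcomes (ENOSYS from an old kernel, or some other errno such as EPERM chosen by a seccomp filter) can be simulated without a special kernel or container.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical hook standing in for the statx attempt:
 * returns 0 on success, or -1 with errno set. */
typedef int (*statx_attempt_fn)(const char *path, struct stat *out);

/* Returns 0 on success or a negative errno value on failure. */
int stat_with_fallback(const char *path, struct stat *out,
                       statx_attempt_fn try_statx)
{
    if (try_statx(path, out) == 0)
        return 0;

    /* Any errno other than ENOSYS is reported as-is: no fallback. */
    if (errno != ENOSYS)
        return -errno;

    /* ENOSYS means "statx is not available", so use plain stat. */
    if (stat(path, out) == 0)
        return 0;
    return -errno;
}

/* Simulates a kernel (or filter) that reports statx as unimplemented. */
int fake_statx_enosys(const char *path, struct stat *out)
{
    (void)path; (void)out;
    errno = ENOSYS;
    return -1;
}

/* Simulates a seccomp filter configured to answer statx with EPERM. */
int fake_statx_eperm(const char *path, struct stat *out)
{
    (void)path; (void)out;
    errno = EPERM;
    return -1;
}
```

With fake_statx_enosys the call on "." succeeds through the stat fallback, while with fake_statx_eperm it returns -EPERM and stat is never tried. That second case is the failure mode discussed in this thread.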
The errno returned by seccomp is user-defined. If it is not ENOSYS or something sensible, that is a bug in udocker or one of its dependencies.
The errno returned by seccomp is user-defined. If it is not ENOSYS or something sensible, that is a bug in udocker or one of its dependencies.
I’ve corrected my answer.
It seems to be statx inside the container. I compiled two example programs: one uses stat, the other statx. On the host, both run fine; inside the container, only stat succeeds. Even on the RHEL system, statx fails:
./statx-example .
statx(.) = -1
.: Function not implemented
The Ubuntu versions on which I ran all the above tests are 19.10 and 20.04. They behave the same.
@dr-br Can you post an issue on the udocker repo (https://github.com/indigo-dc/udocker) and include the code for your example programs? And cc me in the issue? Hopefully that will help us get things moving on the udocker end.
Since this turned out not to be a julia issue, I'm gonna go ahead and close this. Discussion can continue here of course.
Intro+Relevance
udocker is a basic user tool to execute simple docker containers in user space without requiring root privileges.
It is the only means to deploy containerized jupyter+X on our supercomputers.
podman, docker and singularity cannot be used on the server nodes (subuid/guid issues…).
Problem
Current Julia versions (including the nightly build) fail to run within a udocker container. The container OSes tested are ubuntu (18.04) and centos:latest.
Older Julia versions, such as 1.0.5 and earlier, work perfectly.
Tests with podman or docker are successful for all versions of Julia.
When I execute julia inside a udocker container I get the following error message (the same under ubuntu and centos):
Steps to reproduce
Install udocker
Start ubuntu container
Download and run Julia
Within the ubuntu container run: