apptainer / singularity

Singularity has been renamed to Apptainer as part of us moving the project to the Linux Foundation. This repo has been persisted as a snapshot right before the changes.
https://github.com/apptainer/apptainer

Support for autofs? #347

Closed oschulz closed 7 years ago

oschulz commented 7 years ago

I'm having trouble accessing auto-mounted directories within a Singularity container (we're using auto-mounting for several storage systems). The mounts are supposed to show up under /remote, and I bind-mount /remote into the container.

If the directory in question (e.g. /remote/my-auto-mount) is already mounted on the host, all is fine. However, if it is not mounted on the host yet, and accessed in the container, I get this error:

$ ls /remote/my-auto-mount
ls: cannot open directory '/remote/my-auto-mount': Too many levels of symbolic links

However, this does trigger auto-mounting of the directory on the host. The mount doesn't become available in the running container (a second ls fails, too, with the same error) - but when I exit the container and enter it again, the mount is there and works.
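The behavior described here depends on whether the autofs directory is already mounted on the host. One way to check that state on the host, before entering the container, is to look the path up in /proc/mounts. Reading that file does not access the path itself, so the check should not trigger autofs. This is a minimal sketch: the function name `is_host_mounted` is hypothetical, it must run on the host (inside the container /proc/mounts reflects the container's namespace), and it assumes the path contains no whitespace because the kernel escapes a space as `\040` in /proc/mounts.

```shell
#!/usr/bin/env bash
# is_host_mounted PATH
# Hypothetical helper: succeeds if PATH is listed as a mount point in /proc/mounts.
# Only /proc/mounts is read; PATH itself is never accessed, so this check
# should not trigger autofs. Assumes PATH contains no whitespace (the kernel
# escapes a space as \040 in /proc/mounts).
is_host_mounted() {
  awk -v p="$1" '$2 == p { found = 1 } END { exit !found }' /proc/mounts
}

# Example: check the directory from the report before starting a container.
if is_host_mounted /remote/my-auto-mount; then
  echo "/remote/my-auto-mount is mounted on the host"
else
  echo "/remote/my-auto-mount is not mounted on the host yet"
fi
```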

This is with Singularity v2.2 on Linux kernel 3.13, FS overlay is enabled.

Is it possible, in principle, to support auto-mounts in Singularity? Or maybe I just need a newer kernel?

bbockelm commented 7 years ago

What OS is this?

I see the same behavior on RHEL6 (kernel 2.6.32) but not on RHEL7 (3.10). I wonder if the fix is in the patches that Red Hat backports?

Unfortunately, this is a kernel bug and I haven't figured out a workaround on older platforms.

Brian

(In reply to Oliver Schulz's original post of Nov 27, 2016, 3:05 AM.)

oschulz commented 7 years ago

@bbockelm Thanks for the info! This was on Ubuntu 14.04 (trusty) with kernel 3.13.0-98-generic. I've now upgraded the system to the LTS kernel 4.4.0-47-generic (still Ubuntu trusty), but the problem persists.

I wonder why it works on RHEL7 with kernel 3.10 but not on Ubuntu trusty with kernel 4.4. Does Red Hat backport a patch that isn't even in fairly recent upstream kernels? If so, I wonder if Ubuntu could be convinced to include that patch as well.

bbockelm commented 7 years ago

That's a great question! I haven't tried to bisect this issue (kernel development is not my forte...).

In the autofs setup I use, the paths inside and outside the container are the same. Is that true for you too? If not, that might be another potential source of difference...

oschulz commented 7 years ago

"In the autofs setup I use, the paths inside and outside the container are the same. Is that true for you too?"

Yes, I use the same path in the container as on the host.

bauerm97 commented 7 years ago

I can't provide a solution, but I can say that on Debian 8.5 with a slightly modified kernel, we are able to bind-mount an autofs-mounted CVMFS directory into containers successfully.

oschulz commented 7 years ago

"Debian 8.5 with a slightly modified kernel we are able to bind mount an autofs..."

That's kernel 3.16, right? Is it a Singularity-specific modification? Is the auto-mount already active when you start the containers (because CVMFS was already accessed on the host shortly before)?

gmkurtzer commented 7 years ago

I am closing this issue, but let me know if this is still an issue on the latest development branch with the latest kernel updates. Thanks!

AdamSimpson commented 6 years ago

I have run across exactly the same behavior with singularity/2.3.1 on SLES 11 (kernel 3.0.101). I set several autofs NFS mounts in singularity.conf, but unless they're actively mounted on the host, I get the "Too many levels of symbolic links" error. This failure is enough to cause autofs to mount the directories, so subsequent attempts work.

cclerget commented 6 years ago

@AdamSimpson @gmkurtzer It's related to this issue: https://patchwork.kernel.org/patch/8775691. Autofs doesn't handle mount namespaces without a kernel patch. Inside Singularity's mount namespace, autofs mounts are not reference-counted by autofs; that's why the error appears when they are not already mounted on the host, and not otherwise. But if the host unmounts the directory for any reason (short of a timeout of 0, which makes autofs completely useless), it will be unmounted in the Singularity container too, since there is no reference-count check for the bind mounts in the namespace. That could be problematic for compute jobs.

Singularity could hold a reference to a file in each autofs mount from outside the namespace (much like the cleanup daemon). To simulate what Singularity could do to work around this issue, could you try the following test?

$ flock /one/of/autofs/mount/lockfile sleep 10 &
$ singularity run image ls -la /one/of/autofs/mount/

If it works, that could be a workaround (a dirty one, for sure) for autofs not being compatible with mount namespaces: add an option in singularity.conf that makes Singularity aware of each autofs mount bound into the container, and run a daemon outside the namespace (like cleanupd) that keeps a reference on each configured autofs mount for the lifetime of the Singularity run.
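The idea of keeping a reference on each autofs mount from outside the namespace for the duration of a run can be approximated on the host with a small bash wrapper. This is a minimal sketch, not something Singularity provides: the function name `with_autofs_held` is hypothetical, and instead of `flock` on a lockfile it keeps each mount busy with a background `sleep` whose working directory is inside the mount. It assumes, as the flock test suggests, that any open reference held in the host namespace is enough to keep autofs from expiring the mount, and it assumes GNU `sleep` (for `sleep infinity`). It also touches each path once on the host first, since the first access may fail as described in this thread.

```shell
#!/usr/bin/env bash
# with_autofs_held PATH... -- COMMAND [ARGS...]
# Hypothetical wrapper (not part of Singularity): trigger each autofs PATH on
# the host, keep it busy while COMMAND runs, then release it.
# Returns COMMAND's exit status, 1 if a PATH cannot be accessed, 2 on bad usage.
with_autofs_held() {
  local paths=() pids=() p rc=0
  while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
    paths+=("$1")
    shift
  done
  # At this point "$1" (if present) is "--"; a command must follow it.
  if [ "$#" -lt 2 ]; then
    echo "usage: with_autofs_held PATH... -- COMMAND [ARGS...]" >&2
    return 2
  fi
  shift

  for p in "${paths[@]}"; do
    # The first access triggers autofs on the host; it may fail once, so retry.
    if ! ls "$p" >/dev/null 2>&1 && ! ls "$p" >/dev/null; then
      rc=1
      break
    fi
    # Pin the mount: a process whose working directory is inside it keeps it busy.
    ( cd "$p" && exec sleep infinity ) >/dev/null 2>&1 &
    pids+=("$!")
  done

  if [ "$rc" -eq 0 ]; then
    "$@"
    rc=$?
  fi

  # Release the pins whether or not COMMAND ran.
  if [ "${#pids[@]}" -gt 0 ]; then
    kill "${pids[@]}" 2>/dev/null
    wait "${pids[@]}" 2>/dev/null
  fi
  return "$rc"
}
```

With the placeholder path and image from the test above, usage would look like:

$ with_autofs_held /one/of/autofs/mount -- singularity run image ls -la /one/of/autofs/mount/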

AdamSimpson commented 6 years ago

@cclerget Thanks for confirming this is a kernel issue. I have done a few simple tests, and it appears that running flock on the autofs mount directly (they are read-only, so I can't create a lockfile within them) before invoking Singularity works. Running ls on the directory before invoking Singularity, as long as the autofs timeout hasn't kicked in yet, works just as well, though.

I believe that this issue, https://github.com/singularityware/singularity/issues/714, addresses the problem of autofs timing out while the container has the directory mounted. In some quick testing, autofs doesn't time out if Singularity has the directory bind-mounted, which is good; it's just the initial bind mount that's problematic.