apptainer / apptainer

Apptainer: Application containers for Linux
https://apptainer.org

--writable-tmpfs requires overlay kernel support: your kernel doesn't support it #876

Open tashrifbillah opened 1 year ago

tashrifbillah commented 1 year ago

Hi, my CentOS 7 kernel is 3.10.0-1160.76.1.el7.x86_64. My Apptainer version is 1.1.3-1.el7.

singularity shell --writable-tmpfs my.image.sif

Gives:

FATAL: --writable-tmpfs requires overlay kernel support: your kernel doesn't support it

Full disclosure: the --writable-tmpfs flag worked when Singularity was still Singularity and CentOS was probably 3.8.0. My current workaround for editing the image's files is to bind mount a volume onto the desired directory. However, I would like to know if there is a real solution to the above issue.

DrDaveD commented 1 year ago

You didn't say how you installed it. If you have a default installation from rpm or source, there's now no setuid-root. In that case on CentOS7 the --writable-tmpfs option should be able to use the fuse-overlayfs command if it is available and unprivileged user namespaces are enabled. If it is not available you should be seeing an INFO message telling you that's a likely cause of problems. Have you shown all the output from apptainer? If you don't mind setuid-root you could install the apptainer-suid rpm or compile with the --with-suid option, or if you don't want setuid-root, make sure that fuse-overlayfs is installed.
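
A minimal sketch for checking the two prerequisites named above (the `fuse-overlayfs` binary and enabled user namespaces). It assumes a Linux host that exposes the limit at `/proc/sys/user/max_user_namespaces`; the function name `userns_enabled` is made up for illustration and is not part of apptainer.

```shell
# Return 0 if the given file reports a user-namespace limit above zero.
# The default path is an assumption about where the kernel exposes the limit.
userns_enabled() {
    limit_file="${1:-/proc/sys/user/max_user_namespaces}"
    [ -r "$limit_file" ] || return 1
    limit="$(cat "$limit_file")"
    # A non-numeric value makes the comparison fail, which counts as disabled.
    [ "$limit" -gt 0 ] 2>/dev/null
}

# Check that fuse-overlayfs is on the PATH.
if command -v fuse-overlayfs >/dev/null 2>&1; then
    echo "fuse-overlayfs: found"
else
    echo "fuse-overlayfs: missing"
fi

# Check the user-namespace limit.
if userns_enabled; then
    echo "user namespaces: enabled"
else
    echo "user namespaces: disabled or unavailable"
fi
```

Both checks only report state; neither changes the system.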

tashrifbillah commented 1 year ago

You didn't say how you installed it.

yum update Singularity

I have fuse-overlayfs

[root@rc-predict-dev singularity]# yum list installed | grep fuse-overlayfs
fuse-overlayfs.x86_64                   0.7.2-6.el7_8                  installed

Have you shown all the output from apptainer?

The above is all! Can I get a second comment?

DrDaveD commented 1 year ago

That's odd. Please attach the output from singularity -d shell --writable-tmpfs my.image.sif

tashrifbillah commented 1 year ago

Any clue?

```
DEBUG [U=0,P=69742] persistentPreRun() Apptainer version: 1.1.3-1.el7
DEBUG [U=0,P=69742] persistentPreRun() Parsing configuration file /etc/apptainer/apptainer.conf
DEBUG [U=0,P=69742] SetBinaryPath() Setting binary path to /usr/libexec/apptainer/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/puppetlabs/bin:/root/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
DEBUG [U=0,P=69742] SetBinaryPath() Using that path for all binaries
DEBUG [U=0,P=69742] handleConfDir() /root/.apptainer already exists. Not creating.
DEBUG [U=0,P=69742] execStarter() Saving umask 0022 for propagation into container
DEBUG [U=0,P=69742] execStarter() Checking for encrypted system partition
DEBUG [U=0,P=69742] Init() Image format detection
DEBUG [U=0,P=69742] Init() Check for sandbox image format
DEBUG [U=0,P=69742] Init() sandbox format initializer returned: not a directory image
DEBUG [U=0,P=69742] Init() Check for sif image format
DEBUG [U=0,P=69742] Init() sif image format detected
DEBUG [U=0,P=69742] SetContainerEnv() Forwarding MANPATH environment variable
DEBUG [U=0,P=69742] SetContainerEnv() Forwarding XDG_SESSION_ID environment variable
DEBUG [U=0,P=69742] SetContainerEnv() Forwarding HOSTNAME environment variable
DEBUG [U=0,P=69742] SetContainerEnv() Forwarding SHELL environment variable
DEBUG [U=0,P=69742] SetContainerEnv() Forwarding TERM environment variable
DEBUG [U=0,P=69742] SetContainerEnv() Forwarding HISTSIZE environment variable
VERBOSE [U=0,P=69742] SetContainerEnv() Not forwarding SINGULARITY_DOCKER_USERNAME environment variable
DEBUG [U=0,P=69742] SetContainerEnv() Forwarding USER environment variable
DEBUG [U=0,P=69742] SetContainerEnv() Forwarding LS_COLORS environment variable
VERBOSE [U=0,P=69742] SetContainerEnv() Not forwarding SINGULARITY_TMPDIR environment variable
DEBUG [U=0,P=69742] SetContainerEnv() Forwarding MAIL environment variable
DEBUG [U=0,P=69742] SetContainerEnv() Forwarding PWD environment variable
DEBUG [U=0,P=69742] SetContainerEnv() Forwarding LANG environment variable
DEBUG [U=0,P=69742] SetContainerEnv() Forwarding HISTCONTROL environment variable
DEBUG [U=0,P=69742] SetContainerEnv() Forwarding SHLVL environment variable
DEBUG [U=0,P=69742] SetContainerEnv() Forwarding LOGNAME environment variable
DEBUG [U=0,P=69742] SetContainerEnv() Forwarding LESSOPEN environment variable
DEBUG [U=0,P=69742] SetContainerEnv() Forwarding _ environment variable
DEBUG [U=0,P=69742] SetContainerEnv() Forwarding OLDPWD environment variable
VERBOSE [U=0,P=69742] SetContainerEnv() Not forwarding APPTAINER_DEBUG environment variable
DEBUG [U=0,P=69742] SetContainerEnv() Forwarding USER_PATH environment variable
VERBOSE [U=0,P=69742] SetContainerEnv() Setting HOME=/root
VERBOSE [U=0,P=69742] SetContainerEnv() Setting PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
DEBUG [U=0,P=69742] init() Use starter binary /usr/libexec/apptainer/bin/starter
VERBOSE [U=0,P=69742] print() Set messagelevel to: 5
VERBOSE [U=0,P=69742] init() Starter initialization
VERBOSE [U=0,P=69742] is_suid() Check if we are running as setuid
DEBUG [U=0,P=69742] read_engine_config() Read engine configuration
DEBUG [U=0,P=69742] init() Wait completion of stage1
DEBUG [U=0,P=69746] set_parent_death_signal() Set parent death signal to 9
VERBOSE [U=0,P=69746] init() Spawn stage 1
DEBUG [U=0,P=69746] startup() apptainer runtime engine selected
VERBOSE [U=0,P=69746] startup() Execute stage 1
DEBUG [U=0,P=69746] StageOne() Entering stage 1
DEBUG [U=0,P=69746] findOnPath() Found "squashfuse_ll" at "/usr/libexec/apptainer/bin/squashfuse_ll"
DEBUG [U=0,P=69746] findOnPath() Found "fuse2fs" at "/sbin/fuse2fs"
DEBUG [U=0,P=69746] findOnPath() Found "fuse-overlayfs" at "/bin/fuse-overlayfs"
DEBUG [U=0,P=69746] InitImageDrivers() squashfuse_ll supports -o uid
DEBUG [U=0,P=69746] InitImageDrivers() Setting ImageDriver to fuseapps
DEBUG [U=0,P=69746] prepareRootCaps() Root full capabilities
DEBUG [U=0,P=69746] prepareAutofs() Found "/proc/sys/fs/binfmt_misc" as autofs mount point
DEBUG [U=0,P=69746] prepareAutofs() Could not keep file descriptor for bind path /etc/localtime: no mount point
DEBUG [U=0,P=69746] prepareAutofs() Could not keep file descriptor for bind path /etc/hosts: no mount point
DEBUG [U=0,P=69746] prepareAutofs() Could not keep file descriptor for home directory /root: no mount point
DEBUG [U=0,P=69746] prepareAutofs() Could not keep file descriptor for current working directory /root: no mount point
DEBUG [U=0,P=69746] Init() Image format detection
DEBUG [U=0,P=69746] Init() Check for sandbox image format
DEBUG [U=0,P=69746] Init() sandbox format initializer returned: not a directory image
DEBUG [U=0,P=69746] Init() Check for sif image format
DEBUG [U=0,P=69746] Init() sif image format detected
FATAL [U=0,P=69746] StageOne() --writable-tmpfs requires overlay kernel support: your kernel doesn't support it
VERBOSE [U=0,P=69742] wait_child() stage 1 exited with status 255
```

DrDaveD commented 1 year ago

Has the configuration in apptainer.conf been changed from the default? Is use underlay disabled for some reason?

DrDaveD commented 1 year ago

Never mind about underlay, that's not relevant. But is the configuration standard?

tashrifbillah commented 1 year ago

How do I find out whether the configuration is standard? I can attest that no human has modified the configuration installed/updated by yum.

DrDaveD commented 1 year ago

Ok then it should still be standard. You can confirm with rpm -qV apptainer which should have no output if nothing has changed.

Is there anything unusual about the environment? Is it running nested in another container system, for instance, or in anything other than a standard shell?

I expect the debug output to show "Overlay requested while in user namespace" but since it is not there, somehow the userNS flag must not be getting set. But user namespaces are available according to the other debug output, and it would have died earlier if they were not.

tashrifbillah commented 1 year ago

You can confirm with rpm -qV apptainer which should have no output

Confirmed.

Is it running nested in another container system

Yes.

somehow the userNS flag must not be getting set.

How do we check?

Thank you for your time.

DrDaveD commented 1 year ago

Tell me more details about the container system that it is layered inside.

Also, try running unshare -r and inside there run ls -l /proc/$$/ns and show me the output. I'm not sure but I think the setting of the internal userNS flag is related to that.

tashrifbillah commented 1 year ago

[root@rc ~]# unshare -r
unshare: unshare failed: Invalid argument

inside there

Where?

tashrifbillah commented 1 year ago

Does this hint anything?

[root@rc ~]# singularity shell --userns my.image.sif
INFO   : A system administrator may need to enable user namespaces, install
INFO   :   apptainer-suid, or compile with ./mconfig --with-suid
ERROR  : Failed to create user namespace: user namespace disabled

DrDaveD commented 1 year ago

Oh! Are you running everything as root?

Usually singularity/apptainer are run by unprivileged users. I assumed that's what you were doing, and deduced that you would only have been able to get as far as you did if unprivileged user namespaces were enabled.

If I disable unprivileged user namespaces on my el7 machine and run as root, overlay works fine. Are you running inside a container system that gives you only the appearance of root somehow but not a full root? What are the details of the container system you are running under?

When unshare -r works, it gives you a shell prompt, that's what I meant by "inside there".

Running apptainer without setuid-root, now the default, requires unprivileged user namespaces. You will need to either enable them or install the additional apptainer-suid package.
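
A non-interactive variant of the `unshare -r` check described above, as a rough sketch. It assumes the `unshare` from util-linux supports `-r` (it does in the session shown earlier in this thread) and that `/proc/self/ns/user` exists; the variable name `userns_check` is only for illustration.

```shell
# Try to create a user namespace and print its identifier from inside it.
if inner_ns="$(unshare -r readlink /proc/self/ns/user 2>/dev/null)"; then
    userns_check="ok"
    echo "inside new namespace: $inner_ns"
    echo "current namespace:    $(readlink /proc/self/ns/user)"
else
    userns_check="failed"
    echo "unshare -r failed; user namespaces are probably disabled"
fi
```

When the check succeeds, the two identifiers should differ, because `unshare -r` runs the command in a freshly created user namespace.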

tashrifbillah commented 1 year ago

requires unprivileged user namespaces.

:o Since the birth of Singularity, we have been using it as root for building images. Obviously, we also run it as root. I have full root. Trying as non-root in a bit ...

tashrifbillah commented 1 year ago

[tango@rc ~]$ singularity shell --writable-tmpfs my.image.sif
INFO:    Detected Singularity user configuration directory
INFO   : A system administrator may need to enable user namespaces, install
INFO   :   apptainer-suid, or compile with ./mconfig --with-suid
ERROR  : Failed to create user namespace: user namespace disabled

Did not succeed either.

requires unprivileged user namespaces. You will need to either enable them

When you get a second, please advise on how to enable 'them'.

DrDaveD commented 1 year ago

https://apptainer.org/docs/admin/main/user_namespace.html#rhel-centos-7
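
For readers who cannot open the link, here is a hedged sketch of the kind of change involved: raising `user.max_user_namespaces` through a sysctl drop-in file. The value 15000 is the one reported later in this thread; the helper name `userns_sysctl_line` and the drop-in file name in the usage comments are assumptions, so check the linked page for the exact procedure.

```shell
# Print a sysctl line that sets the user-namespace limit (default 15000).
userns_sysctl_line() {
    limit="${1:-15000}"
    case "$limit" in
        ''|*[!0-9]*)
            echo "limit must be a non-negative integer" >&2
            return 1
            ;;
    esac
    printf 'user.max_user_namespaces = %s\n' "$limit"
}

# Usage, as root (the file name is an assumption):
#   userns_sysctl_line 15000 > /etc/sysctl.d/90-max_user_namespaces.conf
#   sysctl -p /etc/sysctl.d/90-max_user_namespaces.conf
```

Writing the setting to a file under `/etc/sysctl.d` keeps it across reboots, whereas a plain `sysctl -w` would last only until the next boot.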

tashrifbillah commented 1 year ago

Did not succeed either.

It just occurred to me that the above is expected to fail. Singularity images are built as root. Why would a non-root user be allowed to change them afterwards?

If I disable unprivileged user namespaces on my el7 machine and run as root, overlay works fine.

I'll try this following your link.

tashrifbillah commented 1 year ago

user.max_net_namespaces = 0

It was already set in my system. In addition, I have set user.max_user_namespaces = 15000 according to your link. After that, I can use --writable-tmpfs flag as a non-root user. But it is no good as my image's files were created as root and so the non-root user cannot modify them.

You will need to either enable them or install the additional apptainer-suid package.

So is my only option going to be install the additional apptainer-suid package ?

DrDaveD commented 1 year ago

user.max_net_namespaces = 0

It was already set in my system. In addition, I have set user.max_user_namespaces = 15000 according to your link. After that, I can use --writable-tmpfs flag as a non-root user. But it is no good as my image's files were created as root and so the non-root user cannot modify them.

You will need to either enable them or install the additional apptainer-suid package.

So is my only option going to be install the additional apptainer-suid package ?

An unprivileged user with apptainer-1.1 and no apptainer-suid package should be able to overlay changes onto a container using the --fakeroot option. I didn't mention that earlier because you hadn't gotten that far.

The next problem you are likely to run into is that the maximum amount of changes you can make with --writable-tmpfs is set by the sessiondir max size option in /etc/apptainer/apptainer.conf, which defaults to only 16 MB. You can increase that value if you don't mind allowing containers to use that much RAM, or you can use the --overlay path option instead of --writable-tmpfs to save the changes in a directory or ext3 filesystem. Another advantage of --overlay is that you can make your changes with --fakeroot and then, once no more changes are needed, start the container again without --fakeroot, appending :ro to the end of the path to make the overlay read-only.
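
The --overlay workflow described above might look roughly like this. It is a sketch, not a tested recipe: `my.image.sif` is the image name used earlier in the thread, while the overlay directory `my_overlay` and the marker file `/overlay-test-marker` are hypothetical, and whether a directory overlay works unprivileged depends on the host setup.

```shell
image="my.image.sif"        # image name from this thread
overlay_dir="my_overlay"    # hypothetical directory that stores the changes

mkdir -p "$overlay_dir"

if command -v apptainer >/dev/null 2>&1 && [ -f "$image" ]; then
    # Step 1: make changes as fake root; they are saved in the overlay directory.
    apptainer exec --fakeroot --overlay "$overlay_dir" "$image" \
        touch /overlay-test-marker
    # Step 2: run again without --fakeroot, with the overlay mounted read-only.
    apptainer exec --overlay "$overlay_dir:ro" "$image" \
        ls -l /overlay-test-marker
else
    echo "apptainer or $image not found; skipping the overlay demo"
fi
```

Unlike --writable-tmpfs, this approach does not hold the changes in RAM, so the sessiondir max size limit should not apply to them.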