docker / for-linux

Docker Engine for Linux
https://docs.docker.com/engine/installation/

Allow FUSE functionality by default #321

Open rustyx opened 6 years ago

rustyx commented 6 years ago

Expected behavior

Mounting FUSE filesystems should work out of the box, because it is safe. It fits within the idea of a containerized app.

Actual behavior

An attempt to mount a FUSE filesystem fails with:

fuse: device not found, try 'modprobe fuse' first

or

fuse: failed to exec fusermount: No such file or directory

The only way to fix it is to run the container with additional permissions:

--cap-add SYS_ADMIN --device /dev/fuse

This makes it very difficult to run FUSE inside Docker because it is often all but impossible to run with additional flags in a managed environment.

Steps to reproduce the behavior

git clone https://github.com/rustyx/fuse-hello.git
docker build fuse-hello -t hello
docker run -it hello
docker run -it --device /dev/fuse hello
docker run -it --cap-add SYS_ADMIN --device /dev/fuse hello

Output of docker version:

Client:
 Version:       18.01.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    03596f51b1
 Built:         Thu Jan 11 22:29:41 2018
 OS/Arch:       windows/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:      18.05.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.10.1
  Git commit:   f150324
  Built:        Wed May  9 22:20:42 2018
  OS/Arch:      linux/amd64
  Experimental: false

andersjohansenange commented 5 years ago

Strongly agree this would be a great feature. It's fairly common to abstract various services via a FUSE driver. If mounting one requires root-like capabilities it encourages lax security.

justincormack commented 5 years ago

The kernel requires SYS_ADMIN; we can't change this.

dandelionred commented 5 years ago

@justincormack What about this: "FUSE Gets User Namespace Support With Linux 4.18"?

Just a memo below on how it doesn't work currently.

Ubuntu 18.04 with stock hwe kernel 4.18.0-18, docker 18.09.5.

docker run --rm -it --device=/dev/fuse ubuntu:18.04
apt update
apt install -y fuseiso wget
adduser --disabled-password --gecos '' test
cd /home/test
su test
mkdir mnt
wget https://cdn.openbsd.org/pub/OpenBSD/6.5/amd64/cd65.iso
fuseiso cd65.iso mnt
>>>>>>> fusermount: mount failed: Operation not permitted
exit
addgroup fuse
usermod -aG fuse test
su test
fuseiso cd65.iso mnt
>>>>>>> fusermount: mount failed: Operation not permitted
exit
fuseiso cd65.iso mnt
>>>>>>> fusermount: mount failed: Operation not permitted

So atm it doesn't work for

zbyte64 commented 5 years ago

Someone correct me if I am wrong; I'm trying to wrap my head around the limitations here.

The user namespace means we could do the current method more securely, perhaps without adding the SYS_ADMIN capabilities, but would still require the fuse device to be passed through.

When any mount occurs in a container it is also modifying the host mounts, hence the need for host cooperation. This prevents containers with FUSE from being used on Windows and OSX hosts.

If a container's OS were modified to intercept file system calls to emulate its own FUSE, then those FUSE mounts would not be accessible from the host. Is this even possible?

Ciantic commented 5 years ago

> This prevents containers with FUSE from being used on Windows and OSX hosts.

FUSE mounting inside containers works just fine with Docker for Windows when passing the same flags: --cap-add SYS_ADMIN --device /dev/fuse.

I think the parent poster would want it to just work without any flags?

In my opinion the SYS_ADMIN is the one we shouldn't need. If only --device /dev/fuse were required.

omeid commented 5 years ago

@zbyte64

> When any mount occurs in a container it is also modifying the host mounts, hence the need for host cooperation. This prevents containers with FUSE from being used on Windows and OSX hosts.

Well, as of 4.18, you have user namespace mounts for fuse which means you shouldn't need to change the host mounts and thus wouldn't need SYS_ADMIN.
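A rough way to check whether a host kernel is at least the 4.18 release mentioned here is to parse `uname -r`. This is only a sketch: the function name `kernel_has_userns_fuse` is made up for illustration, distribution kernels may backport or disable features regardless of the version string, and a new enough kernel does not by itself lift Docker's own restrictions.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the kernel release string in $1 is >= 4.18,
# the version cited in this thread for FUSE mounts in user namespaces.
kernel_has_userns_fuse() {
    release=$1
    major=${release%%.*}          # "4.19.76-linuxkit" -> "4"
    rest=${release#*.}            # "4.19.76-linuxkit" -> "19.76-linuxkit"
    minor=${rest%%[!0-9]*}        # "19.76-linuxkit"   -> "19"
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "${minor:-0}" -ge 18 ]; }
}

if kernel_has_userns_fuse "$(uname -r)"; then
    echo "kernel $(uname -r): at least 4.18"
else
    echo "kernel $(uname -r): older than 4.18"
fi
```

The check is a heuristic on the version string only; it says nothing about seccomp profiles or capabilities inside the container.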

cometta commented 5 years ago

@omeid, what do you mean by 4.18? The latest version of blobfuse is 1.0.3? Which version of blobfuse are you using to run as non-root?

dandelionred commented 5 years ago

@cometta He means this https://github.com/docker/for-linux/issues/321#issuecomment-487955090

miketzian commented 4 years ago

+1 on this. Requiring SYS_ADMIN is basically a non-starter for us, though the extra device shouldn't be an issue (assuming 4.18+ kernels). Can this get triaged?

ryanlamore commented 4 years ago

The ability to run FUSE without SYS_ADMIN has been available since August 2018, and yet there hasn't been much traction on this ticket. Running in privileged mode in production should scare most security teams! Is there anything we can do to get more traction on this story?

1zg12 commented 4 years ago

SYS_ADMIN is quite a powerful capability; if there is a way to mount without it, that would avoid a lot of risk.

yyb196 commented 4 years ago

I think it's all about the Linux kernel, which needs to provide the ability to mount without the SYS_ADMIN capability; that isn't in the scope of Docker.

norpol commented 4 years ago

> I think it's all about the Linux kernel, which needs to provide the ability to mount without the SYS_ADMIN capability; that isn't in the scope of Docker.

Check out this earlier comment; the Linux kernel appears to have added namespace support for FUSE in 4.18.

cpuguy83 commented 4 years ago

Has anybody actually tried to do this? I've added mount to my seccomp allow list and still get permission denied on mount:

/bin/fusermount: mount failed: Operation not permitted
panic: fusermount exited with code 256

goroutine 1 [running]:
main.main()
    /Users/cpuguy83/go/src/github.com/cpuguy83/tarfs/cmd/tarfsd/main.go:46 +0x697
root@6bd1a24bcd1a:/# uname -a
Linux 6bd1a24bcd1a 4.19.76-linuxkit #1 SMP Thu Oct 17 19:31:58 UTC 2019 armv7l GNU/Linux

Something tells me there is much more to this than just allowing mount without CAP_SYS_ADMIN

omeid commented 4 years ago

@cpuguy83 Make sure you have unprivileged_userns_clone kernel param set.

cpuguy83 commented 4 years ago

@omeid That's a debian specific kernel param for enabling (or rather disabling?) userns for unprivileged users, I think?

omeid commented 4 years ago

Debian and Arch Linux, too. Check your kernel documentation, and also make sure it is compiled with CONFIG_USER_NS.
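Both settings can be inspected from a shell. The sketch below assumes the sysctl is exposed at /proc/sys/kernel/unprivileged_userns_clone (present only on kernels carrying that patch) and that the kernel config is readable at /boot/config-$(uname -r), which is not true on every distribution. The function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reports the state of the unprivileged_userns_clone knob.
# $1 (optional): path to the sysctl file; the knob is absent on many kernels.
userns_clone_status() {
    f=${1:-/proc/sys/kernel/unprivileged_userns_clone}
    if [ ! -r "$f" ]; then
        echo "absent"       # this kernel does not expose the knob
    elif [ "$(cat "$f")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Hypothetical helper: succeeds if the config file in $1 has CONFIG_USER_NS=y.
config_has_user_ns() {
    grep -q '^CONFIG_USER_NS=y' "$1" 2>/dev/null
}

echo "unprivileged_userns_clone: $(userns_clone_status)"
cfg=/boot/config-$(uname -r)   # assumed location; may not exist
if config_has_user_ns "$cfg"; then
    echo "CONFIG_USER_NS=y found in $cfg"
else
    echo "CONFIG_USER_NS=y not found in $cfg (the file may be missing)"
fi
```

An "absent" result is not a failure: kernels without the patch simply have no such switch.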

cpuguy83 commented 4 years ago

@omeid I can create a userns just fine, what I can't do is mount in the userns w/o CAP_SYS_ADMIN. I'm attempting to do this by taking the default seccomp profile and adding unshare and mount to the allow list.

pmjohann commented 4 years ago

Any updates on this since?

skaldesh commented 4 years ago

I need this as well, and giving my containers SYS_ADMIN permissions just for FUSE is not an option.

juergbi commented 4 years ago

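The following steps work for me on Fedora 32 to use FUSE in a Docker container without --privileged or --cap-add SYS_ADMIN:

  • Patch the seccomp profile to drop the restriction on clone(2) namespace flags and allow mount(2) and umount(2): https://gist.github.com/juergbi/44b0e7aaa50742f996eed0693e053cda This is a patch for profiles/seccomp/default.json as available in the docker/moby repositories.
  • Ensure the fuse module is loaded
  • Run the Docker container with the options --device /dev/fuse --security-opt seccomp=/path/to/fuse.json
  • In the Docker container run unshare -c --keep-caps -m to open a shell in new unprivileged user and mount namespaces.
  • In that new shell it's then possible to mount and use FUSE. E.g., sshfs user@host:directory /mnt

Depending on the uid mapping Docker uses, this can be considered secure as long as you trust the kernel implementation of unprivileged user namespaces (and FUSE). It would be great if this was supported by default or at least as an easy-to-use alternative profile.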

Side-note: To allow mounting tmpfs in the user namespace in the container, Fedora additionally requires --security-opt label:type:container_userns_t (SELinux).
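The workflow in this comment (patched seccomp profile, --device /dev/fuse, then unshare inside the container) can be assembled into a single command line. The sketch below only prints that command and does not run Docker; `fuse_docker_cmd` and the image name `my-fuse-image` are placeholders, and it assumes the image ships an unshare that supports -c and --keep-caps.

```shell
#!/bin/sh
# Hypothetical helper: prints the docker run invocation for the workflow above.
# $1: path to the patched seccomp profile
# $2: image name (must contain a FUSE client and a recent unshare)
fuse_docker_cmd() {
    printf 'docker run -it --device /dev/fuse --security-opt seccomp=%s %s unshare -c --keep-caps -m\n' "$1" "$2"
}

# The fuse module still has to be loaded on the host beforehand.
fuse_docker_cmd /path/to/fuse.json my-fuse-image
```

With no program given, unshare starts a shell, so the printed command should drop straight into the new user and mount namespaces, where a FUSE mount such as the sshfs example can then be attempted.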

JuniorJPDJ commented 3 years ago

@juergbi Why can't you use sshfs directly in Docker? What's stopping you from doing so after patching the seccomp profile? I'm trying to understand the issue and also need FUSE in Docker.

ssokolow commented 3 years ago

> why can't you use sshfs directly in docker? what's stopping you from doing so after patching seccomp profile?

I'm not sure I understand your question. (The answer seems too obvious to me, so I must be misinterpreting it.)

FUSE is the kernel API that sshfs is built on top of, and Docker doesn't run a second kernel inside the container, so the fuse module must be loaded, access to /dev/fuse is necessary for the sshfs binary to communicate with the kernel, and anything that interferes with sshfs's ability to perform the mount operation must be disabled.
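The first two prerequisites can be checked from inside the container before attempting a mount. A minimal sketch, assuming /proc/filesystems and /dev/fuse are the right places to look; the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the "fuse" filesystem type is registered.
# -w keeps "fuseblk" and "fusectl" from counting as a match.
fuse_registered() {
    grep -qw fuse "${1:-/proc/filesystems}" 2>/dev/null
}

# Hypothetical helper: succeeds if $1 (default /dev/fuse) is a character
# device that the current user can read and write.
fuse_device_usable() {
    dev=${1:-/dev/fuse}
    [ -c "$dev" ] && [ -r "$dev" ] && [ -w "$dev" ]
}

if fuse_registered; then
    echo "fuse: registered with the kernel"
else
    echo "fuse: not registered (try 'modprobe fuse' on the host)"
fi

if fuse_device_usable; then
    echo "/dev/fuse: usable"
else
    echo "/dev/fuse: missing or not accessible (was --device /dev/fuse passed?)"
fi
```

Passing both checks does not mean the mount itself will succeed; that still depends on capabilities and the seccomp profile.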

juergbi commented 3 years ago

> @juergbi why can't you use sshfs directly in docker? what's stopping you from doing so after patching seccomp profile? I'm trying to understand the issue and also need FUSE in docker.

Mounting anything (FUSE and other filesystems) requires CAP_SYS_ADMIN privileges even without seccomp restrictions. Outside Docker, unprivileged users can run sshfs with the help of the setuid-root helper binary fusermount. However, in a Docker container setuid fusermount is not supported and hence, sshfs fails unless the Docker container is privileged.

The mentioned unshare command grants CAP_SYS_ADMIN privileges in new user and mount namespaces. This doesn't provide any additional access to the host system, however, it allows mount operations in that new mount namespace. With Linux 4.18 and later, FUSE mounts are allowed in that new mount namespace as well. So sshfs can work inside the new namespaces.
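One way to observe the difference is to compare the effective capability mask in the container's normal shell with the one inside the unshare'd shell. The sketch below decodes the CapEff field of /proc/self/status; CAP_SYS_ADMIN is capability number 21 in linux/capability.h, and `has_cap_sys_admin` is a name made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the hex capability mask in $1 has
# bit 21 (CAP_SYS_ADMIN) set.
has_cap_sys_admin() {
    mask=$1                                  # e.g. 0000000000200000
    [ $(( 0x$mask & (1 << 21) )) -ne 0 ]
}

# CapEff is the effective capability set of the current process.
cap_eff=$(awk '/^CapEff:/ {print $2}' /proc/self/status 2>/dev/null)
if [ -n "$cap_eff" ]; then
    if has_cap_sys_admin "$cap_eff"; then
        echo "CapEff=$cap_eff: CAP_SYS_ADMIN present"
    else
        echo "CapEff=$cap_eff: CAP_SYS_ADMIN absent"
    fi
fi
```

If the explanation above holds, running this in an ordinary unprivileged container shell should report the capability as absent, and running it again after `unshare -c --keep-caps -m` should report it as present, with that capability applying only within the new namespaces.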

Other container engines may create an unprivileged user namespace as part of container startup, which may allow mounts without the extra unshare step. However, Docker doesn't work that way with its system daemon.

JuniorJPDJ commented 3 years ago

> > why can't you use sshfs directly in docker? what's stopping you from doing so after patching seccomp profile?
>
> I'm not sure I understand your question. (The answer seems too obvious to me, so I must be misinterpreting it.)
>
> FUSE is the kernel API that sshfs is built on top of, and Docker doesn't run a second kernel inside the container, so the fuse module must be loaded, access to /dev/fuse is necessary for the sshfs binary to communicate with the kernel, and anything that interferes with sshfs's ability to perform the mount operation must be disabled.

I know. I meant that he runs sshfs inside an unshare'd namespace inside Docker.

chalbersma commented 3 years ago

> I'm trying to understand the issue and also need FUSE in docker.

Slightly unrelated, but I also use FUSE in Docker to mount ISO files as a non-root user.

dotslash commented 3 years ago

> The following steps work for me on Fedora 32 to use FUSE in a Docker container without --privileged or --cap-add SYS_ADMIN:
>
>   • Patch the seccomp profile to drop the restriction on clone(2) namespace flags and allow mount(2) and umount(2): https://gist.github.com/juergbi/44b0e7aaa50742f996eed0693e053cda This is a patch for profiles/seccomp/default.json as available in the docker/moby repositories.
>   • Ensure the fuse module is loaded
>   • Run the Docker container with the options --device /dev/fuse --security-opt seccomp=/path/to/fuse.json
>   • In the Docker container run unshare -c --keep-caps -m to open a shell in new unprivileged user and mount namespaces.
>   • In that new shell it's then possible to mount and use FUSE. E.g., sshfs user@host:directory /mnt
>
> Depending on the uid mapping Docker uses, this can be considered secure as long as you trust the kernel implementation of unprivileged user namespaces (and FUSE). It would be great if this was supported by default or at least as an easy-to-use alternative profile.
>
> Side-note: To allow mounting tmpfs in the user namespace in the container, Fedora additionally requires --security-opt label:type:container_userns_t (SELinux).

@juergbi: I was able to replicate this setup on Ubuntu 18.04. I used -r instead of -c because the util-linux shipped in Ubuntu 18.04 does not have -c.

FUSE works :) But I want to be able to install Ubuntu packages in the unshared shell (apt-get install foo). I get this error:

W: chown to _apt:root of directory /var/cache/apt/archives/partial failed - SetupAPTPartialDirectory (22: Invalid argument)
W: chmod 0700 of directory /var/cache/apt/archives/partial failed - SetupAPTPartialDirectory (1: Operation not permitted)
W: chown to _apt:root of directory /var/lib/apt/lists/auxfiles failed - SetupAPTPartialDirectory (22: Invalid argument)
W: chmod 0700 of directory /var/lib/apt/lists/auxfiles failed - SetupAPTPartialDirectory (1: Operation not permitted)

Do you have any suggestions to work around this?

Nicba1010 commented 2 years ago

Any progress on this?

acidjazz commented 1 year ago

> The following steps work for me on Fedora 32 to use FUSE in a Docker container without --privileged or --cap-add SYS_ADMIN: [...]

Can we possibly get a Docker image of this config?

quantumsheep commented 1 year ago

> The following steps work for me on Fedora 32 to use FUSE in a Docker container without --privileged or --cap-add SYS_ADMIN: [...]

Are there any security implications in doing so? I want to allow untrusted users to access FUSE for rclone mount, but it would be great if they couldn't access the host's filesystem.

MetalPinguinInc commented 11 months ago

> The following steps work for me on Fedora 32 to use FUSE in a Docker container without --privileged or --cap-add SYS_ADMIN: [...]

@juergbi Thanks to this reply I am 99% of the way to a working setup using sshfs inside Kubernetes; however, I cannot seem to write to the sshfs mount. I have been able to replicate this on my local machine (without using Docker/Kubernetes). Mounting over sshfs works in an unshared shell, but I cannot write to the mount. Using the exact same mounting command outside of the unshared shell gives me write access, so I am sure it is not an issue on the remote server. Any suggestions on how to fix this?

ysn2233 commented 10 months ago

Any update?

juangburgos commented 10 months ago

Any progress?

AkechiShiro commented 10 months ago

I don't think it's helpful to ping everyone for progress updates here. If there is any progress, it will be reported by those making it, either in this issue or in a PR (pull request).

Future readers: please refrain from commenting every week to ask for updates, as this is inappropriate behavior. This is open source software. If you really want an update, make it yourself by creating a PR that fixes the issue; otherwise, wait for the update.