dfki-ric / docker_image_development

Scripts and Dockerfiles to support docker-based, 3D accelerated development and release of docker images
BSD 3-Clause "New" or "Revised" License

Ways to avoid privileged #84

Open pierrewillenbrockdfki opened 1 year ago

pierrewillenbrockdfki commented 1 year ago

Considering that --privileged effectively gives the programs inside the container root privileges on the host (by means of access to the disk and memory devices, even without /dev), are there ways to avoid that? There seem to be ways to pass through only a subset of devices (GPUs, ttys).

moooeeeep commented 3 months ago

I think the main reason everyone seems to use --privileged is that hotplugging devices otherwise tends to cause problems, e.g., devices that are connected after the container was created cannot be found when they are included only with --device=/dev/.... I'm not sure about restarting an old container after a reboot; maybe the same applies.

There appear to be solutions, e.g., the one described here: https://stackoverflow.com/a/66427245, but they require more involved setup steps.

planthaber commented 3 months ago

Afaik there is also no direct way to access the disk from the container just by adding --privileged; you still need to hack your way to it.

One option could be user remapping, but this will make device access difficult or unusable: https://docs.docker.com/engine/security/userns-remap/ (see the last section). I didn't read deeply into it, though.

Mounting single files instead of folders mounts the inode, not the "filename": when you delete and recreate the file on the host, the old file (inode) is still mounted and the container sees the old file. I think it's the same for device files.
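To illustrate the inode point, here is a minimal Python sketch. It does not use Docker or a real bind mount; an open file handle stands in for the mount, since both keep referring to the old inode after the path is recreated. The file name `config.txt` and the function name are made up for the example.

```python
import os
import tempfile


def demo_stale_inode():
    """Show that a reference to an inode outlives delete/recreate of the path."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.txt")  # hypothetical file name
        with open(path, "w") as f:
            f.write("old")

        # The open handle plays the role of the single-file bind mount:
        # it is bound to the inode, not to the file name.
        held = open(path)
        old_inode = os.fstat(held.fileno()).st_ino

        # Delete and recreate the file under the same name.
        os.remove(path)
        with open(path, "w") as f:
            f.write("new")
        new_inode = os.stat(path).st_ino

        seen_by_holder = held.read()  # content of the old inode
        held.close()
        with open(path) as f:
            seen_by_path = f.read()   # content of the new file

        return old_inode != new_inode, seen_by_holder, seen_by_path


if __name__ == "__main__":
    print(demo_stale_inode())
```

Mounting the parent folder instead avoids this, because the name is then looked up again on each access.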

Rather than mounting the complete /dev folder (which gives access to all devices), you could also mount just a subfolder, e.g. /dev/input.

I also read the tip to create a folder with symlinks to just the needed files of a folder that contains secret files, and to mount only the folder with the symlinks. This could also work for devices. We could also test direct-mounting symlinks to devices; no idea if that works (the inode of the symlink does not change if the file is changed, right?).

Some extra privileges are always needed when accessing hardware, but you can replace --privileged with --cap-add to allow only what you really need: https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities

pierrewillenbrockdfki commented 3 months ago

With --privileged, nothing stops you from creating and accessing device files as you like, for example /dev/nvme* (which you can then freely access). The only thing you need to know about a specific device of interest is its major/minor device number, which can be found in /sys. Or you just guess, because the device classes have fixed major numbers and predictable minor numbers: /dev/sda is always (8,0), /dev/sda1 is (8,1), /dev/sdb is (8,16) and so on.
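The numbering scheme can be sketched in a few lines of Python. This only covers the first 16 SCSI/SATA disks (sda to sdp, major 8, 16 minors per disk), which is an assumption of the sketch; the function name is made up. The authoritative value for a real device is whatever /sys reports.

```python
import os

SD_MAJOR = 8          # major number of the first block of sd disks
MINORS_PER_DISK = 16  # each disk reserves 16 minors (whole disk + partitions)


def sd_device_numbers(name):
    """Return (major, minor) for names such as 'sda', 'sda1' or 'sdb'."""
    if len(name) < 3 or not name.startswith("sd") or not "a" <= name[2] <= "p":
        raise ValueError(f"unsupported device name: {name!r}")
    disk_index = ord(name[2]) - ord("a")
    partition = int(name[3:]) if name[3:] else 0
    if not 0 <= partition < MINORS_PER_DISK:
        raise ValueError(f"partition out of range: {name!r}")
    return SD_MAJOR, disk_index * MINORS_PER_DISK + partition


if __name__ == "__main__":
    for dev_name in ("sda", "sda1", "sdb"):
        major, minor = sd_device_numbers(dev_name)
        # os.makedev packs the pair into the dev_t value that mknod expects.
        dev = os.makedev(major, minor)
        print(dev_name, os.major(dev), os.minor(dev))
```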

This is why normal users cannot create device files. Maybe it is already sufficient to drop the MKNOD capability and not mount /dev into the container (otherwise, the in-container root has full access to all existing devices again). One would then have to populate the required device files by other means.
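As a rough sketch of that idea, the following Python helper assembles a `docker run` command line that drops MKNOD and passes through individual devices with `--device` instead of mounting /dev. The image name and device path are placeholders, the helper is not part of this repository, and whether this is sufficient in practice is untested here.

```python
import shlex


def docker_run_args(image, devices, drop_caps=("MKNOD",)):
    """Build a docker run argv without --privileged and without a /dev mount."""
    args = ["docker", "run", "--rm", "-it"]
    for cap in drop_caps:
        # Without MKNOD, root inside the container cannot create device nodes.
        args.append(f"--cap-drop={cap}")
    for dev in devices:
        # --device exposes only the listed device node to the container.
        args.append(f"--device={dev}")
    args.append(image)
    return args


if __name__ == "__main__":
    # "my_image:latest" and /dev/ttyUSB0 are hypothetical example values.
    print(shlex.join(docker_run_args("my_image:latest", ["/dev/ttyUSB0"])))
```

Note that `--device` only covers devices that exist when the container is created, so the hotplugging concern raised earlier in this thread still applies.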

planthaber commented 3 months ago

I wonder if we could create a local "devices" folder on the host for mounting to /dev in the container, and then use either mknod (with updates) or symlinks, both with permissions matching the devices, to control access.

planthaber commented 3 months ago

When I added the SSH agent option I realized that it is usable in devel, but not in release. I think this is also because the devel container uses the UID of the host user.

So I guess that you can also avoid --privileged when you use devel images and the user running the image has access rights to the mounted devices. For releases you can remove this line from the Dockerfile:

https://github.com/dfki-ric/docker_image_development/blob/master/image_setup/03_release_image/Dockerfile#L14

that makes the entrypoint run the uid init (https://github.com/dfki-ric/docker_image_development/blob/master/image_setup/01_base_images/entrypoint.bash#L75).

Or run the uid script manually: run /opt/init_user_id.bash, then exit and re-enter the container to use the new uid.
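To check up front whether the user running the container has the needed access rights, a small Python sketch like the following could be run on the host (or inside the container after the uid switch). The function name and the example paths are made up; it only tests read/write permission for the current user and says nothing about cgroup device rules.

```python
import os


def check_device_access(paths):
    """Map each path to True if it exists and is readable and writable."""
    result = {}
    for path in paths:
        result[path] = os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)
    return result


if __name__ == "__main__":
    # Example paths; replace with the devices the container should use.
    for path, ok in check_device_access(["/dev/null", "/dev/ttyUSB0"]).items():
        print(path, "accessible" if ok else "not accessible")
```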

SSH Option: https://github.com/dfki-ric/docker_image_development/commit/c5e4997fb3aa2dc025ce4180d5f2ead2c0f1d50f