Closed TristanCacqueray closed 3 years ago
If you run in privileged mode does this work? @giuseppe WDYT?
Running in privileged mode does not seem to be enough as the device is not available.
I think we could add a --security-opt option to specify the list of paths that must be masked and override the default list.
Something like:
--security-opt masked-paths=/foo/bar:/baz
How about --security-opt unmask-path=/sys/dev
We could also add a --security-opt mask-path=$PATH to add masked paths - seems useful.
I would like the ability to --security-opt unmask-path=ALL as well.
I think having unmask and mask would be sufficient.
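To make the mask/unmask proposal above concrete, here is a minimal Go sketch of how the two options could be combined with a default list. The function name applyMaskOpts, the colon-separated syntax, and the rule that an explicit mask is applied after unmask are all assumptions for illustration; none of this is podman's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// applyMaskOpts is a hypothetical helper, not part of podman.
// defaults is the built-in masked-path list; mask and unmask are
// colon-separated path lists as proposed in the thread.
// Assumption: unmask only removes entries from the defaults, and
// "ALL" removes every default entry.
func applyMaskOpts(defaults []string, mask, unmask string) []string {
	skip := map[string]bool{}
	for _, p := range strings.Split(unmask, ":") {
		if p != "" {
			skip[p] = true
		}
	}

	var out []string
	seen := map[string]bool{}
	add := func(p string) {
		if p == "" || seen[p] {
			return
		}
		seen[p] = true
		out = append(out, p)
	}

	if unmask != "ALL" {
		for _, p := range defaults {
			if !skip[p] {
				add(p)
			}
		}
	}
	// Explicitly requested masks are always added.
	for _, p := range strings.Split(mask, ":") {
		add(p)
	}
	return out
}

func main() {
	defaults := []string{"/proc/kcore", "/sys/firmware", "/sys/dev"}
	fmt.Println(applyMaskOpts(defaults, "/foo/bar:/baz", "/sys/dev"))
}
```

With this ordering, a path that appears in both options stays masked, which seems the safer default, but the opposite choice would be equally easy to implement.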
Would love to get someone from the community to grab this.
--device functionality might need special handling to unmask device entries in "/sys/dev", i.e. if I start a container with "--device /dev/dri/renderD128", the device's entries in /sys/dev/char and /sys/devices should be unmasked. LibGL looks up the device in /sys/dev/char according to my straces, see the 226:128 example below:
# ls -la /dev/dri/renderD128
crw-rw-rw-. 1 root render 226, 128 Nov 2 02:32 /dev/dri/renderD128
# ls -lah /sys/dev/char/226:128
lrwxrwxrwx. 1 root root 0 Nov 2 02:32 /sys/dev/char/226:128 -> ../../devices/pci0000:00/0000:00:02.0/drm/renderD128
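The lookup described above can be sketched in Go: given a device number, derive the /sys/dev/char entry that libGL would try to read. This is only an illustration. The helper names are made up, and the bit layout is the usual Linux dev_t encoding; in a real program the device number would come from the Rdev field of a stat call on the device node.

```go
package main

import "fmt"

// major extracts the major number from a Linux dev_t value.
func major(dev uint64) uint32 {
	m := uint32((dev & 0x00000000000fff00) >> 8)
	m |= uint32((dev & 0xfffff00000000000) >> 32)
	return m
}

// minor extracts the minor number from a Linux dev_t value.
func minor(dev uint64) uint32 {
	m := uint32(dev & 0x00000000000000ff)
	m |= uint32((dev & 0x00000ffffff00000) >> 12)
	return m
}

// sysDevCharPath (hypothetical helper) returns the sysfs entry for a
// character device with the given device number.
func sysDevCharPath(dev uint64) string {
	return fmt.Sprintf("/sys/dev/char/%d:%d", major(dev), minor(dev))
}

func main() {
	// Device number for major 226, minor 128 (renderD128 in the listing above).
	dev := uint64(226<<8 | 128)
	fmt.Println(sysDevCharPath(dev))
}
```

A runtime that wanted to unmask only the entries for mapped devices could compute this path for each --device argument, although the symlink target under /sys/devices would need to be reachable as well.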
Running in privileged mode does not seem to be enough as the device is not available.
Intel? It works with --privileged. See https://github.com/mviereck/x11docker/issues/293
With AMD you might want to use --volume instead of --device. Not sure why though.
--device functionality might need special handling to unmask device entries in "/sys/dev", i.e. if I start a container with "--device /dev/dri/renderD128", the device's entries in /sys/dev/char and /sys/devices should be unmasked. LibGL looks up the device in /sys/dev/char according to my straces, see the 226:128 example below:
# ls -la /dev/dri/renderD128
crw-rw-rw-. 1 root render 226, 128 Nov 2 02:32 /dev/dri/renderD128
# ls -lah /sys/dev/char/226:128
lrwxrwxrwx. 1 root root 0 Nov 2 02:32 /sys/dev/char/226:128 -> ../../devices/pci0000:00/0000:00:02.0/drm/renderD128
More simply, if --device is used, podman should know to not mask /sys/dev. Knowing what is needed under /sys/dev might prove problematic, and the end user would end up with just --privileged instead, which wouldn't otherwise be necessary.
I think that mask/unmask paths could be generally available for finer-grained privileges, but I don't see the default masked paths documented. Updating the default masked paths should be treated as a breaking change.
I also noticed that masking out /sys/dev is not enough to prevent tools like lshw, lspci, and lsusb from extracting information from the host system. Not sure if this is the reason it was masked in the first place.
These are not listed anywhere but in the code, but we will document them when we have the ability to manipulate them:
func BlockAccessToKernelFilesystems(privileged, pidModeIsHost bool, g *generate.Generator) {
	if !privileged {
		for _, mp := range []string{
			"/proc/acpi",
			"/proc/kcore",
			"/proc/keys",
			"/proc/latency_stats",
			"/proc/timer_list",
			"/proc/timer_stats",
			"/proc/sched_debug",
			"/proc/scsi",
			"/sys/firmware",
			"/sys/fs/selinux",
			"/sys/dev",
		} {
			g.AddLinuxMaskedPaths(mp)
		}
		if pidModeIsHost && rootless.IsRootless() {
			return
		}
		for _, rp := range []string{
			"/proc/asound",
			"/proc/bus",
			"/proc/fs",
			"/proc/irq",
			"/proc/sys",
			"/proc/sysrq-trigger",
		} {
			g.AddLinuxReadonlyPaths(rp)
		}
	}
}
Would this be reasonable to implement (I can make a pull request here or in a separate issue): if --device is used, treat it similarly to --privileged, exclude '/sys/dev' from masking and mount it read-only?
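As a rough sketch of that proposal, the function below returns the masked and read-only lists and moves /sys/dev from the first to the second when devices are passed in. The name kernelPathPolicy and the hasDevices parameter are hypothetical, the lists are copied from the snippet quoted above, and the rootless host-PID special case is left out for brevity.

```go
package main

import "fmt"

// contains reports whether list includes path.
func contains(list []string, path string) bool {
	for _, p := range list {
		if p == path {
			return true
		}
	}
	return false
}

// kernelPathPolicy is a hypothetical variant of the quoted function.
// Assumption: when hasDevices is true, /sys/dev is exposed read-only
// instead of being masked.
func kernelPathPolicy(privileged, hasDevices bool) (masked, readonly []string) {
	if privileged {
		return nil, nil
	}
	masked = []string{
		"/proc/acpi", "/proc/kcore", "/proc/keys", "/proc/latency_stats",
		"/proc/timer_list", "/proc/timer_stats", "/proc/sched_debug",
		"/proc/scsi", "/sys/firmware", "/sys/fs/selinux",
	}
	readonly = []string{
		"/proc/asound", "/proc/bus", "/proc/fs", "/proc/irq",
		"/proc/sys", "/proc/sysrq-trigger",
	}
	if hasDevices {
		readonly = append(readonly, "/sys/dev")
	} else {
		masked = append(masked, "/sys/dev")
	}
	return masked, readonly
}

func main() {
	masked, readonly := kernelPathPolicy(false, true)
	fmt.Println("masked /sys/dev:", contains(masked, "/sys/dev"))
	fmt.Println("read-only /sys/dev:", contains(readonly, "/sys/dev"))
}
```

Whether unmasking all of /sys/dev for any device is acceptable from a security standpoint is exactly the open question in this thread, so this is one possible shape rather than a recommendation.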
@awerlang yes the issue is happening with Intel GPU and unprivileged rootless podman. This combo used to work before #6957.
@awerlang yes the issue is happening with Intel GPU and unprivileged rootless podman. This combo used to work before #6957.
Unprivileged mode is under discussion. I quoted you:
Running in privileged mode does not seem to be enough as the device is not available.
If it doesn't work with --privileged, then it's a different issue, not affected by #6957. Refer to the discussion I linked above.
That is a different issue, but it is important additional information, as privileged mode is not even an option to work around the absence of /sys/dev. Thus it seems like podman 2.1.1 can no longer run GPU workloads.
I am working on adding a mask and unmask option to --security-opt which you can use to specify additional paths you want to mask or any paths that you want to unmask. That should work with --device when you specify that you want to unmask /sys/dev. I will have a PR open later today.
Can you give us a specific device you are trying to add?
I did notice that we are masking /sys/dev but not /sys/device, which perhaps we should mask. We could remove these masks when users add an additional device, but if we added these masks for security reasons, then it seems fairly risky to unmask them for any device, --device /dev/fuse for example.
Sadly, I added the mask for /sys/dev and can not find what triggered me adding it. I am sure it was a bugzilla or issue that asked us to mask it. But it looks like it is not masked in Moby at this point.
Here is the bugzilla that triggered this masking. https://bugzilla.redhat.com/show_bug.cgi?id=1772993
@TristanCacqueray
That is a different issue, but it is important additional information, as privileged mode is not even an option to work around the absence of /sys/dev. Thus it seems like podman 2.1.1 can no longer run GPU workloads.
It seems that the host display (e.g. :0) doesn't work for some reason with open-source drivers, this would be interesting to track down I guess. It does work if you use a nested server (e.g. Weston) though. See the discussion I posted before: https://github.com/mviereck/x11docker/issues/293
Also, unprivileged rootless podman runs GPU workloads for nvidia just fine; it doesn't use /dev/dri but /dev/nvidia* instead.
@rhatdan that change broke multiple podman scenarios on a "developer workstation", scenarios that worked before and still work in docker. And these scenarios require podman to be started with --device (or equivalent), compromising security to begin with.
@paravz can you give me an example of a container that this broke?
@mrunalp Masking /sys/dev seems to be causing us issues. Perhaps we should just mask block devices to fix the problem?
@rhatdan any container running the VSCode GUI is likely broken as it seems to require GPU rendering.
Any container with a GPU-accelerated GUI or X-windows, e.g. chrome/puppeteer, etc. See the x11docker examples listed here too.
The DRI/render device needs to be forwarded into the container to achieve this, i.e.:
podman run --device '/dev/dri':'/dev/dri':rw ...
or
podman run --device /dev/dri/renderD128 ...
I have run into the same issue today. To figure out the problem, I was cooking up an MWE until I found this issue here. I thought sharing the example could help solve the issue. Here is the Dockerfile:
FROM debian:buster
RUN DEBIAN_FRONTEND=noninteractive \
apt-get update && \
apt-get -y install mesa-utils xterm x11-apps xauth
CMD glxgears
Running the following commands should show you the turning-gears demo.
sudo podman build -t glx .
xhost +local:
sudo podman run --rm -ti --volume=$XAUTHORITY:/tmp/.Xauthority --volume=/tmp/.X11-unix:/tmp/.X11-unix --env=DISPLAY=$DISPLAY --env=XAUTHORITY=/tmp/.Xauthority glx
If you add the --device /dev/dri option, i.e.,
sudo podman run --rm -ti --volume=$XAUTHORITY:/tmp/.Xauthority --volume=/tmp/.X11-unix:/tmp/.X11-unix --env=DISPLAY=$DISPLAY --env=XAUTHORITY=/tmp/.Xauthority --device /dev/dri glx
however, my entire X server freezes for a couple of seconds until glxgears crashes. Running with the --privileged=true option works fine for me.
@mrunalp Masking /sys/dev seems to be causing us issues. Perhaps we should just mask block devices to fix the problem?
If you are asking me, the cleanest solution would be to only mask devices which are not mapped into the container.
Thanks @umohnani8!
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
When running a GPU application inside a rootless podman container started with --device /dev/dri, libGL fails to initialize.
Steps to reproduce the issue:
Describe the results you received:
Graphical applications like glxgears fail to start.
Describe the results you expected:
LibGL works and application starts.
Additional information you deem important (e.g. issue happens only occasionally):
It seems like a regression since https://github.com/containers/podman/pull/6957. Starting the container with --privileged makes /sys/dev available, but then for some reason the device file /dev/dri/card0 is not available.