cncf-tags / container-device-interface

Apache License 2.0
190 stars 36 forks source link

[Extension] Allow AdditionalGIDs to be specified in CDI specification #175

Closed elezar closed 1 week ago

elezar commented 8 months ago

Consider the following conditions. A device nodes with the following properties

crw-rw---- 1 root video 226, 0 Sep 12 13:10 /dev/dri/card0
crw-rw---- 1 root render 226, 128 Sep 12 13:10 /dev/dri/renderD128

If we inject these two devices into a container using the command line:

docker run --rm -ti --device /dev/dri/card0 --device /dev/dri/renderD128 ubuntu

we see:

$ ls -aln /dev/dri
total 0
drwxr-xr-x 2 0   0       80 Nov  1 15:42 .
drwxr-xr-x 6 0   0      380 Nov  1 15:42 ..
crw-rw---- 1 0  44 226,   0 Nov  1 15:42 card0
crw-rw---- 1 0 109 226, 128 Nov  1 15:42 renderD128

Note that both device nodes have their file mode set to xx0 meaning that users not in the video or render group can access the device. This means that if we start the container with a non-root user, we would no be able to read from or write to the devices.

$ docker run --rm -ti --device /dev/dri/card0 --device /dev/dri/renderD128 -u 1000:1000 ubuntu
$ cat /dev/dri/card0
cat: /dev/dri/card0: Permission denied

A workaround is to add the numeric group ID for the required group(s) (video or render in this case) to the docker commandline:

docker run --rm -ti --device /dev/dri/card0 --device /dev/dri/renderD128 -u 1000:1000 \
    --group-add=44 --group-add=109  ubuntu

Meaning that the device nodes are readable by the container process.

What Docker does is add the IDs for the groups to the

Process.User.AdditionalGIDs

field in the OCI Runtime Specification. As confirmed by inspecting the config.json generated for the container:

{
  "ociVersion": "1.1.0-rc.2",
  "process": {
    "terminal": true,
    "consoleSize": {
      "height": 57,
      "width": 133
    },
    "user": {
      "uid": 1000,
      "gid": 1000,
      "additionalGids": [
        1000,
        44,
        109
      ]
    },

The proposal here is that we extend the CDI specification to allow AdditionalGIDs to be specified as ContainerEdits:

// ContainerEdits are edits a container runtime must make to the OCI spec to expose the device.
type ContainerEdits struct {
    Env         []string      `json:"env,omitempty"`
    DeviceNodes []*DeviceNode `json:"deviceNodes,omitempty"`
    Hooks       []*Hook       `json:"hooks,omitempty"`
    Mounts      []*Mount      `json:"mounts,omitempty"`
        RequiredGroupIDs   []uint32.  `json:"requiredGroupIDs,omitempty"`
}

We can decide whether we want to add a struct type instead to allow for future extensions.

With this extension, the modifications to the OCI Spec will include appending the set of RequiredGroupIDs to the Process.User.AdditionalGIDs slice.

elezar commented 8 months ago

/cc @klueska

elezar commented 8 months ago

/cc @kad @klihub

mythi commented 5 months ago

@elezar The problem was solved for the CRI runtimes earlier https://kubernetes.io/blog/2021/11/09/non-root-containers-and-devices/

mythi commented 5 months ago

there's also a potential conflict/overlap: if device_ownership_from_security_context boolean is enabled and a device plugin picks up the device host GID into AdditionalGIDs via CDI, the host GID does not exist in the container anymore. Right?

elezar commented 5 months ago

@mythi thanks for the feedback and for the linked blog post. The addition of the GIDs to the list comes from the workaround that was required to get this working on non-CRI-based runtimes such as podman or docker. There is also no device_ownership_from_security_context that can be enabled.

There is also the question of rootless containers which as far as I was able to tell from runc's implementation bind mount devices instead of creating the device nodes using mknod.

Maybe this indicates that we need to reopen this issue (possibly reverting #179) to investigate this further.

From our testing the existence of the GID in the container does not seem to affect behaviour, but I could spend more time on this.

mythi commented 5 months ago

The addition of the GIDs to the list comes from the workaround that was required to get this working on non-CRI-based runtimes such as podman or docker.

Yeah I thought so too based on the issue description but wanted share what containerd/CRI-O can do if you had plans to get this to your device plugin too.

I played with rootless containers and devices just before the holidays. See my "summary" in https://github.com/containers/podman/issues/20974#issuecomment-1862599848. I did not try rootless docker/runc but at least crun w/ podman allows you to preserve the host GID using --group-add keep-groups. AFAIK, there's an old RFC topic for runtime-spec for how this should be handled officially but I've not studied where the conversation is atm.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.

github-actions[bot] commented 1 week ago

This issue was automatically closed due to inactivity.