Mirantis / cri-dockerd

dockerd as a compliant Container Runtime Interface for Kubernetes
https://mirantis.github.io/cri-dockerd/
Apache License 2.0

Support for user namespaces in Kubernetes #74

Open rata opened 2 years ago

rata commented 2 years ago

Hi!

I'm working on the KEP that will be implemented in 1.25 (the next k8s release) to support user namespaces. We are creating an implementation for containerd and CRI-O, but it would be nice if dockershim implemented it too.

I think there are some limitations docker needs to fix as a prerequisite for the implementation. IIUC, docker only supports a single ID mapping shared by all containers running on the host. There is no support for multiple ID mappings yet. However, for isolation reasons, we use a different ID mapping for each pod in Kubernetes, and no pod's mapping overlaps with the mapping of any other pod. So we will need multiple ID mappings for containers, not just the single mapping shared by all containers that docker currently supports.
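To make the per-pod requirement concrete, here is a minimal sketch of handing out non-overlapping host ID ranges, one per pod. It is only an illustration: `IDMapping`, `PodRangeAllocator`, `overlaps`, the starting host ID, and the range size of 65536 are all hypothetical names and values chosen for this example, not the kubelet's actual allocator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IDMapping:
    container_id: int  # first ID as seen inside the pod
    host_id: int       # first ID as seen on the host
    length: int        # number of consecutive IDs mapped


class PodRangeAllocator:
    """Hypothetical allocator: each pod gets its own fixed-size host ID range."""

    def __init__(self, first_host_id=65536, range_size=65536):
        # Both defaults are assumptions made for this sketch.
        self.first_host_id = first_host_id
        self.range_size = range_size
        self.slot_by_pod = {}  # pod UID -> slot index
        self.free_slots = []   # slots released by deleted pods
        self.next_slot = 0

    def allocate(self, pod_uid):
        # Reuse the pod's existing slot so repeated calls are stable.
        if pod_uid not in self.slot_by_pod:
            if self.free_slots:
                slot = self.free_slots.pop()
            else:
                slot = self.next_slot
                self.next_slot += 1
            self.slot_by_pod[pod_uid] = slot
        slot = self.slot_by_pod[pod_uid]
        host_start = self.first_host_id + slot * self.range_size
        return IDMapping(container_id=0, host_id=host_start, length=self.range_size)

    def release(self, pod_uid):
        # Return the slot to the free list when the pod goes away.
        slot = self.slot_by_pod.pop(pod_uid, None)
        if slot is not None:
            self.free_slots.append(slot)


def overlaps(a, b):
    """True if the host ID ranges of two mappings intersect."""
    return a.host_id < b.host_id + b.length and b.host_id < a.host_id + a.length


if __name__ == "__main__":
    alloc = PodRangeAllocator()
    pod_a = alloc.allocate("pod-a")
    pod_b = alloc.allocate("pod-b")
    print(pod_a, pod_b, overlaps(pod_a, pod_b))
```

A single daemon-wide mapping, as docker has today, would correspond to every pod receiving the same `IDMapping`, which is exactly what the isolation requirement rules out.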

Some very old comments on the linked moby issue mention that this limitation might be simpler to solve once containerd 1.0 is used, which is already the case. Do you know if this limitation is indeed "easy" to fix now?

It would be great if you could implement userns support for Kubernetes pods in dockershim :)

evol262 commented 2 years ago

We'd love to support this. It is, obviously, dependent on support in moby/docker itself, but I'll see if we can help them out at all.

rata commented 1 year ago

As an update: Kubernetes v1.25 has support for userns with stateless pods and we are aiming to support stateful pods in the coming Kubernetes versions.

@evol262 Any updates from the docker/moby side?

evol262 commented 1 year ago

The docker/moby release process got a little hung up around 22.06, which they're sorting through. @neersighted or @corhere, we briefly discussed this what feels like a long time ago now, but how plausible do either of you think an effort to make --userns somewhat more dynamic would be?

corhere commented 1 year ago

It's been on the roadmap forever, and with kernels supporting ID-mapped mounts becoming available on LTS distros it is finally becoming practical to implement dynamic user namespaces in Moby. I'd say making it happen is very plausible @evol262.

evol262 commented 1 year ago

I was wondering more about how the user interface around it might be structured, since that could take a while to sort out, but that detail is less important ;) @rata, @corhere is a maintainer, so that's a solid vote.

rata commented 1 year ago

Hi, we have reworked the k8s implementation to always require idmap mounts.

Since k8s 1.27, the kubelet asks the runtime to use ID mappings for the mounts (they are part of the mount gRPC message). The container runtime should pass these mappings on to the OCI runtime, and that is basically all that is needed to support this.
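As a rough illustration of that hand-off, the sketch below copies the ID mappings from a CRI-style mount into an OCI-style mount entry. The dictionaries are simplified stand-ins: the input keys (`container_path`, `host_path`, `readonly`, `uid_mappings`, `gid_mappings`, `host_id`, `container_id`, `length`) and the output keys (`uidMappings`, `gidMappings`, `containerID`, `hostID`, `size`) are modeled on my reading of the CRI API and the OCI runtime spec and should be checked against both. The helper names and the example paths are hypothetical.

```python
import json


def to_oci_id_mappings(cri_mappings):
    """Rename the fields of each CRI-style mapping to the OCI-style names."""
    return [
        {
            "containerID": m["container_id"],
            "hostID": m["host_id"],
            "size": m["length"],
        }
        for m in cri_mappings
    ]


def to_oci_mount(cri_mount):
    """Build an OCI-style bind mount entry from a CRI-style mount request."""
    options = ["rbind", "ro" if cri_mount.get("readonly") else "rw"]
    oci_mount = {
        "destination": cri_mount["container_path"],
        "type": "bind",
        "source": cri_mount["host_path"],
        "options": options,
    }
    # Only emit the mapping fields when the kubelet actually requested them.
    uid_mappings = cri_mount.get("uid_mappings", [])
    gid_mappings = cri_mount.get("gid_mappings", [])
    if uid_mappings:
        oci_mount["uidMappings"] = to_oci_id_mappings(uid_mappings)
    if gid_mappings:
        oci_mount["gidMappings"] = to_oci_id_mappings(gid_mappings)
    return oci_mount


if __name__ == "__main__":
    # Example request with made-up paths and ID values.
    mapping = {"host_id": 131072, "container_id": 0, "length": 65536}
    request = {
        "container_path": "/data",
        "host_path": "/var/lib/example/volumes/data",
        "readonly": False,
        "uid_mappings": [mapping],
        "gid_mappings": [mapping],
    }
    print(json.dumps(to_oci_mount(request), indent=2))
```

The runtime does no ID arithmetic of its own here; it forwards what the kubelet sent, and the OCI runtime sets up the idmapped mount.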

containerd, runc, CRI-O, crun, and others are all making the changes. It would be great to see this in docker too :)

evol262 commented 1 year ago

@neersighted @corhere ^^

Technically simple! But userns remapping in Moby is still pretty limited for the time being. How difficult would this be to expand?

rata commented 1 year ago

FYI: We are adding support for stateful pods in k8s 1.28. The runtime part is still very simple, as it just relies on idmap mounts for the ID handling.

I'm here in case anything is not clear with the KEP or the implementation :)