genuinetools / img

Standalone, daemon-less, unprivileged Dockerfile and OCI compatible container image builder.
https://blog.jessfraz.com/post/building-container-images-securely-on-kubernetes/
MIT License

Docker run: newuidmap: open of uid_map failed: Permission denied #191

Closed mpfen closed 5 years ago

mpfen commented 5 years ago

Hey, I'm trying to run img in docker on Ubuntu 18.04 and I'm getting the following error:

$ docker run --rm -it --name img --volume $(pwd):/home/user/src:ro --workdir /home/user/src --volume "${HOME}/.docker:/root/.docker:ro" --privileged r.j3ss.co/img build -t user/myimage .
newuidmap: open of uid_map failed: Permission denied
nsenter: failed to use newuidmap: Invalid argument
nsenter: failed to sync with parent: SYNC_USERMAP_ACK: got 255: Invalid argument

Any advice?

brad-jones commented 5 years ago

Also getting this on Ubuntu 16.04.5 LTS. Ensured uidmap & libseccomp-dev packages have been installed. /proc/sys/kernel/unprivileged_userns_clone is also enabled.

I am sure I had this working on another 18.04 machine, though. Has something changed recently to introduce this bug? Might start trying older versions...

brad-jones commented 5 years ago

So if I go back to v0.5.1 this error no longer occurs.

AkihiroSuda commented 5 years ago

I confirmed I can hit this issue with Ubuntu 18.04 kernel 4.15.0-39-generic #42-Ubuntu SMP Tue Oct 23 15:48:01 UTC 2018.

The issue should be related to https://github.com/genuinetools/img/pull/184, but I think the PR was working on Ubuntu 18.04 at that time. (EDIT: my bad, I probably didn't test it locally and didn't notice that the Dockerfile isn't tested on Travis.)

cc @giuseppe

AkihiroSuda commented 5 years ago

https://github.com/genuinetools/img/blob/7c58387804c0897b7086ee745c4ea528b4b40509/Dockerfile#L37

Downgrading shadow from shadow-maint/shadow@42324e501768675993235e03f7e4569135802d18 to shadow-maint/shadow@bb3f810611ca254b07f0af4597b2735b6d1c4be1 works for me.

Seems to be a regression in shadow-maint/shadow@52c081b02c4ca4432330ee336a60f6f803431e63 (https://github.com/shadow-maint/shadow/pull/138)

cc @brauner @hallyn

AkihiroSuda commented 5 years ago

A weird thing is that moby/buildkit:v0.3.2-rootless does not seem to hit the issue, even though it uses https://github.com/shadow-maint/shadow/commit/42324e501768675993235e03f7e4569135802d18 as well.

https://github.com/moby/buildkit/blob/fba893e789edf99105ca215e8ff6e8c46829daf2/hack/dockerfiles/test.buildkit.Dockerfile#L216

giuseppe commented 5 years ago

@AkihiroSuda thanks for tagging me here.

I think the issue was introduced by this line: https://github.com/shadow-maint/shadow/commit/52c081b02c4ca4432330ee336a60f6f803431e63#diff-8ac0a3e15c794bc194cfd1f9ba793717R170

When CAP_SYS_ADMIN is present, the seteuid is skipped but the capset is still performed, so the process ends up with either CAP_SETUID or CAP_SETGID and euid = 0.

In the first version I posted, by contrast, there was no cap drop when CAP_SYS_ADMIN is present, so writing to the uid_map file was possible regardless of the euid.

You probably did not notice the issue because you are dropping CAP_SYS_ADMIN, but the report here uses a --privileged container, so CAP_SYS_ADMIN is present.

I've opened a PR here: https://github.com/shadow-maint/shadow/pull/141

Could you please confirm if it solves the issue for you as well?

I've tested it locally against the matrix of CAP_SYS_ADMIN/!CAP_SYS_ADMIN and filecaps/setuid, on both Fedora 29 and Ubuntu 18.04.
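
Whether a given container falls on the CAP_SYS_ADMIN side of that matrix can be checked from inside it by decoding the CapEff line of /proc/self/status. The following is a minimal sketch, assuming Python 3 is available in the container. The function names are illustrative and not part of img or shadow. The bit positions are the ones defined in linux/capability.h.

```python
# Capability bit positions as defined in <linux/capability.h>.
CAP_SETGID = 6
CAP_SETUID = 7
CAP_SYS_ADMIN = 21


def parse_cap_eff(status_text):
    """Return the effective capability mask found in /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The value is a hexadecimal bitmask without a 0x prefix.
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(mask, cap):
    """Return True if capability number `cap` is set in `mask`."""
    return bool(mask & (1 << cap))


def current_caps(path="/proc/self/status"):
    """Report the capabilities discussed in this thread for the calling process."""
    with open(path) as f:
        mask = parse_cap_eff(f.read())
    return {
        "CAP_SYS_ADMIN": has_cap(mask, CAP_SYS_ADMIN),
        "CAP_SETUID": has_cap(mask, CAP_SETUID),
        "CAP_SETGID": has_cap(mask, CAP_SETGID),
    }
```

Calling current_caps() in a container started with --privileged should report CAP_SYS_ADMIN as True, which is the configuration from the original report.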

AkihiroSuda commented 5 years ago

@giuseppe Thanks, your patch works for me with the latest img. Although I'm still curious why BuildKit/RootlessKit did not hit the issue...

giuseppe commented 5 years ago

> Although I'm still curious why BuildKit/RootlessKit did not hit the issue...

Were you running the container without CAP_SYS_ADMIN?

AkihiroSuda commented 5 years ago

Sorry, RootlessKit hits the issue as well. It looks like moby/buildkit:v0.3.2-rootless just uses a different version of the pre-built newuidmap binary, so it is free from the issue. https://github.com/moby/buildkit/blob/fba893e789edf99105ca215e8ff6e8c46829daf2/hack/dockerfiles/test.buildkit.Dockerfile#L236

jeunii commented 4 years ago

I have a question regarding a similar issue. When you guys say

> Ensured uidmap & libseccomp-dev packages have been installed. /proc/sys/kernel/unprivileged_userns_clone is also enabled.

Are we talking about the host OS? In my case I am deploying my Docker image on macOS.

Eventually I plan to run my builds in GKE. So is it correct to assume that my Kubernetes nodes' OS needs the above settings so that the img container can run on them?

AkihiroSuda commented 4 years ago

The host OS.

GKE COS is broken but GKE Ubuntu works.
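
Because these are host settings, they have to be checked on the node itself rather than inside the container. The following is a minimal sketch, assuming Python 3 is available on the host. The function names are illustrative. Note that /proc/sys/kernel/unprivileged_userns_clone exists only on kernels that carry that patch (Debian and Ubuntu kernels, for example), so "absent" does not by itself mean user namespaces are disabled.

```python
import shutil
from pathlib import Path

# Sysctl file mentioned earlier in this thread.
USERNS_CLONE = "/proc/sys/kernel/unprivileged_userns_clone"


def userns_clone_state(path=USERNS_CLONE):
    """Return 'enabled', 'disabled', or 'absent' for the sysctl file at `path`."""
    p = Path(path)
    if not p.exists():
        # Kernels without this sysctl do not expose the file at all.
        return "absent"
    return "enabled" if p.read_text().strip() == "1" else "disabled"


def missing_tools(tools=("newuidmap", "newgidmap")):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]
```

On an Ubuntu host with the uidmap package installed, userns_clone_state() should return "enabled" and missing_tools() should return an empty list.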

jeunii commented 4 years ago

@AkihiroSuda Thank you for the reply. Could you kindly elaborate a bit on why COS is incompatible with img? I'm studying the feasibility of using img in our GKE-managed k8s, and all our build nodes use COS as the image type.

Does it have to do with user namespace support being enabled? Do COS-based systems not support that?

AkihiroSuda commented 4 years ago

COS has a kernel patch issue: https://github.com/moby/buildkit/issues/879