genuinetools / img

Standalone, daemon-less, unprivileged Dockerfile and OCI compatible container image builder.
https://blog.jessfraz.com/post/building-container-images-securely-on-kubernetes/
MIT License

uid range not allowed #334

Open akosch1986 opened 3 years ago

akosch1986 commented 3 years ago

Hello!

I am working inside an OpenShift environment and trying to use img.

My problem in short: when I try to build an image, I get the following error message:

/img-dev $ img -s .local build -t test .
newuidmap: uid range [1-65537) -> [100000-165536) not allowed
nsenter: failed to use newuidmap: Invalid argument
nsenter: failed to sync with parent: SYNC_USERMAP_ACK: got 255: Invalid argument
/img-dev $ echo $SYNC_USERMAP_ACK

Do you have any idea what I could do to get img running inside my OpenShift environment?

Thank you very much!
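
For readers decoding the error message: the two bracketed ranges appear to be half-open intervals derived from a single ID-mapping triple (first ID inside the namespace, first ID on the host, count). The following is a minimal Go sketch under that assumption; `idMapping` and `describe` are hypothetical names made up for illustration and are not part of img or shadow-utils.

```go
package main

import "fmt"

// idMapping is a hypothetical type for one uid_map entry: count IDs starting
// at inside (in the namespace) map to IDs starting at outside (on the host).
type idMapping struct {
	inside, outside, count uint64
}

// describe renders the mapping as two half-open intervals, in the style of
// the ranges shown in the error message.
func describe(m idMapping) string {
	return fmt.Sprintf("[%d-%d) -> [%d-%d)",
		m.inside, m.inside+m.count, m.outside, m.outside+m.count)
}

func main() {
	// 65536 IDs, starting at 1 inside the namespace and at 100000 on the host.
	m := idMapping{inside: 1, outside: 100000, count: 65536}
	fmt.Println("uid range " + describe(m))
}
```

Read this way, the message says that a request to map 65536 IDs starting at host UID 100000 was refused. newuidmap normally validates such requests against the caller's entries in /etc/subuid, so that file is a reasonable first place to look.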

shoelzle commented 3 years ago

I have the same issue with img v0.5.11. In my case the error appears if ADD is used.

After some testing, the issue seems to have appeared in 0.5.8, which has a lot of changes for a micro version increment: https://github.com/genuinetools/img/compare/v0.5.7...v0.5.8

lel-amri commented 3 years ago

I also have this issue. It has been impeding me for a while now: it prevented me from using img on Debian for about 5 months, until I managed to build img from scratch with Go 1.13 a few weeks ago.

I corroborate @shoelzle: it breaks between v0.5.7 (https://github.com/genuinetools/img/commit/d14bb92b69804443263d647647b0833013b8df91) and v0.5.8 (https://github.com/genuinetools/img/commit/213cd7b10e96a811acbce1a015431b93f37448a9). With some troubleshooting I narrowed the introduction of the issue down to commit https://github.com/genuinetools/img/commit/6cbe66bade1ed0a606166c98065709d3c21ee9fc. At this exact commit, if buildkit is rolled back to 0.4.0 instead of 0.5.1 and containerd to f5b0fa220df8 instead of 3a3f0aac8819, the issue goes away.

I'm not fluent in Go at all, and gdb struggles to debug Go code, so I'm struggling a lot to find the root cause. From what I understand, the init function nsexec works as intended. The bug happens because img's init function nsexec is called again although we are already in an unshared namespace (unshared properly by img); I guess this happens when buildkit starts a container. This behavior doesn't exist in v0.5.7.
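
To illustrate what "already in an unshared namespace" can look like from inside a process, here is a small Go sketch that inspects the contents of /proc/self/uid_map. It assumes the common heuristic that the initial user namespace shows the identity mapping `0 0 4294967295`. The function name `looksLikeUserNamespace` is made up for this example; it is not how img decides anything and is not the fix discussed in this thread.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// looksLikeUserNamespace reports whether the given contents of
// /proc/self/uid_map differ from the identity mapping "0 0 4294967295"
// that is normally seen in the initial user namespace.
func looksLikeUserNamespace(uidMap string) bool {
	lines := strings.Split(strings.TrimSpace(uidMap), "\n")
	if len(lines) != 1 {
		return true // several mapping lines: not the identity mapping
	}
	f := strings.Fields(lines[0])
	return !(len(f) == 3 && f[0] == "0" && f[1] == "0" && f[2] == "4294967295")
}

func main() {
	data, err := os.ReadFile("/proc/self/uid_map")
	if err != nil {
		fmt.Println("cannot read uid_map:", err)
		return
	}
	fmt.Println("already in a user namespace:", looksLikeUserNamespace(string(data)))
}
```

A process-start hook could, in principle, consult a check like this before attempting a second ID mapping, but whether that would be appropriate for img is an open question here.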

Here is the least hacky prints-based debug that exhibits symptoms of the aforementioned behavior:

diff --git a/internal/unshare/unshare.c b/internal/unshare/unshare.c
index a5380477..1c82a035 100644
--- a/internal/unshare/unshare.c
+++ b/internal/unshare/unshare.c
@@ -296,23 +296,47 @@ static void set_propagation(unsigned long flags)
        bail("cannot change root filesystem propagation");
 }

+#define append_debug(format, ...) do { \
+       size_t len = snprintf(debug + debug_len, sizeof(debug) - debug_len, format __VA_OPT__(,) __VA_ARGS__); \
+       debug_len += len; \
+   } while(0)
+
+#define print_debug() do { \
+       printf("\n###\n%s###\n", debug); \
+       fflush(stdout); \
+   } while (0)
+
 void nsexec(void)
 {
+   char debug[4096];
+   size_t debug_len = 0;
    /*
     * Return early if we are just running the tests.
     */
    const char* running_tests = getenv("IMG_RUNNING_TESTS");
    if (running_tests){
+       append_debug("IMG_RUNNING_TESTS=%s\n", running_tests);
+       print_debug();
        return;
    }
+   else
+   {
+       append_debug("IMG_RUNNING_TESTS is unset\n");
+   }

    /*
     * Return early if we are not told to do the unshare.
     */
    const char* do_unshare = getenv("IMG_DO_UNSHARE");
    if (!do_unshare){
+       append_debug("IMG_DO_UNSHARE is unset\n");
+       print_debug();
        return;
    }
+   else
+   {
+       append_debug("IMG_DO_UNSHARE=%s\n", do_unshare);
+   }

    unsigned long propagation = UNSHARE_PROPAGATION_DEFAULT;
    jmp_buf env;
@@ -337,6 +361,9 @@ void nsexec(void)
    char *gid_map;
    gid_map = read_ranges(GID);

+   append_debug("euid: %u, egid: %u, uid_map %s, gid_map: %s\n", real_euid, real_egid, uid_map, gid_map);
+   print_debug();
+
    /*
     * Make the process non-dumpable, to avoid various race conditions that
     * could cause processes in namespaces we're joining to access host

Sample output for v0.5.11:

###
IMG_RUNNING_TESTS is unset
IMG_DO_UNSHARE is unset
###

###
IMG_RUNNING_TESTS is unset
IMG_DO_UNSHARE=1
euid: 1000, egid: 1000, uid_map 1000:1,100000:65536, gid_map: 1000:1,100000:65536
###
Building ***/alpine:3.12
Setting up the rootfs... this may take a bit.
time="2021-04-04T10:22:21+02:00" level=warning msg="using host network as the default"
#1 [internal] load .dockerignore
#1 transferring context: 2B done
#1 DONE 0.0s

#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 94B done
#2 DONE 0.0s

#3 [internal] load build context
#3 transferring context: 42B done
#3 DONE 0.0s

#4 [1/1] ADD /base-rootfs.tar.xz /
#4 ERROR: Error processing tar file(exit status 16): 
###
IMG_RUNNING_TESTS is unset
IMG_DO_UNSHARE=1
euid: 0, egid: 0, uid_map 0:1,100000:65536, gid_map: 0:1,100000:65536
###
newuidmap: uid range [1-65537) -> [100000-165536) not allowed
nsenter: failed to use newuidmap: Success
nsenter: failed to sync with parent: SYNC_USERMAP_ACK: got 255: Success

------
 > [1/1] ADD /base-rootfs.tar.xz /:
------
Error: failed to solve: Error processing tar file(exit status 16): 
###
IMG_RUNNING_TESTS is unset
IMG_DO_UNSHARE=1
euid: 0, egid: 0, uid_map 0:1,100000:65536, gid_map: 0:1,100000:65536
###
newuidmap: uid range [1-65537) -> [100000-165536) not allowed
nsenter: failed to use newuidmap: Success
nsenter: failed to sync with parent: SYNC_USERMAP_ACK: got 255: Success

Sample output for v0.5.7:

###
IMG_RUNNING_TESTS is unset
IMG_DO_UNSHARE is unset
###

###
IMG_RUNNING_TESTS is unset
IMG_DO_UNSHARE=1
euid: 1000, egid: 1000, uid_map 1000:1,100000:65536, gid_map: 1000:1,100000:65536
###
Building ***/alpine:3.12
Setting up the rootfs... this may take a bit.

<DOES BUILDKIT STUFF (It's the same Dockerfile as for the v0.5.11 run)>

Successfully built ***/alpine:3.12

###
IMG_RUNNING_TESTS is unset
IMG_DO_UNSHARE is unset
###

###
IMG_RUNNING_TESTS is unset
IMG_DO_UNSHARE=1
euid: 1000, egid: 1000, uid_map 1000:1,100000:65536, gid_map: 1000:1,100000:65536
###

lel-amri commented 3 years ago

I confirm that v0.5.7 never executes /proc/self/exe past the first time it unshares namespaces. I believe something changed upstream, both in buildkit and docker, but I know little about these projects, and it was very hard to wrap my head around the bug and understand the source code (I'm also very new to Go).

Here is a GDB backtrace of img v0.5.11 in the main subprocess (the one being unshared properly and doing most of the work), showing the part of the call stack that leads to the execution of /proc/self/exe:

Thread 5.10 (LWP 21052 "runc:[1:CHILD]"):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007ffff7768c5f in syscall.rawClone (flags=<optimized out>, child_stack=<optimized out>, ptid=<optimized out>, ctid=<optimized out>, regs=<optimized out>) at ../../../gcc/libgo/go/syscall/clone_linux.c:100
#2  0x00007ffff76d4163 in syscall.forkAndExecInChild1 (argv0=argv0@entry=0xc00036e090 "/proc/self/exe", chroot=0x0, dir=0x0, attr=0x7fff7afd66a0, sys=0xc000348000, pipe=24, envv=..., argv=..., envv=..., argv=...) at ../../../gcc/libgo/go/syscall/exec_linux.go:219
#3  0x00007ffff76d53a7 in syscall.forkExec (argv0=..., argv=..., attr=0x7fff7afd66a0) at ../../../gcc/libgo/go/syscall/exec_unix.go:231
#4  0x00007ffff777bd7c in __morestack () at ../../../gcc/libgcc/config/i386/morestack.S:546
#5  0x00007ffff76d5abe in syscall.StartProcess (argv0=..., argv=..., attr=attr@entry=0x7fff7afd66a0) at ../../../gcc/libgo/go/syscall/exec_unix.go:302
#6  0x00007ffff75e243b in os.startProcess (attr=<optimized out>, argv=..., name=...) at ../../../gcc/libgo/go/os/exec_posix.go:53
#7  os.StartProcess (name=..., argv=..., attr=attr@entry=0x7fff7afd6a30) at ../../../gcc/libgo/go/os/exec.go:102
#8  0x00007ffff75e8c70 in os..z2fexec.Cmd.Start (param=param@entry=0xc000342000) at ../../../gcc/libgo/go/os/exec/exec.go:422
#9  0x000000000113c993 in chrootarchive.invokeUnpack (root=..., options=0xc0001de140, dest=..., decompressedArchive=...) at /home/leo/go/pkg/mod/github.com/docker/docker@v1.4.2-0.20200227233006-38f52c9fec82/pkg/chrootarchive/archive_unix.go:98
#10 github.x2ecom..z2fdocker..z2fdocker..z2fpkg..z2fchrootarchive.untarHandler (tarArchive=..., dest=..., options=0xc0001de140, options@entry=0x0, decompress=decompress@entry=true, root=...) at /home/leo/go/pkg/mod/github.com/docker/docker@v1.4.2-0.20200227233006-38f52c9fec82/pkg/chrootarchive/archive.go:97
#11 0x00000000010fa41b in github.x2ecom..z2fdocker..z2fdocker..z2fpkg..z2fchrootarchive.Untar (options=0x0, dest=..., tarArchive=...) at  /home/leo/go/pkg/mod/github.com/docker/docker@v1.4.2-0.20200227233006-38f52c9fec82/pkg/chrootarchive/archive.go:39
#12 file.unpack (srcRoot=..., src=..., destRoot=..., dest=..., ch=<optimized out>, tm=<optimized out>, ctx=...) at /home/leo/go/pkg/mod/github.com/moby/buildkit@v0.7.2/solver/llbsolver/file/unpack.go:38
#13 0x00000000010fbce6 in file.docopy (idmap=<optimized out>, u=0x0, action=..., dest=..., src=..., ctx=...) at /home/leo/go/pkg/mod/github.com/moby/buildkit@v0.7.2/solver/llbsolver/file/backend.go:226
#14 github.x2ecom..z2fmoby..z2fbuildkit..z2fsolver..z2fllbsolver..z2ffile.Backend.Copy (fb=fb@entry=0x7ffff7fc2f18 <runtime.zerobase>, ctx=..., m1=..., m2=..., user=..., group=..., action=...) at /home/leo/go/pkg/mod/github.com/moby/buildkit@v0.7.2/solver/llbsolver/file/backend.go:341
#15 0x0000000000f4c583 in ops.func1 (ctx=...) at /home/leo/go/pkg/mod/github.com/moby/buildkit@v0.7.2/solver/llbsolver/ops/file.go:558
#16 0x0000000000ebcb80 in github.x2ecom..z2fmoby..z2fbuildkit..z2futil..z2fflightcontrol.call.run (c=<optimized out>) at /home/leo/go/pkg/mod/github.com/moby/buildkit@v0.7.2/util/flightcontrol/flightcontrol.go:121
#17 0x0000000000ebcc9d in flightcontrol.github.x2ecom/moby/buildkit/util/flightcontrol..thunk0 () at /home/leo/go/pkg/mod/github.com/moby/buildkit@v0.7.2/util/flightcontrol/flightcontrol.go:117
#18 0x00007ffff76bfac7 in sync.Once.doSlow (o=<optimized out>, f=<optimized out>) at ../../../gcc/libgo/go/sync/once.go:66
#19 0x00007ffff777bd7c in __morestack () at ../../../gcc/libgcc/config/i386/morestack.S:546
#20 0x00007ffff76bfb45 in sync.Once.Do (o=<optimized out>, f=<optimized out>) at ../../../gcc/libgo/go/sync/once.go:57
#21 0x0000000000ebd1e1 in flightcontrol.github.x2ecom/moby/buildkit/util/flightcontrol..thunk5 (__go_thunk_parameter=<optimized out>) at /home/leo/go/pkg/mod/github.com/moby/buildkit@v0.7.2/util/flightcontrol/flightcontrol.go:148
#22 0x00007ffff7670aa8 in runtime.kickoff () at ../../../gcc/libgo/go/runtime/proc.go:1053
#23 0x0000000000000000 in ?? ()

From what I understand, the way img unshares namespaces doesn't play nicely with docker's reexec module. The latter, in an init function (one running past the main), reads argv[0] and kicks off an entirely different procedure for some well-known strings.
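
The argv[0] dispatch described here can be sketched in a few lines of Go. This is a self-contained imitation of the pattern, not Docker's actual code: `handlers`, `register`, `dispatch` and the name `demo-untar` are all invented for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// handlers maps a well-known argv[0] value to the procedure that should run
// instead of the normal program.
var handlers = map[string]func(){}

// register associates name with fn; registering the same name twice panics.
func register(name string, fn func()) {
	if _, dup := handlers[name]; dup {
		panic("handler already registered: " + name)
	}
	handlers[name] = fn
}

// dispatch runs the handler registered for argv0, if any, and reports
// whether one was found.
func dispatch(argv0 string) bool {
	fn, ok := handlers[argv0]
	if ok {
		fn()
	}
	return ok
}

func init() {
	// "demo-untar" is a made-up name used only in this sketch.
	register("demo-untar", func() { fmt.Println("running the unpack helper") })
}

func main() {
	if dispatch(os.Args[0]) {
		return // re-executed as a helper: skip the normal program
	}
	fmt.Println("running the normal program")
}
```

In this pattern the parent starts the helper by re-executing its own binary (the backtrace above shows /proc/self/exe) with argv[0] set to the registered name. The whole process start-up therefore runs again in the child, including anything hooked in before main, which would be consistent with nsexec being triggered a second time.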

I pushed a dirty fix at https://github.com/lel-amri/img/commit/b409d1eda10aa0464f4b28c4356da0687d9ef4d6 (temporary branch fix-334). @akosch1986, @shoelzle, can you try it?

lel-amri commented 3 years ago

@jessfraz Is there a need to change user namespaces in init functions? I think I found a clean way to fix this, as long as I unshare past the main.

GitNico commented 2 years ago

I pushed a dirty fix at lel-amri@b409d1e (Temporary branch fix-334) ? @akosch1986, @shoelzle, can you try that ?

Hi @lel-amri, shoelzle and I work together, and I can confirm that your "dirty fix" works. I cloned your branch fix-334 and statically compiled img. I then copied the compiled binary into one of the r.j3ss.co/img containers and ran a build with a simple Dockerfile:

Without modification:

[:~/docker-test] 127 $ docker run -it r.j3ss.co/img:v0.5.11 -v
img version v0.5.11, build 5b908689
[gal:~/docker-test] 11s $ cat Dockerfile 
FROM scratch
ADD asdf.tar.gz /

[:~/docker-test] $ docker run --rm --volume $(pwd):/home/user/src --security-opt seccomp=unconfined --security-opt apparmor=unconfined --workdir /home/user/src --entrypoint /bin/sh r.j3ss.co/img:v0.5.11 -c 'img build --tag test_img0.5.11 --file Dockerfile .'
Building docker.io/library/test_img0.5.11:latest
Setting up the rootfs... this may take a bit.
time="2021-11-08T13:02:38Z" level=warning msg="Process sandbox is not available, consider unmasking procfs: mount: permission denied (are you root?)\n"
time="2021-11-08T13:02:38Z" level=warning msg="using host network as the default"
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 69B done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 DONE 0.0s

#3 [internal] load build context
#3 transferring context: 309B done
#3 DONE 0.0s

#4 [1/1] ADD asdf.tar.gz /
#4 ERROR: Error processing tar file(exit status 16): newuidmap: uid range [1-65537) -> [100000-165536) not allowed
nsenter: failed to use newuidmap: Invalid argument
nsenter: failed to sync with parent: SYNC_USERMAP_ACK: got 255: Invalid argument

------
 > [1/1] ADD asdf.tar.gz /:
------
Error: failed to solve: Error processing tar file(exit status 16): newuidmap: uid range [1-65537) -> [100000-165536) not allowed
nsenter: failed to use newuidmap: Invalid argument
nsenter: failed to sync with parent: SYNC_USERMAP_ACK: got 255: Invalid argument

[:~] $ ls -l img 
-rwxr-xr-x 1 root root 29328192 Sep 24 14:09 img
[:~] $ file img 
img: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=tHN9l8CAPP_wDhHftWo5/1yh1Fp7X3dfWmJ61bLpj/rHIsDnj8Tk-GIhz7n_-R/re1oh-ZIG7srMrRSzBOa, not stripped

With modification:

[:~/docker-test] 1 $ docker run -it img_test:latest
/ $ img -v
img version v0.5.11, build b409d1ed

[:~/docker-test] 46s $ docker run --rm --volume $(pwd):/home/user/src --security-opt seccomp=unconfined --security-opt apparmor=unconfined --workdir /home/user/src --entrypoint /bin/sh img_test:latest -c 'img build --tag test_img0.5.11 --file Dockerfile .'
Building docker.io/library/test_img0.5.11:latest
Setting up the rootfs... this may take a bit.
time="2021-11-08T13:05:24Z" level=warning msg="Process sandbox is not available, consider unmasking procfs: mount: permission denied (are you root?)\n"
time="2021-11-08T13:05:24Z" level=warning msg="using host network as the default"
#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 DONE 0.0s

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 69B done
#1 DONE 0.0s

#4 [internal] load build context
#4 transferring context: 309B done
#4 DONE 0.0s

#3 [1/1] ADD asdf.tar.gz /
#3 DONE 10.0s

#5 exporting to image
#5 exporting layers done
#5 exporting manifest sha256:7cc00282896e2dffc6cd77e6579336d321968e52436f8455ba0ec9eba1770c81 done
#5 exporting config sha256:13b10cbb75f0b960e35abd34e9ff2a80450dbf7576445d5a4fdd6970c080feb8 done
#5 naming to docker.io/library/test_img0.5.11:latest done
#5 DONE 0.0s

#6 exporting cache
#6 preparing build cache for export done
#6 DONE 0.0s
Successfully built docker.io/library/test_img0.5.11:latest

[:~/git/lel-amri-img] fix-334 ± file img
img: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, BuildID[sha1]=23da5c664bda4c02620643de98a7ddfb314115e5, for GNU/Linux 3.2.0, not stripped

lel-amri commented 2 years ago

Thanks for testing @GitNico. I'll try to come up with something cleaner, then submit a PR. In the meantime I'll try to maintain a downstream fix on my clone. I'll keep you informed.

lel-amri commented 2 years ago

I found the breaking change between v0.5.7 and v0.5.8: buildkit changed the way it extracts archive files:

  1. https://github.com/moby/buildkit/compare/v0.4.0...v0.5.1#diff-905097ba6a4e36b3afbb1204e52315791dc0915f2a845c768bc84c6af5db1e1fR477
  2. https://github.com/moby/buildkit/compare/v0.4.0...v0.5.1#diff-905097ba6a4e36b3afbb1204e52315791dc0915f2a845c768bc84c6af5db1e1fR772
  3. https://github.com/moby/buildkit/compare/v0.4.0...v0.5.1#diff-905097ba6a4e36b3afbb1204e52315791dc0915f2a845c768bc84c6af5db1e1fR761

It "now" ("then", actually) ultimately uses the function untar at https://github.com/moby/moby/blob/e7b5f7dbe98c559b20c0c8c20c0b31a6b197d717/pkg/chrootarchive/archive_unix.go#L22.

That confirms that the only way to circumvent the issue is to properly make use of Docker's reexec module.

I tried to come up with a small fix, but the mix of C and Go for process initialization doesn't play nicely. I can get either the namespace unshare or the Docker reexec, but not both; it nearly always ends up in a deadlock during Go runtime initialization. The only way I can think of to fix this issue is to re-implement namespace unsharing in img. @AkihiroSuda mentioned Rootlesskit in this comment, and I could reuse the unshare part of Rootlesskit in img. That would be a clean and definitive fix. Meanwhile, my former dirty fix still applies.

lel-amri commented 2 years ago

Hey guys, sorry for the delay, I had a lot to do at work. You can find the fix at https://github.com/lel-amri/img/commit/f0979f292a08f204dc9604afe81a393a641b8f94. I'll try to keep the issue-334 branch up-to-date with master, I believe it won't be that hard given the slow updates on this project.

For auditing's sake, here is the comparison between master and my fix: https://github.com/genuinetools/img/compare/master...lel-amri:issue-334