containers / buildah

A tool that facilitates building OCI images.
https://buildah.io
Apache License 2.0

buildah fails to pull openjdk:15 in container with permission denied #3384

Closed. Peter-Sh closed this issue 2 years ago.

Peter-Sh commented 3 years ago

Description

Buildah fails with a permission error ("operation not permitted") when pulling docker.io/openjdk:15 while running inside a container (Podman or Docker).

This may be related to #1709, because strace also shows EPERM on the mknodat syscall, but here it happens with both vfs and fuse-overlayfs.

Steps to reproduce the issue:

echo 'FROM docker.io/openjdk:15' > Dockerfile
podman run --device /dev/fuse -v `pwd`:/test -w /test quay.io/buildah/stable  buildah --log-level debug --storage-driver=overlay  --storage-opt=overlay.mount_program=/usr/bin/fuse-overlayfs bud .

The same happens with Docker (running buildah under Docker requires a custom seccomp profile) and with --storage-driver=vfs.

The workaround is to add --privileged.

Environment

Tested buildah versions:

Podman version:

Docker versions:

Operating systems:

Describe the results you received:

Program output from the latest Ubuntu 20.04 run:

root@buildah-test:/home/ubuntu# podman run --device /dev/fuse -v `pwd`:/test -w /test quay.io/buildah/stable  buildah --log-level debug --storage-driver=overlay  --storage-opt=overlay.mount_program=/usr/bin/fuse-overlayfs bud .
time="2021-07-16T07:58:27Z" level=debug msg="running [buildah-in-a-user-namespace --log-level debug --storage-driver=overlay --storage-opt=overlay.mount_program=/usr/bin/fuse-overlayfs bud .] with environment [PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin TERM=xterm container=oci BUILDAH_ISOLATION=chroot DISTTAG=f34container FGC=f34 HOME=/root HOSTNAME=9098b3c301fc TMPDIR=/var/tmp _CONTAINERS_USERNS_CONFIGURED=1], UID map [{ContainerID:0 HostID:0 Size:4294967295}], and GID map [{ContainerID:0 HostID:0 Size:4294967295}]"
time="2021-07-16T07:58:27Z" level=debug msg="Pull Policy for pull [ifnewer]"
time="2021-07-16T07:58:27Z" level=debug msg="[graphdriver] trying provided driver \"overlay\""
time="2021-07-16T07:58:27Z" level=debug msg="overlay: mount_program=/usr/bin/fuse-overlayfs"
time="2021-07-16T07:58:27Z" level=debug msg="backingFs=extfs, projectQuotaSupported=false, useNativeDiff=false, usingMetacopy=false"
time="2021-07-16T07:58:27Z" level=debug msg="base: \"docker.io/openjdk:15\""
STEP 1: FROM docker.io/openjdk:15
time="2021-07-16T07:58:27Z" level=debug msg="FROM \"docker.io/openjdk:15\""
time="2021-07-16T07:58:27Z" level=debug msg="Pulling image docker.io/openjdk:15 (policy: newer)"
time="2021-07-16T07:58:27Z" level=debug msg="Looking up image \"docker.io/openjdk:15\" in local containers storage"
time="2021-07-16T07:58:27Z" level=debug msg="Trying \"docker.io/openjdk:15\" ..."
time="2021-07-16T07:58:27Z" level=debug msg="Trying \"docker.io/library/openjdk:15\" ..."
time="2021-07-16T07:58:27Z" level=debug msg="Trying \"docker.io/library/openjdk:15\" ..."
time="2021-07-16T07:58:27Z" level=debug msg="Loading registries configuration \"/etc/containers/registries.conf\""
time="2021-07-16T07:58:27Z" level=debug msg="Loading registries configuration \"/etc/containers/registries.conf.d/000-shortnames.conf\""
time="2021-07-16T07:58:27Z" level=debug msg="Attempting to pull candidate docker.io/library/openjdk:15 for docker.io/openjdk:15"
time="2021-07-16T07:58:27Z" level=debug msg="parsed reference into \"[overlay@/var/lib/containers/storage+/run/containers/storage:overlay.mount_program=/usr/bin/fuse-overlayfs]docker.io/library/openjdk:15\""
Trying to pull docker.io/library/openjdk:15...
time="2021-07-16T07:58:27Z" level=debug msg="Copying source image //openjdk:15 to destination image [overlay@/var/lib/containers/storage+/run/containers/storage:overlay.mount_program=/usr/bin/fuse-overlayfs]docker.io/library/openjdk:15"
time="2021-07-16T07:58:27Z" level=debug msg="Trying to access \"docker.io/library/openjdk:15\""
time="2021-07-16T07:58:27Z" level=debug msg="Trying to access \"docker.io/library/openjdk:15\""
time="2021-07-16T07:58:27Z" level=debug msg="No credentials for docker.io found"
time="2021-07-16T07:58:27Z" level=debug msg="Using registries.d directory /etc/containers/registries.d for sigstore configuration"
time="2021-07-16T07:58:27Z" level=debug msg=" Using \"default-docker\" configuration"
time="2021-07-16T07:58:27Z" level=debug msg=" No signature storage configuration found for docker.io/library/openjdk:15, using built-in default file:///var/lib/containers/sigstore"
time="2021-07-16T07:58:27Z" level=debug msg="Looking for TLS certificates and private keys in /etc/docker/certs.d/docker.io"
time="2021-07-16T07:58:27Z" level=debug msg="GET https://registry-1.docker.io/v2/"
time="2021-07-16T07:58:27Z" level=debug msg="Ping https://registry-1.docker.io/v2/ status 401"
time="2021-07-16T07:58:27Z" level=debug msg="GET https://auth.docker.io/token?scope=repository%3Alibrary%2Fopenjdk%3Apull&service=registry.docker.io"
time="2021-07-16T07:58:28Z" level=debug msg="GET https://registry-1.docker.io/v2/library/openjdk/manifests/15"
time="2021-07-16T07:58:29Z" level=debug msg="Content-Type from manifest GET is \"application/vnd.docker.distribution.manifest.list.v2+json\""
time="2021-07-16T07:58:29Z" level=debug msg="Using blob info cache at /var/lib/containers/storage/cache/blob-info-cache-v1.boltdb"
time="2021-07-16T07:58:29Z" level=debug msg="Source is a manifest list; copying (only) instance sha256:0f5476c194b5cd13019bf11faec6836de2bdefa9c8a6fba818dd94e321a7d1c2 for current system"
time="2021-07-16T07:58:29Z" level=debug msg="GET https://registry-1.docker.io/v2/library/openjdk/manifests/sha256:0f5476c194b5cd13019bf11faec6836de2bdefa9c8a6fba818dd94e321a7d1c2"
time="2021-07-16T07:58:30Z" level=debug msg="Content-Type from manifest GET is \"application/vnd.docker.distribution.manifest.v2+json\""
time="2021-07-16T07:58:30Z" level=debug msg="IsRunningImageAllowed for image docker:docker.io/library/openjdk:15"
time="2021-07-16T07:58:30Z" level=debug msg=" Using default policy section"
time="2021-07-16T07:58:30Z" level=debug msg=" Requirement 0: allowed"
time="2021-07-16T07:58:30Z" level=debug msg="Overall: allowed"
time="2021-07-16T07:58:30Z" level=debug msg="Downloading /v2/library/openjdk/blobs/sha256:bae9931e822b1762f91550ecbefea67b9421c249125e305954e3f10ac78a4632"
time="2021-07-16T07:58:30Z" level=debug msg="GET https://registry-1.docker.io/v2/library/openjdk/blobs/sha256:bae9931e822b1762f91550ecbefea67b9421c249125e305954e3f10ac78a4632"
Getting image source signatures
time="2021-07-16T07:58:31Z" level=debug msg="Reading /var/lib/containers/sigstore/library/openjdk@sha256=0f5476c194b5cd13019bf11faec6836de2bdefa9c8a6fba818dd94e321a7d1c2/signature-1"
time="2021-07-16T07:58:31Z" level=debug msg="Manifest has MIME type application/vnd.docker.distribution.manifest.v2+json, ordered candidate list [application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.v1+prettyjws, application/vnd.oci.image.manifest.v1+json, application/vnd.docker.distribution.manifest.v1+json]"
time="2021-07-16T07:58:31Z" level=debug msg="... will first try using the original manifest unmodified"
time="2021-07-16T07:58:31Z" level=debug msg="Downloading /v2/library/openjdk/blobs/sha256:ab2540feecc546a000ab358bde2118583f78b6074e7141f2b75c2f804970d429"
time="2021-07-16T07:58:31Z" level=debug msg="GET https://registry-1.docker.io/v2/library/openjdk/blobs/sha256:ab2540feecc546a000ab358bde2118583f78b6074e7141f2b75c2f804970d429"
time="2021-07-16T07:58:31Z" level=debug msg="Downloading /v2/library/openjdk/blobs/sha256:9509c6b41a37fbf5dbb93aedded1aff0dc6ed45ab2d334440e10a5c8d112531c"
time="2021-07-16T07:58:31Z" level=debug msg="GET https://registry-1.docker.io/v2/library/openjdk/blobs/sha256:9509c6b41a37fbf5dbb93aedded1aff0dc6ed45ab2d334440e10a5c8d112531c"
time="2021-07-16T07:58:31Z" level=debug msg="Downloading /v2/library/openjdk/blobs/sha256:1a0005db77786a07e4e5d56adc224c9dc85320b46354f9110eb174ce7df9df04"
time="2021-07-16T07:58:31Z" level=debug msg="GET https://registry-1.docker.io/v2/library/openjdk/blobs/sha256:1a0005db77786a07e4e5d56adc224c9dc85320b46354f9110eb174ce7df9df04"
Copying blob sha256:ab2540feecc546a000ab358bde2118583f78b6074e7141f2b75c2f804970d429
time="2021-07-16T07:58:31Z" level=debug msg="Detected compression format gzip"
time="2021-07-16T07:58:31Z" level=debug msg="Using original blob without modification"
Copying blob sha256:1a0005db77786a07e4e5d56adc224c9dc85320b46354f9110eb174ce7df9df04
time="2021-07-16T07:58:31Z" level=debug msg="Detected compression format gzip"
time="2021-07-16T07:58:31Z" level=debug msg="Using original blob without modification"
Copying blob sha256:9509c6b41a37fbf5dbb93aedded1aff0dc6ed45ab2d334440e10a5c8d112531c
time="2021-07-16T07:58:31Z" level=debug msg="Detected compression format gzip"
time="2021-07-16T07:58:31Z" level=debug msg="Using original blob without modification"
Copying config sha256:bae9931e822b1762f91550ecbefea67b9421c249125e305954e3f10ac78a4632
time="2021-07-16T07:58:35Z" level=debug msg="No compression detected"
time="2021-07-16T07:58:35Z" level=debug msg="Using original blob without modification"
Writing manifest to image destination
Storing signatures
time="2021-07-16T07:58:35Z" level=debug msg="Applying tar in /var/lib/containers/storage/overlay/acf86001822d28ddf15fadf6efe420b6f9caabcfd04f622c5fef2a7f454cdc30/diff"
time="2021-07-16T07:58:36Z" level=debug msg="Error pulling candidate docker.io/library/openjdk:15: Error committing the finished image: error adding layer with blob \"sha256:9509c6b41a37fbf5dbb93aedded1aff0dc6ed45ab2d334440e10a5c8d112531c\": Error processing tar file(exit status 1): operation not permitted"
error creating build container: Error committing the finished image: error adding layer with blob "sha256:9509c6b41a37fbf5dbb93aedded1aff0dc6ed45ab2d334440e10a5c8d112531c": Error processing tar file(exit status 1): operation not permitted
time="2021-07-16T07:58:36Z" level=debug msg="shutting down the store"
time="2021-07-16T07:58:36Z" level=error msg="exit status 125"

Describe the results you expected:

Successful build.

rhatdan commented 3 years ago

We disable CAP_MKNOD by default if you are using Podman or Kubernetes, so you have to add the capability back:

--cap-add mknod
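To confirm which capabilities actually reach the process inside the container, one can decode the effective capability mask from the CapEff line of /proc/self/status. Per linux/capability.h, CAP_SYS_ADMIN is bit 21 and CAP_MKNOD is bit 27. The Python sketch below is illustrative only; effective_caps and has_capability are hypothetical helpers, not part of buildah or Podman.

```python
import re

# Capability bit numbers as defined in linux/capability.h.
CAP_SYS_ADMIN = 21
CAP_MKNOD = 27


def effective_caps(status_text: str) -> int:
    """Return the CapEff bitmask parsed from the text of /proc/<pid>/status."""
    match = re.search(r"^CapEff:\s*([0-9a-fA-F]+)\s*$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no CapEff line found")
    return int(match.group(1), 16)


def has_capability(status_text: str, cap: int) -> bool:
    """True if capability bit `cap` is set in the effective set."""
    return bool((effective_caps(status_text) >> cap) & 1)
```

For example, calling has_capability(open("/proc/self/status").read(), CAP_MKNOD) inside containers started with and without --cap-add mknod shows whether the flag took effect. A set bit only means the capability is held relative to the process's own user namespace.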

Peter-Sh commented 3 years ago

Sorry for the noise, @rhatdan. Maybe I'm missing something important, but adding --cap-add mknod doesn't help: buildah still fails with the same "operation not permitted" error.

The command is almost the same (Podman is running as root):

echo 'FROM docker.io/openjdk:15' > Dockerfile
podman run --cap-add mknod --device /dev/fuse -it -v `pwd`:/test -w /test quay.io/buildah/stable buildah bud  .

strace output from the process that gets the EPERM error. The process tries to create /dev/console and fails:

....
newfstatat(AT_FDCWD, "/boot", {st_mode=S_IFDIR|0555, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
fchmodat(AT_FDCWD, "/boot", 0555)       = 0
utimensat(AT_FDCWD, "/boot", [{tv_sec=1596240052, tv_nsec=0} /* 2020-08-01T00:00:52+0000 */, {tv_sec=1596240052, tv_nsec=0} /* 2020-08-01T00:00:52+0000 */], 0) = 0
read(0, "dev/\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
newfstatat(AT_FDCWD, "/", {st_mode=S_IFDIR|0555, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/dev", 0xc0001c9078, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/dev", 0xc0001c9148, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
mkdirat(AT_FDCWD, "/dev", 0755)         = 0
newfstatat(AT_FDCWD, "/dev", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
futex(0x55f39f43fe30, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
fchmodat(AT_FDCWD, "/dev", 0755)        = 0
utimensat(AT_FDCWD, "/dev", [{tv_sec=1596240052, tv_nsec=0} /* 2020-08-01T00:00:52+0000 */, {tv_sec=1596240052, tv_nsec=0} /* 2020-08-01T00:00:52+0000 */], 0) = 0
futex(0x55f39f43fe30, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
read(0, "dev/console\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
newfstatat(AT_FDCWD, "/dev", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/dev/console", 0xc0000341d8, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/proc/self/uid_map", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
mknodat(AT_FDCWD, "/dev/console", S_IFCHR|0600, makedev(0x5, 0x1)) = -1 EPERM (Operation not permitted)
write(2, "operation not permitted", 23) = 23
exit_group(1)                           = ?
+++ exited with 1 +++
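The failing call above is mknodat(AT_FDCWD, "/dev/console", S_IFCHR|0600, makedev(0x5, 0x1)), i.e. creating the character device 5:1. A small probe that issues the same kind of call makes it possible to compare environments (plain container, --cap-add mknod, nested user namespace) without running a full pull. This is a minimal Python sketch; probe_mknod_char is a hypothetical helper, and it creates the node in a temporary directory rather than in /dev.

```python
import errno
import os
import stat
import tempfile


def probe_mknod_char(major: int = 5, minor: int = 1) -> int:
    """Try to create a character device node such as /dev/console (5:1).

    The node is created in a throwaway directory, not in /dev. Returns 0 on
    success, otherwise the errno value of the failure (e.g. errno.EPERM).
    """
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "console")
        try:
            # Same file type, permissions and device number as the traced call.
            os.mknod(path, stat.S_IFCHR | 0o600, os.makedev(major, minor))
        except OSError as exc:
            return exc.errno
        return 0


if __name__ == "__main__":
    code = probe_mknod_char()
    print("mknod ok" if code == 0 else f"mknod failed: {errno.errorcode.get(code, code)}")
```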
rhatdan commented 3 years ago

Are you trying to do this rootless?

rhatdan commented 3 years ago

Could you check whether SELinux or seccomp is complaining?

sudo ausearch -m seccomp -ts recent
sudo ausearch -m avc -ts recent

Peter-Sh commented 3 years ago

Are you trying to do this rootless?

I don't think so: I'm running Podman as the root user, and Docker is rootful by default.

The output and strace above were from an Ubuntu 20.04 system; AppArmor is completely disabled via kernel options.

root@buildah-test:/home/ubuntu# aa-status
apparmor module is loaded.
apparmor filesystem is not mounted.
root@buildah-test:/home/ubuntu# uname -a
Linux buildah-test 5.4.0-54-generic #60-Ubuntu SMP Fri Nov 6 10:37:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
root@buildah-test:/home/ubuntu# podman --version
podman version 3.2.2

I also updated CentOS 8.4 (on CentOS I primarily tested this issue with Docker) and Podman, with the same result as before and as with Docker.

[root@centos-host buildah]# podman run --cap-add mknod -it -v `pwd`:/test -w /test quay.io/buildah/stable buildah bud  .
STEP 1: FROM docker.io/openjdk:15
Trying to pull docker.io/library/openjdk:15...
Getting image source signatures
Copying blob 9509c6b41a37 done
Copying blob ab2540feecc5 done
Copying blob 1a0005db7778 done
Copying config bae9931e82 done
Writing manifest to image destination
Storing signatures
error creating build container: Error committing the finished image: error adding layer with blob "sha256:9509c6b41a37fbf5dbb93aedded1aff0dc6ed45ab2d334440e10a5c8d112531c": Error processing tar file(exit status 1): operation not permitted
ERRO[0027] exit status 125

SELinux complains about reading the Dockerfile (because it is mounted from the host and lives in root's home directory), but SELinux is in permissive mode and reports nothing about the mknodat syscall.

[root@centos-host buildah]# sudo ausearch -m seccomp -ts recent
<no matches>
[root@centos-host buildah]# sudo ausearch -m avc -ts recent
----
time->Tue Jul 20 20:43:28 2021
type=PROCTITLE msg=audit(1626803008.863:142): proctitle=6275696C6461682D696E2D612D757365722D6E616D65737061636500627564002E
type=SYSCALL msg=audit(1626803008.863:142): arch=c000003e syscall=257 success=yes exit=3 a0=ffffffffffffff9c a1=c000345a58 a2=80000 a3=0 items=0 ppid=6142 pid=6154 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=1 comm="4" exe=2F6D656D66643A6275696C6461682D696E2D612D757365722D6E616D657370616365202864656C6574656429 subj=system_u:system_r:container_t:s0:c614,c775 key=(null)
type=AVC msg=audit(1626803008.863:142): avc:  denied  { open } for  pid=6154 comm="4" path="/test/Dockerfile" dev="dm-0" ino=35798605 scontext=system_u:system_r:container_t:s0:c614,c775 tcontext=unconfined_u:object_r:admin_home_t:s0 tclass=file permissive=1
----
time->Tue Jul 20 20:44:03 2021
type=PROCTITLE msg=audit(1626803043.520:161): proctitle=6275696C6461682D696E2D612D757365722D6E616D65737061636500627564002E
type=SYSCALL msg=audit(1626803043.520:161): arch=c000003e syscall=257 success=yes exit=3 a0=ffffffffffffff9c a1=c000345a58 a2=80000 a3=0 items=0 ppid=6317 pid=6330 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=1 comm="4" exe=2F6D656D66643A6275696C6461682D696E2D612D757365722D6E616D657370616365202864656C6574656429 subj=system_u:system_r:container_t:s0:c275,c923 key=(null)
type=AVC msg=audit(1626803043.520:161): avc:  denied  { open } for  pid=6330 comm="4" path="/test/Dockerfile" dev="dm-0" ino=51413458 scontext=system_u:system_r:container_t:s0:c275,c923 tcontext=unconfined_u:object_r:admin_home_t:s0 tclass=file permissive=1
[root@centos-host buildah]# getenforce
Permissive

Other things:

[root@centos-host buildah]# podman --version
podman version 3.0.2-dev
[root@centos-host buildah]# cat /etc/centos-release
CentOS Linux release 8.4.2105
[root@centos-host buildah]# uname -a
Linux centos-host 4.18.0-305.7.1.el8_4.x86_64 #1 SMP Tue Jun 29 21:55:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[root@centos-host buildah]#
Peter-Sh commented 3 years ago

strace on CentOS shows the same syscall failing with EPERM:

mknodat(AT_FDCWD, "/dev/console", S_IFCHR|0600, makedev(0x5, 0x1)) = -1 EPERM 

I noticed that there is a "buildah-in-a-user-namespace bud ." process on the host while buildah is running in the container. I think that when buildah starts, it calls unshare and then runs buildah-in-a-user-namespace, so it ends up running in a user namespace. Maybe I should start buildah in some other way?
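Whether a process ended up in a nested user namespace can be checked from /proc. In the initial user namespace, /proc/self/uid_map holds the single identity mapping "0 0 4294967295" (see user_namespaces(7)). However, the debug log above shows buildah-in-a-user-namespace running with that same full map, so the map alone is only a heuristic. Comparing the /proc/<pid>/ns/user links of two processes is more direct. A minimal Python sketch; all helper names are hypothetical.

```python
from __future__ import annotations

import os

# Length of the identity mapping seen in the initial user namespace (2**32 - 1).
FULL_RANGE = 4294967295


def parse_uid_map(text: str) -> list[tuple[int, int, int]]:
    """Parse uid_map text into (id_inside, id_outside, length) tuples."""
    return [tuple(int(field) for field in line.split())
            for line in text.splitlines() if line.strip()]


def looks_like_initial_userns(uid_map_text: str) -> bool:
    """Heuristic check for the initial user namespace.

    False proves a nested namespace; True does not prove the opposite,
    because a privileged parent can give a child the same full mapping.
    """
    return parse_uid_map(uid_map_text) == [(0, 0, FULL_RANGE)]


def same_userns(pid_a: int | str, pid_b: int | str) -> bool:
    """Compare the user namespaces of two processes ("self" is accepted)."""
    return (os.readlink(f"/proc/{pid_a}/ns/user")
            == os.readlink(f"/proc/{pid_b}/ns/user"))
```

For example, from a shell opened with podman exec in the running container, same_userns(<buildah pid>, 1) compares the buildah child process with the container's PID 1. This requires permission to read the other process's namespace link.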

Peter-Sh commented 3 years ago

To give some context:

We use buildah inside Docker as our primary image-building tool in gitlab-runner on a CentOS 8 host, because buildah does not compromise the host machine: it requires neither the Docker socket to be mounted from the host nor a privileged container to run in. We had used this setup successfully for months to build many different images, until the problem with openjdk:15 arose. I then boiled it down to a simple case and also tested it with Podman, on different machines, and with different storage drivers (originally vfs was used).

If you need more information, I'll be glad to provide it. I've created two machines (Ubuntu 20.04 and CentOS 8) dedicated to this case.

Peter-Sh commented 3 years ago

buildah info reports rootless as true. Maybe I'm really missing something fundamental about running rootless versus rootful containers.

I deliberately published port 80, which, if I understand correctly, should only be possible with rootful containers.

root@buildah-test:/home/ubuntu# podman run -p 80:80 --cap-add mknod -it -v `pwd`:/test -w /test quay.io/buildah/stable buildah info
{
    "host": {
        "CgroupVersion": "v1",
        "Distribution": {
            "distribution": "fedora",
            "version": "34"
        },
        "MemFree": 100945920,
        "MemTotal": 2084048896,
        "OCIRuntime": "crun",
        "SwapFree": 0,
        "SwapTotal": 0,
        "arch": "amd64",
        "cpus": 2,
        "hostname": "bd5c6359102e",
        "kernel": "5.4.0-54-generic",
        "os": "linux",
        "rootless": true,
        "uptime": "122h 35m 7.93s (Approximately 5.08 days)"
    },
    "store": {
        "ContainerStore": {
            "number": 0
        },
        "GraphDriverName": "overlay",
        "GraphOptions": [
            "overlay.imagestore=/var/lib/shared",
            "overlay.mount_program=/usr/bin/fuse-overlayfs",
            "overlay.mountopt=nodev,fsync=0"
        ],
        "GraphRoot": "/var/lib/containers/storage",
        "GraphStatus": {
            "Backing Filesystem": "extfs",
            "Native Overlay Diff": "false",
            "Supports d_type": "true",
            "Using metacopy": "false"
        },
        "ImageStore": {
            "number": 0
        },
        "RunRoot": "/run/containers/storage"
    }
}
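Since buildah info emits JSON, the fields that matter in this thread (host.rootless, store.GraphDriverName and the overlay.mount_program entry in store.GraphOptions) can be extracted mechanically when comparing several hosts. A minimal Python sketch, assuming the JSON layout shown above; summarize_buildah_info is a hypothetical name.

```python
import json


def summarize_buildah_info(raw: str) -> dict:
    """Pick the rootless flag, graph driver and overlay mount program out of `buildah info` JSON."""
    info = json.loads(raw)
    host = info.get("host", {})
    store = info.get("store", {})
    mount_program = None
    for option in store.get("GraphOptions") or []:
        # Options look like "overlay.mount_program=/usr/bin/fuse-overlayfs".
        key, _, value = option.partition("=")
        if key == "overlay.mount_program":
            mount_program = value
    return {
        "rootless": host.get("rootless"),
        "driver": store.get("GraphDriverName"),
        "mount_program": mount_program,
    }
```

For example, pass it the output of subprocess.run(["buildah", "info"], capture_output=True, text=True).stdout.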
rhatdan commented 3 years ago

Can you run buildah with --cap-add mknod inside the container?

Peter-Sh commented 3 years ago

I tried this yesterday on both Ubuntu and CentOS, with the same result.

[root@centos-host buildah]# echo 'FROM docker.io/openjdk:15' > Dockerfile
[root@centos-host buildah]# podman  run --cap-add mknod -it -v `pwd`:/test -w /test quay.io/buildah/stable buildah --cap-add mknod bud .
STEP 1: FROM docker.io/openjdk:15
Trying to pull docker.io/library/openjdk:15...
Getting image source signatures
Copying blob ab2540feecc5 done  
Copying blob 9509c6b41a37 done  
Copying blob 1a0005db7778 done  
Copying config bae9931e82 done  
Writing manifest to image destination
Storing signatures
error creating build container: Error committing the finished image: error adding layer with blob "sha256:9509c6b41a37fbf5dbb93aedded1aff0dc6ed45ab2d334440e10a5c8d112531c": Error processing tar file(exit status 1): operation not permitted
ERRO[0030] exit status 125
Peter-Sh commented 3 years ago

On a freshly installed Fedora Server 34 in VirtualBox, I observe the same behavior.

It is very easy to reproduce with just three commands as root (after installing Podman):

setenforce 0
echo 'FROM docker.io/openjdk:15' > Dockerfile
podman  run --cap-add mknod -it -v `pwd`:/test -w /test quay.io/buildah/stable buildah --cap-add mknod bud .
Peter-Sh commented 3 years ago

Hi @rhatdan, would you mind reopening this issue if you don't think this is the desired buildah behavior?

github-actions[bot] commented 2 years ago

A friendly reminder that this issue had no activity for 30 days.

flouthoc commented 2 years ago

I don't see this issue on fresh Podman and buildah installs, either rootless or rootful. Remember to use the :Z volume option when mounting in rootless mode so that the SELinux label is set on the mounted volume.

The reproducer I tried:

echo 'FROM docker.io/openjdk:15' > Dockerfile
podman run --rm --device /dev/fuse -v /tmp/exp:/test:Z -w /test quay.io/buildah/stable  buildah --storage-driver=overlay  --storage-opt=overlay.mount_program=/usr/bin/fuse-overlayfs build .

Output

STEP 1/1: FROM docker.io/openjdk:15
Trying to pull docker.io/library/openjdk:15...
Getting image source signatures
Copying blob sha256:ab2540feecc546a000ab358bde2118583f78b6074e7141f2b75c2f804970d429
Copying blob sha256:9509c6b41a37fbf5dbb93aedded1aff0dc6ed45ab2d334440e10a5c8d112531c
Copying blob sha256:1a0005db77786a07e4e5d56adc224c9dc85320b46354f9110eb174ce7df9df04
Copying blob sha256:ab2540feecc546a000ab358bde2118583f78b6074e7141f2b75c2f804970d429
Copying blob sha256:9509c6b41a37fbf5dbb93aedded1aff0dc6ed45ab2d334440e10a5c8d112531c
Copying blob sha256:1a0005db77786a07e4e5d56adc224c9dc85320b46354f9110eb174ce7df9df04
Copying config sha256:bae9931e822b1762f91550ecbefea67b9421c249125e305954e3f10ac78a4632
Writing manifest to image destination
Storing signatures
COMMIT
--> bae9931e822
bae9931e822b1762f91550ecbefea67b9421c249125e305954e3f10ac78a4632

I'm closing this issue, but feel free to reopen it if I'm mistaken.

an-toine commented 2 years ago

Hello,

I'm facing an issue similar to the one described by @Peter-Sh, but with the official mysql:5.7 image this time.

Buildah outputs the following error:

sh-4.4# buildah pull registry.hub.docker.com/library/mysql:5.7
Trying to pull registry.hub.docker.com/library/mysql:5.7...
Getting image source signatures
Copying blob 2ff5c3b24fd5 done
Copying blob ef4ccd63cdb4 done
Copying blob 66fb34780033 done
Copying blob d6f28a94c51f done
Copying blob 71dd5852ecd9 done
Copying blob 7feea2a503b5 done
Copying blob 88a546386a61 done
Copying blob 65b18297cf83 done
Copying blob d64f23335fb8 done
Copying blob 6ba4171261fa done
Copying blob 96dcc6c8de93 done
writing blob: adding layer with blob "sha256:66fb3478003364657ac7751c40e41a135e02975f9459f652b1df23e3896b5fac": Error processing tar file(exit status 1): operation not permitted

I've straced the Buildah process; the error is the same as in the excerpt posted above:

[pid   292] utimensat(AT_FDCWD, "/dev", [{tv_sec=1523415547, tv_nsec=0} /* 2018-04-11T02:59:07+0000 */, {tv_sec=1523415547, tv_nsec=0} /* 2018-04-11T02:59:07+0000 */], 0) = 0
[pid   292] read(0, "dev/console\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
[pid   292] newfstatat(AT_FDCWD, "/dev", {st_mode=S_IFDIR|0755, st_size=6, ...}, AT_SYMLINK_NOFOLLOW) = 0
[pid   292] newfstatat(AT_FDCWD, "/dev/console", 0xc000109628, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
[pid   292] mknodat(AT_FDCWD, "/dev/console", S_IFCHR|0600, makedev(0x5, 0x1)) = -1 EPERM (Operation not permitted)
[pid   292] write(2, "operation not permitted", 23) = 23
[pid   292] exit_group(1 <unfinished ...>
[pid   292] <... exit_group resumed>)   = ?
[pid   292] +++ exited with 1 +++

I've attached the full strace dump for convenience.

The same error is also raised for the openjdk:15 image:

sh-4.4# buildah pull docker.io/openjdk:15
Trying to pull docker.io/library/openjdk:15...
Getting image source signatures
Copying blob ab2540feecc5 done
Copying blob 9509c6b41a37 done
Copying blob 1a0005db7778 done
writing blob: adding layer with blob "sha256:9509c6b41a37fbf5dbb93aedded1aff0dc6ed45ab2d334440e10a5c8d112531c": Error processing tar file(exit status 1): operation not permitted

We use Buildah in a CI/CD setup: build pods are instantiated in an OpenShift 4.10 cluster and run with the anyuid SCC.

Because Buildah runs in a container, we have implemented the configuration described in this comment: https://github.com/openshift/enhancements/issues/362#issuecomment-1040664446.

The /dev/fuse device is injected into build pods, and we use fuse-overlayfs as the helper mount program:

sh-4.4# cat /etc/containers/storage.conf | grep -vE '^#|^$'
[storage]
driver = "overlay"
runroot = "/run/containers/storage"
graphroot = "/var/lib/containers/storage"
[storage.options]
additionalimagestores = [
]
[storage.options.overlay]
mount_program = "/usr/bin/fuse-overlayfs"
mountopt = "nodev,metacopy=on"
[storage.options.thinpool]

To investigate, I created an SCC that explicitly allows the MKNOD capability:

oc get pod test-pod -o json | jq '.spec.containers[].securityContext'
{
  "capabilities": {
    "add": [
      "MKNOD"
    ]
  }
}

In this pod, a mknod command succeeds, but Buildah still fails to pull the image:

oc rsh test-pod
sh-4.4# mknod -m 0666 /dev/null-test c 1 3
sh-4.4# ls -hal /dev/null-test
crw-rw-rw-. 1 root root 1, 3 Aug 19 11:59 /dev/null-test
sh-4.4# buildah pull registry.hub.docker.com/library/mysql:5.7
Trying to pull registry.hub.docker.com/library/mysql:5.7...
Getting image source signatures
Copying blob 2ff5c3b24fd5 done
Copying blob ef4ccd63cdb4 done
Copying blob 66fb34780033 done
Copying blob 7feea2a503b5 done
Copying blob 71dd5852ecd9 done
Copying blob d6f28a94c51f done
Copying blob 88a546386a61 done
Copying blob 65b18297cf83 done
Copying blob d64f23335fb8 done
Copying blob 6ba4171261fa done
Copying blob 96dcc6c8de93 done
writing blob: adding layer with blob "sha256:66fb3478003364657ac7751c40e41a135e02975f9459f652b1df23e3896b5fac": Error processing tar file(exit status 1): operation not permitted

As stated in the initial comment of this issue, the only way to pull this image is to make the container privileged, with the associated security implications.

Some information about our Buildah build image:

sh-4.4# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.6 (Ootpa)
sh-4.4# rpm -qa | grep buildah
buildah-1.26.2-1.module+el8.6.0+15917+093ca6f8.x86_64
sh-4.4# uname -a
Linux test-pod 4.18.0-305.49.1.el8_4.x86_64 #1 SMP Wed May 11 09:47:48 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

@rhatdan, @flouthoc: is it good practice for these images to embed special devices? Shouldn't the /dev directory be populated at run time by the host?

Antoine

rhatdan commented 2 years ago

Yes, images should not embed special devices within them.

I think that to make this work you could add --cap-add mknod, but I don't believe this will work in rootless mode.
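To check which entries of a layer actually require CAP_MKNOD when applied, the layer tarball can be scanned for character and block device entries. A minimal Python sketch using the standard tarfile module; device_entries is a hypothetical helper, and it assumes the layer blob has already been downloaded to a local file (fetching it is out of scope here). It handles gzip layers like the ones in this issue.

```python
from __future__ import annotations

import tarfile


def device_entries(layer_path: str) -> list[tuple[str, str, int, int]]:
    """List character/block device entries in a layer tarball.

    Returns (name, kind, major, minor) tuples. Extracting any of these
    entries needs mknod(2) for a device node, i.e. CAP_MKNOD.
    """
    found = []
    # "r:*" lets tarfile detect gzip/bzip2/xz compression transparently.
    with tarfile.open(layer_path, mode="r:*") as tar:
        for member in tar:
            if member.ischr() or member.isblk():
                kind = "char" if member.ischr() else "block"
                found.append((member.name, kind, member.devmajor, member.devminor))
    return found
```

Judging from the strace output in this thread, running it against the failing blob named in the error message would be expected to report dev/console as char 5:1, but that expectation is inferred from the trace, not verified.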

an-toine commented 2 years ago

I think that to make this work you could add --cap-add mknod, but I don't believe this will work in rootless mode.

Indeed, I just tested adding --cap-add mknod to the build command with no luck (even though invoking mknod directly from the build container succeeds):

sh-4.4# buildah build --cap-add mknod -t test:latest .
STEP 1/2: FROM registry.hub.docker.com/library/mysql:5.7
Trying to pull registry.hub.docker.com/library/mysql:5.7...
Getting image source signatures
Copying blob 2ff5c3b24fd5 done
Copying blob ef4ccd63cdb4 done
Copying blob 7feea2a503b5 done
Copying blob 66fb34780033 done
Copying blob d6f28a94c51f done
Copying blob 71dd5852ecd9 done
Copying blob 88a546386a61 done
Copying blob 65b18297cf83 done
Copying blob d64f23335fb8 done
Copying blob 6ba4171261fa done
Copying blob 96dcc6c8de93 done
error creating build container: writing blob: adding layer with blob "sha256:66fb3478003364657ac7751c40e41a135e02975f9459f652b1df23e3896b5fac": Error processing tar file(exit status 1): operation not permitted

As I understand it, pulling container images that embed special devices with a fuse-overlayfs-backed Buildah is a corner case that prevents building images derived from some official, well-known images.

In this context, do you think it would be possible to implement a command-line switch that optionally skips extracting special devices from the image? IMHO such a switch would improve Buildah's robustness and benefit end users who have to deal with these images in more constrained environments.

Antoine
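The proposed switch would amount to skipping device nodes while applying each layer. As far as this thread shows, Buildah has no such option. The Python sketch below only illustrates the filtering idea on an in-memory tar archive, and strip_device_nodes is a hypothetical name.

```python
from __future__ import annotations

import io
import tarfile


def strip_device_nodes(layer: bytes) -> tuple[bytes, list[str]]:
    """Return an uncompressed copy of a tar archive without char/block devices.

    Also returns the names of the entries that were dropped.
    """
    out = io.BytesIO()
    skipped: list[str] = []
    with tarfile.open(fileobj=io.BytesIO(layer), mode="r:*") as tin, \
            tarfile.open(fileobj=out, mode="w", format=tarfile.PAX_FORMAT) as tout:
        for member in tin.getmembers():
            if member.ischr() or member.isblk():
                skipped.append(member.name)
                continue
            # Only regular files carry a data payload; dirs and links do not.
            payload = tin.extractfile(member) if member.isfile() else None
            tout.addfile(member, payload)
    return out.getvalue(), skipped
```

A design caveat: rewriting a layer changes its uncompressed digest, so it would no longer match the diff_id recorded in the image configuration. A real option would therefore more plausibly skip these entries at extraction time than rewrite blobs.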

rhatdan commented 2 years ago

Are you running in rootless mode? --cap-add mknod would only help in the case where the container around buildah had dropped that capability.

an-toine commented 2 years ago

The failed buildah build example above was running in a Buildah container started with a patched version of the anyuid SCC that explicitly grants the MKNOD capability.

The build image's Containerfile contains a USER root statement, so I assume we are running in rootful mode:

oc rsh test-pod
sh-4.4# id
uid=0(root) gid=0(root) groups=0(root)
rhatdan commented 2 years ago

@giuseppe ideas?

What does cat /proc/self/uid_map say?

giuseppe commented 1 year ago

Is the mount program specified?

Otherwise, we might try to set a privileged extended attribute that rootless users cannot set (https://github.com/containers/storage/pull/1312 fixes it).

an-toine commented 1 year ago

What does cat /proc/self/uid_map say?

sh-4.4# cat /proc/self/uid_map
         0          0 4294967295

is the mount program specified?

The /usr/bin/fuse-overlayfs mount program is set in the configuration, and this setting is reflected by buildah info:

sh-4.4# cat /etc/containers/storage.conf | grep mount_program
mount_program = "/usr/bin/fuse-overlayfs"
sh-4.4# buildah info | jq .store.GraphOptions
[
  "overlay.mount_program=/usr/bin/fuse-overlayfs",
  "overlay.mountopt=nodev,metacopy=on"
]
giuseppe commented 1 year ago

Any chance you could strace the process to see which syscall is failing?

an-toine commented 1 year ago

I think I already did that in https://github.com/containers/buildah/issues/3384#issuecomment-1220640961; a full strace is attached there.

Tell me if something is missing from it; I can create a new trace of the process if needed.

giuseppe commented 1 year ago

No, sorry, that is enough.

I think what is happening is that buildah detects that it does not have CAP_SYS_ADMIN and creates a user namespace to gain that capability. In doing so, it loses CAP_MKNOD in the initial user namespace, so it can no longer create these devices.

an-toine commented 1 year ago

OK, that makes sense for user namespace isolation.

However, to avoid granting the SYS_ADMIN capability to the build container (which I believe is required to create user namespaces), I had set the BUILDAH_ISOLATION environment variable to chroot, as documented here:

oc rsh test-pod
sh-4.4# echo $BUILDAH_ISOLATION
chroot

Is the same reasoning applicable to chroot isolation?