Closed nmiculinic closed 1 year ago
We recommend that people running Buildah within a locked-down container use images from quay.io: https://quay.io/repository/buildah/stable. Running Buildah directly within a locked-down container will fail, because the unshare syscall is blocked. We recommend using --isolation=chroot, which eliminates the unshare call.
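For reference, the isolation mode can also be selected through an environment variable so that every invocation in a script picks it up; the stable image appears to set this by default, which matches the observation later in this thread. A minimal sketch (the `buildah` call is guarded so it is a no-op on machines without buildah installed):

```shell
# BUILDAH_ISOLATION is honored by buildah as the default for --isolation;
# setting it to chroot avoids passing the flag on every call.
export BUILDAH_ISOLATION=chroot

# Guarded so this sketch does nothing where buildah is not installed.
if command -v buildah >/dev/null 2>&1; then
  buildah bud -f Dockerfile .
fi
```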
It doesn't seem to help at all:
docker run --rm -it -v $(pwd):/rootfs quay.io/buildah/stable
[root@664c4f767a70 test]# buildah bud --isolation=chroot -f Dockerfile .
Error during unshare(CLONE_NEWUSER): Operation not permitted
ERRO error parsing PID "": strconv.Atoi: parsing "": invalid syntax
ERRO (unable to determine exit status)
Also, chroot appears to be the default isolation in the container anyway.
Could you try this with podman? Also could you try docker run --security-opt seccomp=/usr/share/containers/seccomp.json --rm -it -v $(pwd):/rootfs quay.io/buildah/stable
I think Docker might be blocking the unshare syscall.
Not sure if this is the case on Ubuntu, but on Debian the kernel itself disables the unsharing: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=808915.
I had to manually allow unprivileged users to unshare and get Docker to use Podman's seccomp profile, and then Buildah ran in the container. Using --isolation=chroot had no effect, unfortunately.
I don't fully understand what you are saying. Did Buildah work or not within the container?
Yes it did (on a Debian host), once I ran:
echo 1 > /proc/sys/kernel/unprivileged_userns_clone
I'm not sure why this is necessary if --isolation=chroot eliminates the unshare call.
Then when using Podman's seccomp profile, Buildah worked in the container:
docker run --security-opt seccomp=/usr/share/containers/seccomp.json --rm -it quay.io/buildah/stable
@nalind @giuseppe Are we still unsharing the namespace if we are doing --isolation=chroot
yes, a new user namespace is still necessary when the user has no CAP_SYS_ADMIN in the container.
@giuseppe Why, what do we need this for? I guess we are still bind mounting the /proc and /sys into the chroot.
Yes, we still need to be able to create bind mounts to set up the environment used by the chroot.
Thanks, I had figured that out.
So Docker's seccomp.json file blocking unshare is the issue, and it should be changed; or, as I recommend, use Podman/CRI-O for running these containers. You can also run Docker with Podman's /usr/share/containers/seccomp.json file.
Could you try this with podman?
Seeing this error in podman on a ppc64le RHEL 7.6 host with a CentOS7 container.
# whoami
root
# sestatus | grep mode
Current mode: permissive
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)
# arch
ppc64le
# podman --version
podman version 1.4.4
# podman run --rm -it ppc64le/centos:7
# cat /etc/redhat-release
CentOS Linux release 7.8.2003 (AltArch)
# yum install -y buildah
...
# buildah --version
buildah version 1.11.6 (image-spec 1.0.1-dev, runtime-spec 1.0.1-dev)
# buildah from scratch
Error during unshare(CLONE_NEWUSER): Operation not permitted
ERRO error parsing PID "": strconv.Atoi: parsing "": invalid syntax
ERRO (unable to determine exit status)
# buildah --isolation=chroot from scratch
Error during unshare(CLONE_NEWUSER): Operation not permitted
ERRO error parsing PID "": strconv.Atoi: parsing "": invalid syntax
ERRO (unable to determine exit status)
If one starts podman with superpowers, one gets a different error:
# podman run --cap-add ALL --privileged --rm -it ppc64le/centos:7
...
# buildah from scratch
ERRO 'overlay' is not supported over overlayfs
'overlay' is not supported over overlayfs: backing file system is unsupported for this graph driver
# buildah --isolation=chroot from scratch
ERRO 'overlay' is not supported over overlayfs
'overlay' is not supported over overlayfs: backing file system is unsupported for this graph driver
If you are in a container, then you should use buildah with --isolation=chroot; there is no reason to use container technology within a container.
We do a lot of configuration to make buildah run within a locked down container.
https://github.com/containers/buildah/blob/master/contrib/buildahimage/stable/Dockerfile
no reason to use container technology within a container.
Sorry, but when building an image from inside a Jenkins container agent it is useful. Since dockerd is deprecated in Kubernetes, we need an alternative. Is it possible with Buildah, or do we need to find something else?
My comment should have been more specific. Locking down a process within a container with additional, duplicative lockdown is not worth it. If I have already dropped capabilities and am running with SELinux and seccomp rules locked down, don't attempt to apply them again. If the container engine attempts to, it will be blocked by the existing container lockdown, and the container engine will fail.
It is possible to run buildah and podman within a container. The issue is how much security you lock said container down with.
Running Docker within a container has the same issues. It requires a --privileged container or a container with a docker.socket leaked from the host into the container, which is arguably less secure than just running --privileged.
The goal is not to have a docker socket available in a Container but to build a container image inside a CI agent running in K8S
Sure, but in order to run most containers, you need more than one UID within the container, and a lot of the time the process needs some Linux capabilities. Podman requires these, as does Docker.
If you are in a container, then you should use buildah with --isolation=chroot; there is no reason to use container technology within a container.
Eh? Any time users want to manipulate an OCI/Docker image they will use "container technology within a container", as there is no other way to do so.
The goal is not to have a docker socket available in a Container but to build a container image inside a CI agent running in K8S
@GJaminon we can run two containers inside a pod: one Docker server using the dind image, and one Docker client that uses the Docker server's TCP socket to build containers. That way we don't need to mount a Docker socket from the host (the k8s node), which is no longer available after k8s v1.18, and we can still build images inside a containerized Jenkins build agent.
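A hedged sketch of the two-container pod shape being described; the pod/container names and image tags are illustrative, and note that the dind daemon only listens on plain TCP port 2375 when TLS is explicitly disabled:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: jenkins-build-agent        # illustrative name
spec:
  containers:
  - name: dind
    image: docker:dind             # Docker daemon
    securityContext:
      privileged: true             # dind cannot run without this
    env:
    - name: DOCKER_TLS_CERTDIR
      value: ""                    # disable TLS for this in-pod-only socket
  - name: docker-client
    image: docker:cli              # Docker client talking to the dind daemon
    command: ["sleep", "infinity"]
    env:
    - name: DOCKER_HOST
      value: tcp://localhost:2375  # containers in a pod share localhost
```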
That would require a privileged pod though.
Still encountering this issue on quay.io/containers/buildah:v1.28
doing
buildah build --isolation=chroot ${CI_PROJECT_DIR}/Dockerfile
The container is run inside a Gitlab CI Pipeline
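For GitLab users hitting this, here is a hedged sketch of a job that applies both workarounds discussed in this thread: chroot isolation and the vfs storage driver. The job name and paths are illustrative, and the runner's seccomp profile must still permit whatever syscalls the build itself needs:

```yaml
build-image:
  image: quay.io/buildah/stable
  variables:
    BUILDAH_ISOLATION: chroot   # avoid runtime isolation inside the outer container
    STORAGE_DRIVER: vfs         # overlay cannot run on top of overlayfs
  script:
    - buildah build -f "$CI_PROJECT_DIR/Dockerfile" "$CI_PROJECT_DIR"
```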
Same for me
I think the default seccomp profile blocks unshare. You need to use a different seccomp profile.
Docker/containerd block unshare and mount. Podman, Buildah, and CRI-O do not.
CRI-O by default blocks unshare as well. There is need to change the seccomp profile with CRI-O too
OK, CRI-O should be using the same seccomp.json file as Podman and Buildah.
rpm -qf /usr/share/containers/seccomp.json
containers-common-1-89.fc38.noarch
@mrunalp @haircommander @saschagrunert WDYT?
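A quick way to check whether a given profile allow-lists unshare is to search its syscall lists. The sketch below builds a tiny Docker-style profile for illustration; on a real system, point `profile` at /usr/share/containers/seccomp.json, or at Docker's default profile, instead:

```shell
# Create a minimal Docker-style seccomp profile for illustration.
profile=$(mktemp)
cat > "$profile" <<'EOF'
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    { "names": ["read", "write", "unshare"], "action": "SCMP_ACT_ALLOW" }
  ]
}
EOF

# Real profiles are large, so a simple search for the syscall name is a
# useful first check (jq can give a precise per-action answer if needed).
if grep -q '"unshare"' "$profile"; then
  status=allowed
else
  status=blocked
fi
echo "unshare is $status in $profile"
```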
That was disabled, AFAIK, because user namespaces open up a lot of new features that can be abused. Many security issues in the kernel in recent years were caused by user namespaces, and Docker/containerd were not affected while CRI-O was. Personally, I think it makes sense for CRI-O to be more locked down than Podman and to allow more kernel features only when strictly necessary.
OK, CRI-O should be using the same seccomp.json file as Podman and Buildah.
we actually typically embed the seccomp profile by default inside of the binary, but we also do manually remove unshare from it: https://github.com/cri-o/cri-o/blob/main/internal/config/seccomp/seccomp.go#L45 and this was done for the reasons @giuseppe mentions
I can hear Eric B. screaming from the hinterlands. How would a user add unshare back to his own seccomp.go file?
They can either specify a separate profile inside of a pod spec (or unconfined, if they feel so bold), or they can point CRI-O to a profile on the node (like the one you attached above).
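For reference, the pod-spec route looks like this (field names per the Kubernetes securityContext API; the profile file name is an assumption and must exist under the kubelet's seccomp directory, typically /var/lib/kubelet/seccomp, on the node):

```yaml
securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: profiles/allow-unshare.json   # hypothetical profile path
```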
Many security issues in the kernel in the last years were caused by user namespaces and Docker/containerd were not affected while CRI-O was.
But docker (or any other purely container tech) is inherently insecure anyway. What's the point?
What do you mean by that? The point of seccomp for containers is to try to make them safer, as much as possible, with the right trade-off between security and what programs would break. If you need a custom profile, you can provide one.
User namespaces open up a wider kernel attack surface, since more kernel features become reachable (e.g. the mount APIs). So to play it safe, it is better to disable them by default, at least on a cluster, and allow them only when necessary and in a controlled way.
IMO this should not be changed for CRI-O, and unshare should be left disabled by default.
What do you mean by that?
I mean that Docker is insecure. Either you fully embrace seccomp (i.e. total lockdown and a user-space kernel; see gVisor), or you fully embrace a real VM (see Firecracker). All the other half-solutions only create a false sense that something is secure.
Well, there are compromises. Allowing unshare would give a malicious agent more possibilities; e.g. the attack in https://unit42.paloaltonetworks.com/cve-2022-0492-cgroups/ could be avoided with unshare blocked.
I am closing the issue since I don't think we should change the default we currently have in CRI-O
@giuseppe am I right that people need to "unblock unshare" anyway to build containers in containers with buildah? In that case, the decision just makes a false claim of security. It is like buildah is placing the blame on users/developers without providing any secure alternative.
I also came here from GitLab, because I saw buildah as an alternative to Docker-in-Docker. I thought it was just a simple user-space tool that takes files and packs them. It is very frustrating to spend time on yet another layer of problems with no result.
@abitrolly I had the same aspirations to use buildah, since we're 100% on Podman anyway. I've since switched to kaniko, which was a breeze to get going.
@awildturtok thanks for the pointer. Going to try kaniko. I understand that Linux container security is hard, but I would rather see big companies spend time on making Kurzgesagt-style videos so that more people could understand how to improve it. With SELinux and podman/buildah, I admit that most of the time when dealing with their errors I don't know what I am doing, and this is what frustrates me most. High respect to the people who understand all that stuff. I am just not one of you.
EDIT: https://gitlab.com/abitrolly/gitlab-elasticsearch-indexer/-/jobs/4250152765#L22 kaniko rocks. )
It seems a bit weird to need unshare to build a multi-arch manifest from already-built images. AFAIK, there are no users or privileged operations involved there.
Description
I cannot run `buildah bud`.
Steps to reproduce the issue:
Within the docker container I run the following:
https://github.com/containers/buildah/blob/master/install.md#ubuntu
Describe the results you expected:
I expected everything to work and the OCI image to be built.
Output of `rpm -q buildah` or `apt list buildah`:

Output of `buildah version`:

Output of `podman version` if reporting a `podman build` issue: not installed

Output of `cat /etc/release`:

Output of `uname -a`:

Output of `cat /etc/containers/storage.conf`: (default one)