Open cgwalters opened 4 months ago
What is also important is that the reference to <someimage> works unmodified with the runtime, e.g. if used in a systemd file, scripts using podman, MicroShift, etc., no matter what the reference is (labels, SHA digest, etc.).
Esp. digests used to be a problem in the past because they could change when moving/embedding the oci container image. MicroShift/OpenShift release images rely on digest references.
Why not use additional stores for this? The latest containers-common sets up Podman and Buildah to automatically look for an additional store in /usr/lib/containers/storage. If images are pulled into this store, Podman will use it as a read-only store and /var/lib/containers/storage as the read/write store.
I think using vfs backend is a bad idea btw, at least if you run non-readonly containers, because the vfs driver cannot use overlayfs for the container upper layer. The ideal approach would be to use the overlayfs backend with composefs enabled, because then there will be no whiteout files in the container storage (they are all inside the composefs blob in the storage).
Adding in CAP_SYS_ADMIN seems to allow this to work?
$ podman build --cap-add SYS_ADMIN /tmp
STEP 1/2: FROM quay.io/centos-bootc/centos-bootc:stream9
STEP 2/2: RUN podman --root=/usr/share/containers/storage pull alpine
Resolved "alpine" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob sha256:4abcf20661432fb2d719aaf90656f55c287f8ca915dc1c92ec14ff61e67fbaf8
Copying config sha256:05455a08881ea9cf0e752bc48e61bbd71a34c029bb13df01e40e3e70e0d007bd
Writing manifest to image destination
05455a08881ea9cf0e752bc48e61bbd71a34c029bb13df01e40e3e70e0d007bd
COMMIT
--> c8edcbce04cd
c8edcbce04cda8c52eb2043f9bcd23c74cb6a1e90948bb08dde27f2bfd31b7bd
Here is a little test I did to make this work.
$ cat /tmp/Containerfile
FROM quay.io/centos-bootc/centos-bootc:stream9
RUN sed -e '/additionalimage.*/a "/usr/lib/containers/storage",' -i /etc/containers/storage.conf
RUN podman --root=/usr/lib/containers/storage pull alpine
$ podman build -t bootc --cap-add SYS_ADMIN /tmp
STEP 1/3: FROM quay.io/centos-bootc/centos-bootc:stream9
STEP 2/3: RUN podman --root=/usr/lib/containers/storage pull alpine
Resolved "alpine" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob sha256:4abcf20661432fb2d719aaf90656f55c287f8ca915dc1c92ec14ff61e67fbaf8
Copying config sha256:05455a08881ea9cf0e752bc48e61bbd71a34c029bb13df01e40e3e70e0d007bd
Writing manifest to image destination
05455a08881ea9cf0e752bc48e61bbd71a34c029bb13df01e40e3e70e0d007bd
--> b4e3d3d3506b
STEP 3/3: RUN sed -e '/additionalimage.*/a "/usr/lib/containers/storage",' -i /etc/containers/storage.conf
COMMIT bootc
--> a94b77143258
Successfully tagged localhost/bootc:latest
$ podman run -ti --cap-add SYS_ADMIN bootc podman images
REPOSITORY                TAG     IMAGE ID      CREATED      SIZE     R/O
docker.io/library/alpine  latest  05455a08881e  8 weeks ago  7.67 MB  true
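To see what the sed line in that Containerfile actually does, here is a minimal sketch that applies the same expression to a throwaway file instead of /etc/containers/storage.conf. The sample file contents and the `add_store` helper name are assumptions made for illustration, and the one-line `a` command with `-i` assumes GNU sed.

```shell
#!/bin/sh
set -eu

# add_store CONF: append the additional store path after the line that
# matches "additionalimage", using the same sed expression as the
# Containerfile above. Assumes GNU sed.
add_store() {
    sed -e '/additionalimage.*/a "/usr/lib/containers/storage",' -i "$1"
}

workdir=$(mktemp -d)
conf="$workdir/storage.conf"

# Hypothetical, minimal stand-in for /etc/containers/storage.conf.
cat > "$conf" <<'EOF'
[storage]
driver = "overlay"

[storage.options]
additionalimagestores = [
]
EOF

add_store "$conf"

# Show the edited file; the new path should sit inside the list.
cat "$conf"
rm -rf "$workdir"
```

Note that the expression appends after every line matching `additionalimage`, so a commented-out example line in the real file would also get the path appended after it.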
In order to use Overlay within a container you need to run the container with CAP_SYS_ADMIN or play with rootless containers.
We're having a realtime conversation about this and I think there's general agreement that if the problem is that podman pull is trying to do an overlayfs mount, then the bugfix would be for podman to stop doing that.
I still have an open uncertainty about whiteouts which I agree with Alex would be much better fixed by composefs - avoiding the need for metadata in general written directly into the container image filesystem.
Cross-building from an arm M2 for x86_64 (after adding --cap-add SYS_ADMIN), there's an issue:
$ cat Containerfile
FROM quay.io/centos-bootc/centos-bootc:stream9
RUN podman pull alpine && podman pull busybox
This builds fine from arm M2 machine:
podman build --arch aarch64 -t myimage:arm --cap-add SYS_ADMIN .
This fails from my arm M2 machine:
podman build --arch x86_64 -t myimage:amd64 --cap-add SYS_ADMIN .
and here's the weird error:
STEP 2/2: RUN podman pull alpine && podman pull busybox
Resolved "alpine" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob sha256:4abcf20661432fb2d719aaf90656f55c287f8ca915dc1c92ec14ff61e67fbaf8
Error: copying system image from manifest list: writing blob: adding layer with blob "sha256:4abcf20661432fb2d719aaf90656f55c287f8ca915dc1c92ec14ff61e67fbaf8": processing tar file(Error: unrecognized command `podman /`
Did you mean this?
cp
ps
rm
Try 'podman --help' for more information
): exit status 125
Error: building at STEP "RUN podman pull alpine && podman pull busybox": while running runtime: exit status 125
Thx for progressing on this!
I would feel better with some automated CI test cases that mimic the actual use case as a smoke test: a container image with whiteouts (!!!) referenced using a sha digest in the Containerfile. Then bootc the resulting image and ensure that the image referenced with the same digest as in the Containerfile comes up and works correctly. Because: we had the same situation with Blueprints and image builder - it initially looked like it would be working, but actually was not. And this is a must-have feature for MicroShift / edge deployments in airgapped / disconnected use cases.
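As a small piece of such a smoke test, a CI job could first check that the references it is about to embed are digest-pinned at all. This is only a sketch of that pre-check, not the full bootc test described above; the `is_digest_ref` helper and the example references (including the all-zero digest) are hypothetical.

```shell
#!/bin/sh
set -eu

# is_digest_ref REF: succeed if REF ends in a sha256 digest
# (name@sha256:<64 lowercase hex characters>), fail for tags or bare names.
is_digest_ref() {
    printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# Hypothetical references: a placeholder digest of 64 zeros and a plain tag.
pinned="quay.io/example/app@sha256:$(printf '%064d' 0)"
tagged="docker.io/library/alpine:latest"

for ref in "$pinned" "$tagged"; do
    if is_digest_ref "$ref"; then
        echo "digest-pinned: $ref"
    else
        echo "NOT pinned:    $ref"
    fi
done
```

The check is purely syntactic. Whether the embedded image still resolves under the same digest after the build is what the end-to-end test would have to confirm.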
And to add an additional requirement: building these images has to work on OpenShift in a CI/CD pipeline without cluster-admin privileges.
The issue seems to be that podman without CAP_SYS_ADMIN fails over to setting up a User Namespace with a single mapping. I am talking to @giuseppe about whether or not this is required or how we could work around this. For now this will work fine with CAP_SYS_ADMIN added to the build. I don't see any issues with the Whiteouts being stored in the images, as they normally do on a host. The running of containers on containers is blocking overlay on overlay, but I don't think this is an issue we would see here.
When we configure the user namespace we don't know what command is going to be executed by Podman, so we don't check for that combination (and possibly we also need CAP_SETFCAP); we check only for CAP_SYS_ADMIN.
I think it is correct this way because even if you pull the images in that environment, you won't be able to use them until you gain CAP_SYS_ADMIN, and setting up the user namespace will probably use different mappings.
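For anyone debugging which of these capabilities a build step actually has, the effective set can be read from /proc/self/status on Linux. The sketch below is an illustration only and is not how Podman performs its own check; the `has_cap` helper is a hypothetical name. The bit numbers (21 for CAP_SYS_ADMIN, 31 for CAP_SETFCAP) come from the kernel's capability definitions.

```shell
#!/bin/sh
set -eu

# has_cap MASK BIT: succeed if bit number BIT is set in the hexadecimal
# capability mask MASK (the format used by CapEff in /proc/<pid>/status).
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

CAP_SYS_ADMIN=21
CAP_SETFCAP=31

# Only meaningful on Linux, where /proc/self/status exposes CapEff.
if [ -r /proc/self/status ]; then
    cap_eff=$(awk '/^CapEff:/ {print $2}' /proc/self/status)
    echo "CapEff: $cap_eff"
    if has_cap "$cap_eff" "$CAP_SYS_ADMIN"; then
        echo "CAP_SYS_ADMIN: present"
    else
        echo "CAP_SYS_ADMIN: missing"
    fi
    if has_cap "$cap_eff" "$CAP_SETFCAP"; then
        echo "CAP_SETFCAP: present"
    else
        echo "CAP_SETFCAP: missing"
    fi
fi
```

Running this inside a `RUN` step with and without `--cap-add SYS_ADMIN` should make the difference visible.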
Also relevant is https://github.com/ostreedev/ostree/pull/2722
This relates to https://github.com/containers/bootc/issues/128 - but isn't quite the same thing. Let's use this as a tracker for supporting "nesting" container images.
We should ideally support something like this:
Where somecontainer.container is a podman systemd unit that also uses:
The reason I mentioned --storage-driver=vfs is to avoid overlayfs and nested whiteouts... I think as of recent overlayfs this is supported at runtime, but... I can't make a whiteout in a default podman run invocation; I think the device cgroup may be coming into play?
Even if we could make the whiteout, I think we'd run into problems because there's no standard for nesting them at the OCI level. Also xref https://www.spinics.net/lists/linux-unionfs/msg11253.html
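For context on the whiteout problem: on disk, an overlayfs whiteout is a character device with device number 0/0. A minimal sketch for probing whether the current environment can create one, and for recognising one, could look like this. The `is_whiteout` helper is a hypothetical name, and `stat -c` assumes GNU coreutils.

```shell
#!/bin/sh
set -eu

# is_whiteout PATH: succeed if PATH is a character device with major and
# minor number 0, which is how overlayfs represents a whiteout.
is_whiteout() {
    [ -c "$1" ] && [ "$(stat -c '%t:%T' "$1")" = "0:0" ]
}

dir=$(mktemp -d)

# Creating the device node may be refused depending on capabilities and
# device restrictions in the environment, so treat failure as a result.
if mknod "$dir/wh" c 0 0 2>/dev/null; then
    if is_whiteout "$dir/wh"; then
        echo "whiteout created and recognised"
    fi
else
    echo "could not create a whiteout here"
fi

rm -rf "$dir"
```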