Open cgwalters opened 5 months ago
I'm still thinking a lot about this. Here's a related PR: https://github.com/containers/composefs/pull/320
To elaborate on this: again, what I really want is for the signature on an image (i.e. on its manifest) to efficiently cover a composefs blob which is "the image".
To recap the proposal from the above PR: it's basically that we take the 3 parts from an image and put them in a single composefs blob (with a single fsverity digest), where the manifest carries a user.composefs.sha256 xattr with the full/descriptor sha256 digest.

Then, when building an image (I know this gets a little circular) we support computing the fsverity digest of that whole thing, and inject that digest as an annotation into a copy of the manifest.json, with solely that difference, which becomes the canonical manifest.json. The version in the cfs image can be transformed back into the canonical version by re-injecting that annotation (this would work well if the manifest is required to be in canonical form (though cc https://github.com/awslabs/tough/issues/810)).
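As a rough sketch of that injection/round-trip (the annotation key and the canonical-JSON rules here are placeholders on my part, not anything settled in this thread):

```python
# Minimal sketch of the "inject the digest into a copy of the manifest" flow.
# The annotation key and the canonicalization are stand-ins; the real
# canonical-form rules are still open (see the tough issue linked above).
import copy
import hashlib
import json

ANNOTATION = "containers.composefs.fsverity"  # bikeshed: name not finalized

def canonicalize(manifest: dict) -> bytes:
    # Stand-in for "canonical form" of the manifest JSON.
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()

def seal_manifest(manifest: dict, composefs_fsverity_digest: str) -> dict:
    """Copy of the manifest whose only difference is the injected annotation."""
    sealed = copy.deepcopy(manifest)
    sealed.setdefault("annotations", {})[ANNOTATION] = composefs_fsverity_digest
    return sealed

def descriptor_digest(manifest: dict) -> str:
    """The sha256 that a cosign/GPG-style signature would ultimately cover."""
    return "sha256:" + hashlib.sha256(canonicalize(manifest)).hexdigest()

# The manifest stored inside the cfs image (without the annotation) can be
# transformed back to the canonical version by re-injecting the same digest:
#   seal_manifest(stripped_manifest, digest) == canonical_manifest
```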
This style of image with the containers.composefs-digest annotation I would call a "composefs verified OCI image" - it allows us again to have a cosign/GPG style signature cover that manifest, which covers the composefs digest, which covers everything else.
I think it's a really desirable property from such a layout that a fetched OCI image is "a single file" (or really, "a single composefs" - we expect a shared backing store of course) and e.g. a "fsck" style operation is just "composefs fsck" which can be efficiently delegated to the kernel with fsverity (lazily or eagerly).
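To illustrate the "fsck is just fsverity" point, a minimal sketch using the fsverity-utils CLI (the flow and where the expected digest comes from are assumptions on my part; presumably it would be the annotation from the signed manifest):

```python
# Rough sketch of an eager "fsck"-style check for a fetched image, assuming the
# image is a single composefs file on a filesystem that supports fsverity.
import subprocess

def enable_and_measure(cfs_path: str) -> str:
    # Enabling fsverity makes the kernel verify reads lazily from then on.
    subprocess.run(["fsverity", "enable", cfs_path], check=True)
    # `fsverity measure` prints "sha256:<digest> <path>".
    out = subprocess.run(["fsverity", "measure", cfs_path],
                         check=True, capture_output=True, text=True)
    return out.stdout.split()[0]

def fsck(cfs_path: str, expected_digest: str) -> None:
    measured = enable_and_measure(cfs_path)
    if measured != expected_digest:
        raise RuntimeError(f"{cfs_path}: fsverity digest mismatch: {measured}")
```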
All of this is ignoring the "tar split" problem of course; for supporting re-pushing images, it'd be nice to have that covered by composefs/fsverity too... but that gets super messy without getting into "canonical tar" territory. At least for verified images.
To be clear of course, for unverified images (i.e. images that we just want to store locally as composefs, that don't have the precomputed annotation) we can stick anything else we want inside that local composefs, including the tar-split data.
We want to natively support e.g. https://github.com/sigstore/cosign to sign images that can be verified client side. cosign covers the manifest, which has the composefs fsverity digest of the "artifact composefs" with the manifest and config and all layers. TBD: Standard for location of signatures for composefs-oci.
Question for Miloslav: Does c/storage cache the signature on disk today?
I've been prototyping things out more in https://github.com/cgwalters/composefs-oci in the background, and one thing that I think is interesting is I needed to efficiently index back from a composefs to the original manifest descriptor sha256, so I added a user.composefs.sha256 extended attribute on the manifest JSON (stored in the composefs) for the use case of client-synthesized composefs blobs.
For server signed composefs, we obviously can't do that because it becomes fully circular with the composefs digest covering the manifest. Maybe instead what we can do is always store the original manifest digest as an xattr on the composefs itself. That would mean it becomes "unverified state", but that's probably fine.
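For the "xattr on the composefs itself" idea, something like this minimal sketch (the xattr name here is made up for illustration; only user.composefs.sha256 on the manifest inside the image appears above):

```python
# Sketch of storing the original manifest digest as "unverified state" on the
# composefs file itself, outside the fsverity-protected content.
import os

XATTR = b"user.composefs.manifest-digest"  # hypothetical name

def record_manifest_digest(cfs_path: str, manifest_digest: str) -> None:
    # Not covered by the composefs/fsverity digest, so it can't be circular -
    # but it also isn't verified.
    os.setxattr(cfs_path, XATTR, manifest_digest.encode())

def lookup_manifest_digest(cfs_path: str) -> str:
    return os.getxattr(cfs_path, XATTR).decode()
```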
Still also thinking things through more... given that we know we need to also maintain individual layers, I think we should add an annotation on each layer with its composefs digest as well; this seems like a no-brainer in general, but it would specifically help align with how c/storage represents things today.
Tangential but interesting here... we could also look at "composefs as a DDI", where the EROFS is in the DDI but the backing store isn't; that would allow covering the EROFS with a dm-verity signature in a standardized envelope.
But, we still have the need to represent layers and handle OCI metadata.
I fleshed out some proposed standards a bit more in https://github.com/cgwalters/composefs-oci-experimental, but it needs implementation work to merge with some of the logic in https://github.com/allisonkarlitskaya/composefs_experiments.
So the way this works today in https://github.com/allisonkarlitskaya/composefs_experiments via cfsctl oci seal is that the merged container content is written out as a composefs image (the mkcomposefs step) and the resulting fsverity digest is recorded in the containers.composefs.fsverity label. I used this name because it's what was present in Colin's experimental repository. I think it's a reasonable name.

The created composefs image is the "straight-up" content of what we found in the container layers (after applying whiteouts). There's no extra metadata there. I'd also resist adding out-of-band metadata in the form of xattrs on the image file: one might easily imagine two container images with the same filesystem content ending up mapped to the same composefs image, and then which container would the xattr point back to?
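To make that concrete, here's a rough sketch of what the seal step amounts to in terms of the generic composefs/fsverity tooling; this is probably not literally how cfsctl does it internally, and the merged-rootfs input is assumed to already have whiteouts applied:

```python
# Rough sketch of "seal" in terms of mkcomposefs and fsverity-utils.
import subprocess

def build_composefs(merged_rootfs: str, out_image: str, objects_dir: str) -> None:
    # mkcomposefs writes the EROFS metadata image and splits file content
    # into the (shared) backing object store.
    subprocess.run(
        ["mkcomposefs", f"--digest-store={objects_dir}", merged_rootfs, out_image],
        check=True,
    )

def fsverity_digest(path: str) -> str:
    # `fsverity digest` computes the digest offline, without enabling verity.
    out = subprocess.run(["fsverity", "digest", path],
                         check=True, capture_output=True, text=True)
    return out.stdout.split()[0]

def seal(merged_rootfs: str, out_image: str, objects_dir: str) -> str:
    """Return the value to record in the containers.composefs.fsverity label."""
    build_composefs(merged_rootfs, out_image, objects_dir)
    return fsverity_digest(out_image)
```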
On pull, if the container has a containers.composefs.fsverity label, we have two options (and I didn't decide which one I like better):

- on pull, if we have the containers.composefs.fsverity label, we can immediately try to create a composefs image for the container and verify that it matches what we found in the label. This has the benefit that the container would be immediately ready to be mounted. I often think about read-write and read-only operations on the repository, and (for example) booting the system should be a read-only operation.
- a separate 'prepare' step that creates the composefs and verifies the label (roughly sketched below). Incidentally, this operation looks an awful lot like the seal operation plus a verification that the result is equivalent to the original container. I sort of lean this way at present, but it's also kinda an implementation detail.
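A rough sketch of that second option, i.e. "prepare" as seal-plus-verification (again using the generic CLIs rather than cfsctl's own logic; the exact format of the label value is an assumption here):

```python
# Sketch: rebuild the composefs from the pulled, merged content and compare
# against the containers.composefs.fsverity label.
import subprocess
import tempfile

def prepare_and_verify(merged_rootfs: str, objects_dir: str, label_digest: str) -> str:
    with tempfile.NamedTemporaryFile(suffix=".cfs", delete=False) as tmp:
        image = tmp.name
    subprocess.run(["mkcomposefs", f"--digest-store={objects_dir}",
                    merged_rootfs, image], check=True)
    out = subprocess.run(["fsverity", "digest", image],
                         check=True, capture_output=True, text=True)
    computed = out.stdout.split()[0]
    if computed != label_digest:
        raise RuntimeError(f"label says {label_digest}, rebuilt image is {computed}")
    return image  # ready to be mounted, as a read-only operation
```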
I think that sounds good as a first cut! We could write that up in a bit more "reference style" as something like docs/standards/oci.md or so?
Sure. I'll try to get a PR out today.
One thing that's being debated is the intersection of zstd:chunked and the config digest. Today there's logic in c/image and c/storage that special-cases zstd:chunked to hash the TOC into the config, and in some cases, depending on how images were pulled, that hash can be different. xref that logic at least, though I need to find the place where the actual hashing is done; that would be a better reference.
Anyways, this is a wildly complex topic because basically zstd:chunked and composefs are both:
And what's extra fun is that in theory they're independent; we need to consider the cases where neither is used, just one is used, or both are used, giving a 2x2 matrix.
But back to the intersection: in the case where both are in use - and by this I specifically mean for an externally generated image (ignoring the case of zstd:chunked or composefs computation being done client-locally; "trust"/security have different implications there) - the core premise of the composefs design we have now is that, given a config, we can reliably prove that the final computed merged rootfs matches what was expected, which covers a lot of scenarios.
In discussions with Allison I think we'd also agreed to include in the design an annotation on each layer in the manifest with the composefs digest of that layer's tar stream; this greatly helps incremental verification and caching, and keeps composefs as the "source of truth" for metadata (as opposed to tar-split or on-disk state, for example). In this model then we need to consider both the manifest and the config.
In a nutshell I guess I'd reiterate my personal feeling that composefs is more important than zstd:chunked, and I'd actually like to consider making zstd:chunked require the composefs annotations and design (at least in the generated manifest/config), as opposed to thinking of composefs as a derivative of zstd:chunked.
We should standardize some of the interactions with composefs and OCI. Today the composefs tooling is very generic, and integration with OCI or other ecosystems is left to do externally (as is happening in e.g. containers/storage).
Embedding containers.composefs-digest as metadata

While this is a broad topic, the first example I'd give here is that we should standardize embedding the composefs digest in a container image manifest, much as was done with ostree and embedding it in the commit metadata.
Something like a standard containers.composefs-digest (bikeshed: label or annotation?). And we should define exactly how a container image is mapped to a composefs tree. Specifically, I would argue here that the embedded digest should be of the merged, flattened filesystem tree - and that's actually how it should be mounted as well (instead of doing it via individual overlayfs mounts) - i.e. we'd do it how ostree does it.

However, it wouldn't hurt to also embed an annotation with the composefs digest for each individual layer (as part of the descriptor metadata) to give a runtime the ability to selectively choose to manage individual layers or not.
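Purely as an illustration of the shape this could take (digest values are placeholders, and whether the per-layer annotation reuses the same key or gets its own name is unsettled):

```python
# Illustrative OCI manifest shape with composefs metadata embedded, written as
# a Python dict for convenience.  Only containers.composefs-digest is a name
# proposed in this issue; everything else is standard OCI or a placeholder.
manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "config": {
        "mediaType": "application/vnd.oci.image.config.v1+json",
        "digest": "sha256:...",
        "size": 1234,
    },
    "layers": [
        {
            "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
            "digest": "sha256:...",
            "size": 56789,
            # per-layer composefs digest, as part of the descriptor metadata
            "annotations": {"containers.composefs-digest": "..."},
        },
    ],
    # digest of the merged, flattened filesystem tree as a composefs
    "annotations": {"containers.composefs-digest": "..."},
}
```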
Finally of course, it would make sense for us to provide some tooling which does this. It's an interesting question whether there should be something like podman build --feature=composefs to auto-inject this; but in the general case we can just provide a simple tool that accepts an arbitrary container image and "re-processes" it to add this metadata.