containers/storage — Container Storage Library (Apache License 2.0)

zstd:chunked blocker: TarSplitChecksumKey not used in a layer ID #1888

Closed mtrmac closed 1 month ago

mtrmac commented 2 months ago

@giuseppe Zstd:chunked uses a TarSplitChecksumKey annotation, separate from the TOC digest.

That’s fine as far as per-image individual trust goes, because the annotation is authenticated by the manifest digest.

But we deduplicate individual layers by the TOC digest alone; so it is possible for two images to use a layer with the same TOC digest, but different tar-split locations and contents (e.g. one with an added set-UID bit somewhere).

Locally, it seems by far easiest to move the tar-split digest into the TOC. That way the by-now-widespread assumption that the TOC digest is a sufficient identifier can be maintained.

For other purposes, like the BlobInfoCache and metadata-only reuse, the value returned from toc.GetTOCDigest is used for both reuse and match lookups, so whatever this function returns must be a complete identifier; it must be unambiguous, and we can never add more annotations that change the semantics of a layer with a matching GetTOCDigest value. Even if we replaced GetTOCDigest now with something that accounts for tar-split, we need this guarantee to be forward-compatible (consider pulling an image just before a new annotation type is added, upgrading, and then pulling another just after). I think that also argues for a long-term design of keeping everything anchored to a single digest value.
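To make the failure mode concrete, here is a minimal Go sketch of deduplication keyed by the TOC digest alone. The type and function names are invented for illustration; this is not the actual c/storage data model, just the shape of the problem:

```go
package main

import "fmt"

// layer is a hypothetical stand-in for a stored layer: identity is the TOC
// digest, while the tar-split checksum (from the TarSplitChecksumKey
// annotation) is carried along but not part of the key.
type layer struct {
	tocDigest        string
	tarSplitChecksum string
}

var layersByTOC = map[string]layer{}

// reuseOrStore returns an already-stored layer with a matching TOC digest,
// silently ignoring a mismatched tarSplitChecksum.
func reuseOrStore(l layer) layer {
	if existing, ok := layersByTOC[l.tocDigest]; ok {
		return existing
	}
	layersByTOC[l.tocDigest] = l
	return l
}

func main() {
	a := reuseOrStore(layer{tocDigest: "sha256:aaa", tarSplitChecksum: "sha256:111"})
	// Second image: same TOC digest, different tar-split contents
	// (e.g. an added set-UID bit somewhere):
	b := reuseOrStore(layer{tocDigest: "sha256:aaa", tarSplitChecksum: "sha256:222"})
	fmt.Println(a.tarSplitChecksum == b.tarSplitChecksum) // true: the second image silently reuses the first layer
}
```

The second caller gets back the first layer, tar-split difference and all, which is exactly the deduplication hazard described above.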


A bit more generally, I’ve been wondering whether the TOC digest is a good identifier long-term.

The way things are now, we basically can’t significantly change the format of the TOC, otherwise we risk the possibility of the same TOC blob being valid for both the old and new formats, allowing an attacker to trigger deduplication between layers with the same TOC digest but different TOC formats and differently-understood contents.

For example, we just can’t support a new partial-pull format with some other JSON-based TOC format if it could be constructed ambiguously with the zstd:chunked TOC format. I’m not 100% sure this is worth worrying about — but if it were ever to happen, we would need a fairly large restructuring, or at least renaming/re-documenting. So I’m raising it right now, in case such things were expected. (E.g. will this be required for composeFS?)

mtrmac commented 2 months ago

Maybe one way to think about this is that I think it would be clean if #1887 were extended further, so that

Yes, I realize it’s very late to change all this. But if we knew that we would need to do it in the near future, I think there’s a strong argument to do it ASAP.

giuseppe commented 2 months ago

The way things are now, we basically can’t significantly change the format of the TOC, otherwise we risk the possibility of the same TOC blob being valid for both the old and new formats, allowing an attacker to trigger deduplication between layers with the same TOC digest but different TOC formats and differently-understood contents.

How can this happen? If we modify the TOC in any way, isn't it safe to just assume the digests are different and that they are treated as two different layers? A layer could have two variants with different TOCs even today, just by sorting the entries in a different way, but they are treated as two different entities. Isn't this enough?

mtrmac commented 2 months ago

Consider things like

{
   "version": 1,
   "entries": [/* v1 entries */],
   "someOtherVendorVersion": "2.0",
   "someOtherVendorEntries": [/* … */]
}

A v1 consumer will see a valid v1, a someOtherVendor consumer will see a valid v2. But they are the same blob = same TOC digest = right now, the same c/storage layer.
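This ambiguity is easy to reproduce, because most JSON decoders silently ignore unknown keys. A hedged Go sketch (the struct schemas are hypothetical, mirroring the example blob above, not any real TOC format):

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// Two hypothetical consumers of the same TOC blob: each schema only
// declares "its" fields, and json.Unmarshal ignores the rest.
type tocV1 struct {
	Version int               `json:"version"`
	Entries []json.RawMessage `json:"entries"`
}

type tocOtherVendor struct {
	Version string            `json:"someOtherVendorVersion"`
	Entries []json.RawMessage `json:"someOtherVendorEntries"`
}

func parsesAsV1(blob []byte) bool {
	var t tocV1
	return json.Unmarshal(blob, &t) == nil && t.Version == 1
}

func parsesAsOther(blob []byte) bool {
	var t tocOtherVendor
	return json.Unmarshal(blob, &t) == nil && t.Version == "2.0"
}

func main() {
	blob := []byte(`{
		"version": 1,
		"entries": [{"name": "a"}],
		"someOtherVendorVersion": "2.0",
		"someOtherVendorEntries": [{"name": "a", "mode": 2048}]
	}`)
	fmt.Println(parsesAsV1(blob), parsesAsOther(blob)) // true true
	// Same bytes, so both interpretations share one TOC digest:
	fmt.Printf("sha256:%x\n", sha256.Sum256(blob))
}
```

Both parses succeed on the same bytes, so the single sha256 digest identifies two differently-understood layers.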

And which consumer is used probably depends on MIME type and annotations in the manifest.

Or, alternatively, consider what if “other” version uses a non-JSON format where it happens to be possible to make a single file valid in both formats.

(To be fair, the existing "version" field is a fairly strong countermeasure if everything uses JSON. Maybe we can make that a requirement.)

mtrmac commented 2 months ago
  • (some future variant of) toc.GetTOCDigest returned a “complete” format+identity identifier (of a kind which is long-term stable across newly added features and new format variants)

An unambiguous hash of (layer MIME type, all annotations) might work. Assuming manifests have per-layer MIME types and annotations :)

(We could also add the layer digest.) Of course the more values we add into the identifier, the fewer opportunities for reuse.

giuseppe commented 2 months ago

Would it be enough to just prepend the format used to the hashed data? e.g. the TOC Digest can be sha256sum("zstd:chunked:$version" + $tocData), that should be enough to protect from the same TOC being used by different formats. What do you think?

In practice the metadata file used by estargz and zstd:chunked is almost the same, so things like chunk offsets are represented in the same way. There is no way for some metadata to be used by one of the two formats and ignored by the other, so in principle these two cannot conflict.
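The prefixing proposal above could be sketched as follows. This is only an illustration of the idea, under the assumption that the prefix is the format name plus version; it is not c/storage code:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// prefixedTOCDigest domain-separates the TOC digest by hashing the format
// name and version in front of the TOC bytes, so the same TOC blob can no
// longer collide across formats. (Prefix syntax is an assumption.)
func prefixedTOCDigest(format, version string, tocData []byte) string {
	h := sha256.New()
	h.Write([]byte(format + ":" + version))
	h.Write(tocData)
	return fmt.Sprintf("sha256:%x", h.Sum(nil))
}

func main() {
	toc := []byte(`{"version": 1, "entries": []}`)
	// Identical TOC bytes now hash differently per format:
	fmt.Println(prefixedTOCDigest("zstd:chunked", "1", toc))
	fmt.Println(prefixedTOCDigest("someOtherVendor", "2.0", toc))
}
```

One caveat of plain concatenation: the boundary between prefix and data is itself ambiguous in principle, so a length-prefixed or otherwise delimited encoding is the more robust variant.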

mtrmac commented 2 months ago

To be a bit more explicit, the primary thing that I need solving here is that TarSplitChecksumKey is not participating in the TOC digest. That can be done by hashing the annotations, or by moving TarSplitChecksumKey from the annotations into the TOC.


I agree the “ambiguous TOC format” concern is, short term, not urgent, because there is no ambiguity between estargz and zstd:chunked.

Quite reasonably we can just let things stay as they are, and decide to deal with that if/when a possibly-ambiguous TOC format is ever added.

OTOH if we can solve both the TarSplitChecksumKey and format ambiguity cheaply in one place, I wouldn’t mind doing that now.

I’m bringing this up primarily to benefit from your expertise about ComposeFS etc. — is a new TOC format something you expect?


WRT the specifics, sha256sum("zstd:chunked:$version" + $tocData), it would be more efficient if the identifier could be computed just from the manifest data, without having to read the TOC itself (extra I/O and extra roundtrips, + the TOC can be somewhat large). So sha256sum(unambiguousTuple("zstd:chunked", $version, $tocDigest)) would be better than raw $tocData. All that is assuming there are no other security-relevant annotations.
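A sketch of what that could look like. The `unambiguousTuple` name comes from the comment above; the netstring-style length-prefixed encoding is my own assumption for illustration, as is computing the identifier purely from manifest-level data:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// unambiguousTuple length-prefixes every element, so no two distinct
// tuples can serialize to the same byte stream.
func unambiguousTuple(elems ...string) []byte {
	var buf bytes.Buffer
	for _, e := range elems {
		fmt.Fprintf(&buf, "%d:%s,", len(e), e)
	}
	return buf.Bytes()
}

// layerIdentifier derives the identifier from manifest-level data only
// (format, version, TOC digest): no extra I/O or roundtrips to fetch the
// TOC blob itself.
func layerIdentifier(format, version, tocDigest string) string {
	sum := sha256.Sum256(unambiguousTuple(format, version, tocDigest))
	return fmt.Sprintf("sha256:%x", sum)
}

func main() {
	fmt.Println(layerIdentifier("zstd:chunked", "1", "sha256:abc"))
	// Length prefixes prevent boundary ambiguity between elements:
	fmt.Printf("%q\n%q\n", unambiguousTuple("ab", "c"), unambiguousTuple("a", "bc"))
}
```

Note that `("ab", "c")` and `("a", "bc")` serialize to different byte streams (`2:ab,1:c,` vs. `1:a,2:bc,`), which is exactly what raw concatenation fails to guarantee.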

mtrmac commented 1 month ago

tarSplitChecksumKey was moved into the TOC by #1902.

cgwalters commented 1 month ago

A bit more generally, I’ve been wondering whether the TOC digest is a good identifier long-term.

One thing related here is https://github.com/containers/composefs/issues/294

cgwalters commented 1 month ago

It'd be really good to write something like a zstd-chunked.md living design document as we make these changes. I'd be happy to help collaborate on one and improve it.

This is only tangentially related, but basically the original idea with Docker was just wiring together two simple things: tar and JSON. And of these two, of course, as is well known, tar was a terrible fit for the use case we have here; the fact that there are like 3 major variant formats, each of which on their own has multiple possible representations for data... and the lack of indexing, of course. And the "512-byte record filled with zeros" is itself extremely ugly, with the need for magical separate records for long filenames. Anyways, we all know this stuff, but just saying it here.

Now zstd:chunked addresses indexing, but the TOC actually magnifies the first problem of representational ambiguity around what to do if the TOC disagrees with the tar stream - it's now easily possible to make a container image which appears one way when unpacked with a zstd:chunked aware engine, and another way with docker (which just knows zstd, but not zstd:chunked) right?

One thing we probably could consider here with zstd:chunked is forcing a "normal form" for the tar stream - specifically, e.g. we could mandate ustar, in a specific way. And the important thing (related to this issue around the tar-split checksum) is that once we do that, there isn't a need to checksum the tar-split at all, because we know we can regenerate it from the data we already have. Just a thought. Of course, this type of "standardization" is less obvious to do until there are multiple implementations (other than, of course, our own code interacting with prior versions of itself).

mtrmac commented 1 month ago

That might be two design documents: What is the on-registry artifact format; and what is the on-disk storage format/mechanism when pulling.


it's now easily possible to make a container image which appears one way when unpacked with a zstd:chunked aware engine, and another way with docker

Yes; or, depending on whether the registry decides to sabotage chunked pulls — which is why this issue is a security-relevant blocker.


One thing we probably could consider here with zstd:chunked is forcing a "normal form" for the tar stream

AFAICS that does not help with the ambiguity issue at all: the ambiguity, essentially, arises because we don’t want to read the non-data parts of the tar stream on a chunked pull; ideally 99% of the files are already on-disk and we just don’t fetch that data from the registry, nor do any other computation if we can help it. As long as we don’t do that, the digest of the full tarball is unverified simply because we don’t want to. (We could, hypothetically, do an efficient chunked pull, and then combine the on-disk data with the tar-split to create a full tar stream, and compare the uncompressed digest of that tar stream with a trusted value. If we did that, we wouldn’t need the TOC-based identifier as in this issue, but it would also cost too much.)

I think the “traditional” view of container image compression is that it doesn’t change the DiffID, i.e. that we should preserve the original tar-split across conversions between gzip and zstd:chunked, both ways. Now, with the way c/storage handles chunked layers, “the same DiffIDs” no longer implies “the same local image ID”, so I’m not quite sure it is still valuable, but at least the config digest on-registry does not change across compression changes, and some users might like that.


One aspect where it does, I think, matter a lot, is that there is one more representational ambiguity: the data in the TOC vs. the data in the tar-split. AFAICS it is possible to pull and inspect an image, see that there are no set-UID executables, and then re-push, with the effect that the push builds a stream using the original tar-split, and that tar-split does contain set-UID executables. So, a non-chunked pull (either because the consumer does not understand :chunked, or because the image was compressed using gzip) will create a set-UID executable.

@giuseppe: Is there anything to prevent ^^^? If not, what do we do? (Ignore tar-split, consume TOC; or, alternatively, ignore TOC metadata, consume tar-split; or somehow validate the two are equivalent???)


(A general note: I like to link to https://devblogs.microsoft.com/oldnewthing/20200213-00/?p=103436 , because it is so different from what we do: instead of imposing an abstraction of an “image” as a stand-alone unit, it is an end-to-end design of “a typical operation is an update, what is an efficient implementation of that?”.)

cgwalters commented 1 month ago

AFAICS [normal form tar] does not help with the ambiguity issue at all: the ambiguity, essentially, arises because we don’t want to read the non-data parts of the tar stream on a chunked pull;

True. However, a security scanning tool operating on a registry could at least detect and flag such images. It could probably even be integrated into a registry on push (at some performance cost, obviously).

One aspect where it does, I think, matter a lot, is that there is one more representational ambiguity: the data in the TOC vs. data in the tar-split.

It's currently blowing my mind that we embed the tar-split in the tarball, I didn't realize that until now. Umm...ahhh, I see 7bbf6ed44807eecfc1f9d3da9ff7da8cdb8e48b8 - so it postdates the first time I looked somewhat closely at zstd:chunked.

So there's actually three distinct copies of file metadata in a layer :exploding_head:

(A general note: I like to link to https://devblogs.microsoft.com/oldnewthing/20200213-00/?p=103436 , because it is so different from what we do: instead of imposing an abstraction of an “image” as a stand-alone unit, it is an end-to-end design of “a typical operation is an update, what is an efficient implementation of that?”.)

Right, they can do stuff like that because they control both ends of the system with one canonical implementation on each end. Not our world :smile: (That said, I do think it makes sense still to look at standardizing such things in the OCI; there is https://github.com/flatpak/flatpak-oci-specs/blob/main/image-deltas.md which is related, and I wonder if it could gain some ideas from the microsoft patching approach)