Document eStargz with `zstd` compression instead of only `gzip`

containerd / stargz-snapshotter

Fast container image distribution plugin with lazy pulling

https://github.com/containerd/containerd/issues/3731

Apache License 2.0


Document eStargz with `zstd` compression instead of only `gzip` #1596

Open aochagavia opened 7 months ago

aochagavia commented 7 months ago

Currently, the docs on the structure of eStargz mention that layers are compressed using gzip. However, as far as I understand, eStargz also supports zstd as a compression mechanism (it is mentioned in the nerdctl docs, though they call it zstdchunked).

It would be great to update the docs to reflect zstd support. Specifically, the following questions need to be answered when using zstd as the compression format:

- Are gzip blobs replaced by zstd frames (as defined in the zstd specification)? Are chunks also zstd frames?
- What does the footer look like? The docs define it in terms that are closely tied to gzip, and it's not clear to me how that translates to zstd.

For context, I'm working on an image bakery application in Rust and I want to support eStargz with zstd compression.

ktock commented 7 months ago

> It would be great to update the docs to reflect zstd support.

SGTM

> Are gzip blobs replaced by zstd frames (as defined in the zstd specification)? Are chunks also zstd frames?

Yes

> What does the footer look like? The docs define it in terms that are closely tied to gzip, and it's not clear to me how that translates to zstd.

- zstd skippable frame header (64 bits)
- TOC offset (64 bits)
- zstd-compressed TOC length (64 bits)
- Uncompressed TOC length (64 bits)
- manifest type (=1) (64 bits)
- zstd:chunked magic number (64 bits)
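
For anyone writing an independent parser, the layout above can be sketched as follows. This is a minimal sketch, not the snapshotter's actual code. It assumes the six fields are the last 48 bytes of the blob and that the integer fields are little-endian (an assumption, though it matches how zstd itself encodes integers). The split of the skippable frame header into a 4-byte magic and a 4-byte payload size comes from the zstd format. `ZstdChunkedFooter` and `parse_footer` are hypothetical names, and the value of the zstd:chunked magic is deliberately not hardcoded; it is returned as raw bytes for the caller to check.

```python
import struct
from typing import NamedTuple

FOOTER_SIZE = 48  # six 64-bit fields, as listed above


class ZstdChunkedFooter(NamedTuple):
    skippable_magic: int          # first half of the skippable frame header
    skippable_payload_size: int   # second half of the skippable frame header
    toc_offset: int
    toc_compressed_length: int
    toc_uncompressed_length: int
    manifest_type: int
    chunked_magic: bytes          # raw 8 bytes; compare against the known magic


def parse_footer(tail: bytes) -> ZstdChunkedFooter:
    """Parse the last FOOTER_SIZE bytes of a layer blob (assumed little-endian)."""
    if len(tail) != FOOTER_SIZE:
        raise ValueError(f"expected {FOOTER_SIZE} bytes, got {len(tail)}")

    # zstd skippable frame header: 4-byte magic, then 4-byte payload size.
    magic, payload_size = struct.unpack_from("<II", tail, 0)
    if not 0x184D2A50 <= magic <= 0x184D2A5F:
        raise ValueError("footer does not start with a zstd skippable frame")

    # Four 64-bit integers follow the skippable frame header.
    toc_offset, c_len, u_len, manifest_type = struct.unpack_from("<QQQQ", tail, 8)

    return ZstdChunkedFooter(
        magic, payload_size, toc_offset, c_len, u_len, manifest_type, tail[40:48]
    )
```

A caller would read the final 48 bytes of the blob (for example with an HTTP range request), pass them to `parse_footer`, and then verify `manifest_type == 1` and the magic bytes before trusting `toc_offset`.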

aochagavia commented 7 months ago

Perfect, thanks! Once I have a clear idea of how this all works I might open a PR to update the docs 👍

aochagavia commented 6 months ago

I'm not yet confident enough in my knowledge to update the docs, but here's some relevant information for whoever is interested in creating an independent implementation of eStargz + zstd:

- When using zstd, the TOC is not included in the layer's tar archive. Instead, it's included as a raw string inside a skippable frame. This diverges from what gzip does (according to the diagram in the docs).
- As a consequence of the previous point, the `tocOffset` doesn't point to a tar header (there is none). Instead, it points directly to the start of the TOC's JSON.
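
To make the offset arithmetic concrete, here is a rough sketch of how a writer might append the TOC and compute `tocOffset`. It relies only on the zstd skippable frame layout (4-byte little-endian magic, 4-byte little-endian payload size, then the payload). The helper names are hypothetical, the choice of `0x184D2A50` among the sixteen allowed skippable magics is an assumption, and the TOC payload is treated as opaque bytes.

```python
import struct
from typing import Tuple

SKIPPABLE_MAGIC = 0x184D2A50   # assumption: any of 0x184D2A50..0x184D2A5F is valid zstd
SKIPPABLE_HEADER_SIZE = 8      # 4-byte magic + 4-byte payload size


def wrap_in_skippable_frame(payload: bytes) -> bytes:
    """Wrap opaque bytes in a zstd skippable frame."""
    return struct.pack("<II", SKIPPABLE_MAGIC, len(payload)) + payload


def append_toc(layer: bytes, toc_payload: bytes) -> Tuple[bytes, int]:
    """Append the TOC frame and return (new blob, offset of the TOC payload).

    The returned offset skips the frame header, so it points at the first
    byte of the TOC itself rather than at a tar header or frame header.
    """
    toc_offset = len(layer) + SKIPPABLE_HEADER_SIZE
    return layer + wrap_in_skippable_frame(toc_payload), toc_offset
```

A reader can then slice `blob[toc_offset:toc_offset + length]` to get the TOC bytes without parsing any tar structure, and a standard zstd decoder will skip the frame entirely when decompressing the layer.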

aochagavia commented 6 months ago

~Here's another bit of information (not sure whether it's zstd-specific or also applies to gzip): in the TOC, the offset field for .no.prefetch.landmark doesn't link to the beginning of the header in the tar archive, but links directly to its body instead (header and body are each compressed in their own zstd frame to allow this). This diverges from the way other files are handled (normally the header and the body are inside the same zstd frame and the offset points to the beginning of the frame).~

Update: my comment above is incorrect. For all files (including the landmark), the offset indeed links directly to the start of the compressed body, which is obvious because any non-empty body gets its own zstd frame. Sorry for the confusion.