james-callahan opened this issue 2 years ago
Thanks for your report.
This sounds reasonable; we’d need to invent a file format.

Looking at the output in https://github.com/containers/skopeo/pull/1459#pullrequestreview-765110575 , a possible alternative might be to add that as a field to the semi-structured log output (probably still opt-in, so as not to pollute the ordinary CLI too much) — and to allow using a truly structured log destination, maybe in the JSON format supported by logrus.

I’m not at all sure tying this with logging is worthwhile; it is an approach with a file format that clearly scales to any new information. OTOH, having a single-purpose `--digestfile` that just contains `repo:tag@digest` values, one per line, would not really be a problem; we can always add one more CLI option and one more file format…

I guess that’s up to the person working on this. Care to prepare a PR?
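To make the single-purpose format concrete, here is a sketch of a consumer for such a newline-delimited digest file, one `repo:tag@digest` reference per line. The file contents and digests below are invented for illustration; nothing here is an existing skopeo output.

```shell
# Hypothetical consumer of the proposed newline-delimited digest file:
# one "repo:tag@digest" reference per line. File contents are invented.
digestfile=$(mktemp)
cat > "$digestfile" <<'EOF'
quay.io/example/app:v1@sha256:1111111111111111111111111111111111111111111111111111111111111111
quay.io/example/app:v2@sha256:2222222222222222222222222222222222222222222222222222222222222222
EOF
while IFS= read -r line; do
    ref=${line%%@*}    # repo:tag part (everything before the first @)
    digest=${line#*@}  # sha256:... part (everything after the first @)
    echo "$ref -> $digest"
done < "$digestfile"
rm -f "$digestfile"
```

A format this simple stays trivially parseable with shell parameter expansion, which is part of its appeal over a structured log destination.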
> This sounds reasonable; we’d need to invent a file format.

I think newline-delimited would be sufficient. In particular, this feature should work with `--dry-run` (#1459).

> I’m not at all sure tying this with logging is worthwhile

Agreed; I don't really see it as a logging thing: `--digestfile` of course exists, and not just as an output log line.
> In particular this feature should work with `--dry-run` (#1459)

Hum, that’s interesting but more borderline. Note that the digest at the source and the destination can be different (e.g. if the destination registry doesn’t support a format, and a conversion is required, or if the user explicitly requires a format conversion). It would be very surprising for `--dry-run` and `--dry-run=false` to both support `--digestfile` but to output different values.

The plausibly possible output of `--digestfile --dry-run`, listing just the source data, can probably be scripted with a few lines of shell along the lines of

```shell
for tag in $(skopeo list-tags $sourceRepo | convert-tags-to-a-shell-array); do
    echo "$tag $(skopeo inspect --format '{{.Digest}}' $sourceRepo:$tag)"
done
```

(maybe not exactly this, and this one is a bit less efficient because it fetches the config as well), so I’m not even quite sure that this operation directly relates to the “sync” concept.

What do you plan to use the digests for?
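One way to fill in the `convert-tags-to-a-shell-array` placeholder above is jq, since `skopeo list-tags` prints JSON with a `Tags` array. In this sketch the registry call is replaced with a canned response so the loop runs offline; the sample JSON shape and the availability of jq are assumptions.

```shell
# Canned stand-in for what `skopeo list-tags docker://docker.io/library/alpine`
# would print: a JSON object with Repository and Tags fields.
list_tags_output='{"Repository": "docker.io/library/alpine", "Tags": ["3.18", "3.19", "latest"]}'
for tag in $(printf '%s\n' "$list_tags_output" | jq -r '.Tags[]'); do
    echo "tag: $tag"
done
```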
> Note that the digest at the source and the destination can be different (e.g. if the destination registry doesn’t support a format, and a conversion is required, or if the user explicitly requires a format conversion). It would be very surprising for `--dry-run` and `--dry-run=false` to both support `--digestfile` but to output different values.

Would that be fixed with `--dest-precompute-digests`?
> The plausibly possible output of `--digestfile --dry-run`, listing just the source data, can probably be scripted with a few lines of shell along the lines of

When using the yaml source type, it's at least a little more advanced than this.
> What do you plan to use the digests for?

We have a list of public images that we mirror to our private repo. Our engineers put in a request to add a specific image `docker.io/someorg/somerepo:sometag@sha256:somedigest`. They then need to know what the new image reference (with checksum) will be: `images.company.com/mirror/docker.io/someorg/somerepo@sha256:somenewchecksum`.

Currently we iterate over the list of images requested by engineers and pass them to `skopeo copy --digestfile`; I'm hoping we can move to the YAML format accepted by `skopeo sync`.
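For reference, the `skopeo sync` YAML source that this comment refers to has roughly this shape (per the skopeo-sync(1) documentation; the registry, repo, and tag here are the placeholder names from this thread):

```yaml
docker.io:
  images:
    someorg/somerepo:
      - sometag
  tls-verify: true
```

An invocation along the lines of `skopeo sync --src yaml --dest docker sync.yml images.company.com/mirror` would then mirror every listed reference, which is exactly where a combined digest output would be wanted.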
> > Note that the digest at the source and the destination can be different (e.g. if the destination registry doesn’t support a format, and a conversion is required, or if the user explicitly requires a format conversion). It would be very surprising for `--dry-run` and `--dry-run=false` to both support `--digestfile` but to output different values.
>
> Would that be fixed with `--dest-precompute-digests`?

That would really involve making a copy and throwing away the result (where “throwing away the result” is a new ~feature), in the worst case, because it needs to read and decompress all data (if the input is schema1), and possibly compress all data (if the input is uncompressed). And the compression algorithms are not, AFAIK, guaranteed to generate the same output on repeated invocations, so it’s just not viable — in the fully general case. It might be possible in some specific situations, but it’s not trivial.
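The point about compression output not being stable can be demonstrated locally: even with identical input, different gzip settings produce different compressed bytes, and therefore different digests. This is a standalone demo, nothing skopeo-specific.

```shell
# Same content, two gzip invocations with different levels: the compressed
# bytes differ (at minimum in the gzip header's XFL byte, per RFC 1952),
# so the sha256 digest of the "same" layer differs too.
workdir=$(mktemp -d)
printf 'identical layer content\n' > "$workdir/layer"
d1=$(gzip -c -1 "$workdir/layer" | sha256sum | cut -d' ' -f1)
d9=$(gzip -c -9 "$workdir/layer" | sha256sum | cut -d' ' -f1)
echo "level 1: sha256:$d1"
echo "level 9: sha256:$d9"
rm -rf "$workdir"
```

This is why a destination digest cannot in general be predicted without actually producing the destination bytes.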
> > The plausibly possible output of `--digestfile --dry-run`, listing just the source data, can probably be scripted with a few lines of shell along the lines of
>
> When using the yaml source type, it's at least a little more advanced than this.

That’s very fair.
> > What do you plan to use the digests for?
>
> We have a list of public images that we mirror to our private repo. Our engineers put in a request to add a specific image `docker.io/someorg/somerepo:sometag@sha256:somedigest`. They then need to know what the new image reference (with checksum) will be: `images.company.com/mirror/docker.io/someorg/somerepo@sha256:somenewchecksum`.

Hum, so this does actually run into the hard case of image conversion (and if so, from what format to what format)? Or is that some other format change, like compression? In particular, would #1378 be interesting?
> Hum, so this does actually run into the hard case of image conversion (and if so, from what format to what format)?

I don't think so? Or at least, I've never thought about it. Originally we did `docker pull && docker tag && docker push`, then moved to `buildah pull && buildah tag && buildah push`, and now `skopeo copy`.
A friendly reminder that this issue had no activity for 30 days.
When using `skopeo sync`, I'd love a way to easily get the digests of the images in their new location, similar to if `skopeo copy` had been repeatedly invoked with `--digestfile` (and collecting them into one file).
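The "collecting them into one file" workflow described here can be sketched as a loop over `skopeo copy --digestfile`. So that the sketch runs offline, `skopeo` is stubbed below with a shell function that writes a fake digest; the stub, file names, and digest value are all invented — in practice the real binary would be used.

```shell
# Stub standing in for the real skopeo binary, purely so this demo runs
# offline: pretend `skopeo copy --digestfile FILE src dst` succeeded and
# wrote the resulting manifest digest to FILE.
skopeo() {
    printf 'sha256:%064d' 1 > "$3"
}

# One requested image per line (placeholder name from this thread).
printf '%s\n' docker.io/someorg/somerepo:sometag > images.txt

: > all-digests.txt
while IFS= read -r image; do
    skopeo copy --digestfile /tmp/digest \
        "docker://$image" "docker://images.company.com/mirror/$image"
    printf '%s@%s\n' "$image" "$(cat /tmp/digest)" >> all-digests.txt
done < images.txt
cat all-digests.txt
```

A built-in equivalent in `skopeo sync` would replace this per-image loop with a single invocation producing one combined digest file.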