containers / skopeo

Work with remote images registries - retrieving information, images, signing content
Apache License 2.0

Add an option for "skopeo sync" not to "pull" the destination image #1516

Open ChristianCiach opened 2 years ago

ChristianCiach commented 2 years ago

When syncing images between repositories, the destination repository (Harbor for us) registers a "pull" operation for every image that also exists in the source repository.

We've configured our Harbor instance (the destination repository) to prune all images that have not been pulled within the last 30 days. Unfortunately, because we are syncing the images daily using Skopeo, pruning will never happen.

I believe this happens because of https://github.com/containers/image/pull/1041, which causes the manifest of the destination image to be downloaded on every sync (see destImageSource.GetManifest(ctx, targetInstance) in image/copy/copy.go). Unfortunately, the variable OptimizeDestinationImageAlreadyExists is hardcoded to true in cmd/skopeo/sync.go, so I cannot easily verify or disable this behavior.
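To make the suspected mechanism concrete, here is a toy model. It is not the containers/image code: `ToyRegistry`, `sync`, and the idea that every manifest read counts as one "pull" are assumptions made purely for illustration. It only shows why comparing against the destination implies reading from the destination on every run.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegistry:
    """Hypothetical in-memory registry that counts manifest reads as pulls."""
    manifests: dict = field(default_factory=dict)   # tag -> manifest digest
    pull_count: dict = field(default_factory=dict)  # tag -> number of manifest reads

    def get_manifest(self, tag):
        # Assumption: the registry records each manifest read as a "pull".
        if tag not in self.manifests:
            return None
        self.pull_count[tag] = self.pull_count.get(tag, 0) + 1
        return self.manifests[tag]

    def put_manifest(self, tag, digest):
        self.manifests[tag] = digest


def sync(src, dest, optimize_destination_image_already_exists=True):
    """Copy every tag from src to dest and return the tags actually written."""
    written = []
    for tag in sorted(src.manifests):
        src_digest = src.get_manifest(tag)
        if optimize_destination_image_already_exists:
            # The comparison needs the destination manifest, so it is read here.
            if dest.get_manifest(tag) == src_digest:
                continue  # identical image already present, skip the copy
        dest.put_manifest(tag, src_digest)
        written.append(tag)
    return written
```

In this model, a daily sync with the optimization enabled touches every destination tag once per run, which would keep a "not pulled within 30 days" retention rule from ever matching. With the optimization disabled, the destination is only written to, at the cost of re-copying every image.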

Using Skopeo 1.5.1 and skopeo sync --src yaml.

mtrmac commented 2 years ago

Thanks for your report.

Yes, that’s the likely cause. The speed-up from that (as detailed on https://github.com/containers/skopeo/issues/1021) is pretty significant, so we probably want to keep that behavior by default, but making that possible to disable with an option seems reasonable. Care to prepare a PR?

ChristianCiach commented 2 years ago

@mtrmac Thanks for your fast response.

I am fluent in several programming languages, but Go is unfortunately not one of them.

That being said, I am not even sure what such an option should be named. --disable-sync-optimizations is not a very assuring name :)

Also, should this be a CLI option only, or should it also be configurable per-registry when using yaml files?

I think we should have a better understanding about how we want to handle this issue before I try to prepare a PR.

ChristianCiach commented 2 years ago

I feel this issue is roughly related to https://github.com/containers/skopeo/issues/1498.

We, too, have immutable image tags, except for tags that contain the word latest. So, in theory, I could write a wrapper around skopeo sync that compares tags and skips those that are already present in the destination registry -- except those that contain the word latest, because they are not immutable and should always be synced.

By comparing tags, we could probably speed up the process and Harbor would (most likely) not register a "pull" for the images whose tags are already present.
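A minimal sketch of the tag comparison such a wrapper could do, assuming the tag lists of both repositories have already been obtained somehow. The function names `tags_to_sync` and `copy_commands` and the repository names in the usage note are hypothetical. Whether listing tags on the destination avoids registering a "pull" in Harbor is untested here.

```python
def tags_to_sync(source_tags, dest_tags, mutable_marker="latest"):
    """Return the source tags that still need to be copied.

    A tag is copied if it is missing from the destination, or if it contains
    the mutable marker (such tags may point to a new image at any time).
    """
    dest = set(dest_tags)
    return [t for t in source_tags if mutable_marker in t or t not in dest]


def copy_commands(src_repo, dest_repo, tags):
    """Build one `skopeo copy` argument list per tag (nothing is executed)."""
    return [
        ["skopeo", "copy", f"docker://{src_repo}:{t}", f"docker://{dest_repo}:{t}"]
        for t in tags
    ]
```

For example, `tags_to_sync(["1.0", "1.1", "latest"], ["1.0", "latest"])` yields `["1.1", "latest"]`: the tag 1.0 is skipped because it is already present, while latest is always re-synced. The resulting lists from `copy_commands("src.example/app", "harbor.example/app", ...)` could then be passed to `subprocess.run`.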

mtrmac commented 2 years ago

> That being said, I am not even sure what such an option should be named. --disable-sync-optimizations is not a very assuring name :)
>
> Also, should this be a CLI option only, or should it also be configurable per-registry when using yaml files?

At this point I think it’s reasonable to leave both option naming and whether to add this to YAML to the person who decides to implement it — a CLI-only contribution would be just fine, we can add a YAML option later (or eventually much later if we end up significantly redesigning the YAML format, see the other comment). I’d probably prefer a more specific option (--dont-compare-with-destination) to avoid ambiguity about any future “sync optimizations”; either way it’s probably not going to be an elegant one-word name.

mtrmac commented 2 years ago

> I feel this issue is roughly related to #1498.
>
> We, too, have immutable image tags, except for tags that contain the word latest.

FWIW that complicates #1498 notably. Regexps are powerful enough that this could probably be done in YAML, but the current YAML format can’t be very practically extended to add per-repo options. We will probably, eventually, need a new YAML format either way — it’s just that “except for tags that contain the word…” will need that new format to happen.

ChristianCiach commented 2 years ago

Instead of "except for tags that contain the word" it would probably be better to specify a regex for tags that should be regarded as immutable. I guess most people use a fixed version scheme for their immutable tags (as do we), so I think it would be more direct to specify a regex for tags that should not be synced if already present on both sides.
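As a rough sketch of that idea: a tag is skipped only if it matches the "immutable" regex and is already present in the destination. The pattern below (a plain three-part version, optionally prefixed with v) and the name `should_sync` are assumptions for illustration, not an existing skopeo feature.

```python
import re

# Hypothetical pattern for tags treated as immutable, e.g. "1.2.3" or "v1.2.3".
IMMUTABLE_TAG = re.compile(r"v?\d+\.\d+\.\d+")


def should_sync(tag, dest_tags, immutable=IMMUTABLE_TAG):
    """Return False only for immutable tags that already exist in dest_tags."""
    is_immutable = immutable.fullmatch(tag) is not None
    return not (is_immutable and tag in dest_tags)
```

With this rule, `should_sync("1.2.3", {"1.2.3"})` is false, while `should_sync("latest", {"latest"})` stays true because latest does not match the pattern. Any tag that falls outside the version scheme is synced as before, so a pattern that is too narrow costs time but never skips a mutable tag.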

This is an interesting topic. Let me think about this a bit more and do some experiments with it. But please don't expect a PR in the very near future :)

ChristianCiach commented 2 years ago

images-by-immutable-tag-regex as an alternative to the existing images-by-tag-regex in YAML? Just thinking out loud.

mtrmac commented 2 years ago

Maybe…


It would be easier to track if the “allow assuming that the tags are immutable” and the “never read from the destination registry” feature discussions were separate and consolidated in their own GitHub issues. Let’s discuss possible designs for #1498 in #1498, and exposing OptimizeDestinationImageAlreadyExists here, please.

The two are, of course, somewhat related, but exact option naming and placement for #1498 should be recorded there.

github-actions[bot] commented 2 years ago

A friendly reminder that this issue had no activity for 30 days.