containers / image

Work with containers' images
Apache License 2.0

Intra-registry copy of previously-unseen images does not use mount support #312

Open · chlunde opened this issue 7 years ago

chlunde commented 7 years ago

Registry v2 has an option for mounting a blob from another repository instead of re-uploading it:

POST /v2/<name>/blobs/uploads/?mount=<digest>&from=<repository name>
Content-Length: 0
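A minimal Go sketch of issuing the mount request above. It assumes the Registry v2 behaviour where 201 Created means the blob was mounted and 202 Accepted means the registry declined the mount and opened a normal upload session at the Location header. Authentication is omitted, and `mountURL`, `interpretMountResponse` and `tryMount` are hypothetical helpers, not part of containers/image.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// mountURL builds the blob-mount request URL:
// POST <baseURL>/v2/<name>/blobs/uploads/?mount=<digest>&from=<repository name>
func mountURL(baseURL, name, digest, from string) string {
	q := url.Values{}
	q.Set("mount", digest)
	q.Set("from", from)
	// Encode sorts keys, so "from" precedes "mount" in the output.
	return fmt.Sprintf("%s/v2/%s/blobs/uploads/?%s", baseURL, name, q.Encode())
}

// interpretMountResponse maps the registry's reply to an outcome:
// 201 Created: the blob was mounted, nothing needs uploading.
// 202 Accepted: the mount was declined; a regular upload session was opened
// at the (possibly relative) Location URL instead.
func interpretMountResponse(status int, location string) (mounted bool, uploadURL string, err error) {
	switch status {
	case http.StatusCreated:
		return true, "", nil
	case http.StatusAccepted:
		return false, location, nil
	default:
		return false, "", fmt.Errorf("unexpected mount response status %d", status)
	}
}

// tryMount sends the mount request (no authentication in this sketch).
// With a nil body, net/http sends "Content-Length: 0" for a POST.
func tryMount(client *http.Client, baseURL, name, digest, from string) (bool, string, error) {
	req, err := http.NewRequest(http.MethodPost, mountURL(baseURL, name, digest, from), nil)
	if err != nil {
		return false, "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return false, "", err
	}
	defer resp.Body.Close()
	return interpretMountResponse(resp.StatusCode, resp.Header.Get("Location"))
}

func main() {
	// Prints the request URL; calling tryMount needs a reachable registry.
	fmt.Println(mountURL("https://example.com", "prj/img2", "sha256:0123abcd", "prj/img1"))
}
```

On the 202 path the client simply continues with the ordinary upload flow, so attempting a mount is never worse than uploading.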

This feature could be used when copying within the same registry, for example:

skopeo copy docker://example.com/prj/img1:tag docker://example.com/prj/img2:tag

With skopeo, this downloads and uploads all of the data. With docker, if I have ever pulled img1:tag, tagging it as img2:tag with docker tag and then running docker push on img2:tag only copies metadata.

It would also be nice to use mounts when copying across registries, to avoid uploading blobs that already exist in some other repository on the destination registry.

Docker handles this with a local cache of known digests that have been pushed or pulled from this machine. Another option would be to query the registry to see which layers the user has access to, but for a small push I suspect that would be slower than just pushing directly. Besides, at least on OCR, no users appear to have access to the /v2/_catalog endpoint.
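The Docker-style approach can be sketched as a local map from digest to every (registry, repository) where this machine has seen the blob. This is a simplified, hypothetical structure, not Docker's or containers/image's actual cache. A destination on the same registry as any recorded location is a mount candidate, which also covers the cross-registry case: a blob pulled from one registry but previously pushed to another repository on the destination registry.

```go
package main

import (
	"fmt"
	"sort"
)

// location identifies one place where a blob is known to exist.
type location struct {
	Registry   string
	Repository string
}

// blobLocationCache is a hypothetical local record of pushed/pulled digests.
type blobLocationCache struct {
	known map[string]map[location]struct{} // digest -> set of locations
}

func newBlobLocationCache() *blobLocationCache {
	return &blobLocationCache{known: map[string]map[location]struct{}{}}
}

// record notes that digest exists at loc (called after a pull or a push).
func (c *blobLocationCache) record(digest string, loc location) {
	set, ok := c.known[digest]
	if !ok {
		set = map[location]struct{}{}
		c.known[digest] = set
	}
	set[loc] = struct{}{}
}

// mountCandidates lists repositories on the destination registry, other than
// the destination repository itself, from which digest could be mounted.
func (c *blobLocationCache) mountCandidates(digest string, dest location) []string {
	var repos []string
	for loc := range c.known[digest] {
		if loc.Registry == dest.Registry && loc.Repository != dest.Repository {
			repos = append(repos, loc.Repository)
		}
	}
	sort.Strings(repos) // map iteration order is random; sort for stable output
	return repos
}

func main() {
	c := newBlobLocationCache()
	c.record("sha256:aaaa", location{"example.com", "prj/img1"})
	c.record("sha256:aaaa", location{"other.example", "mirror/img1"})
	fmt.Println(c.mountCandidates("sha256:aaaa", location{"example.com", "prj/img2"})) // [prj/img1]
	fmt.Println(c.mountCandidates("sha256:bbbb", location{"example.com", "prj/img2"})) // []
}
```

Because the cache only knows what this machine has seen, a fresh machine gets no candidates; querying the registry instead would avoid that, at the cost of extra round trips.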

rhatdan commented 5 years ago

@chlunde @mtrmac @vrothberg Is this two-year-old issue still valid?

vrothberg commented 5 years ago

https://github.com/containers/image/pull/536 fixed that.

mtrmac commented 5 years ago

Not quite; with an empty blob info cache,

skopeo copy docker://localhost:5000/{foo,baz}

won’t use mounts, because the “known location” is recorded in the cache only after reading a blob starts or writing it finishes, and the mount attempt happens before either of these.

That is, we first decide that we know of no existing location that could be used to mount into baz/digest, and only then open foo/digest for reading and realize that there is a location that could possibly be used for mounting.

After one such read/write, the blob info cache for that registry does record at least one location of the blob, and future copies will use mounts.
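The ordering problem described above can be modelled in a few lines. This is a hypothetical simulation, not containers/image code: the mount check consults the cache first, the source location is recorded only once reading starts, and the destination only once the upload finishes. So the first copy from an empty cache uploads, while a later copy on the same registry finds a location to mount from.

```go
package main

import "fmt"

type location struct{ Registry, Repository string }

// cache maps a digest to the locations where the blob is known to exist.
type cache map[string][]location

func (c cache) record(digest string, loc location) {
	for _, l := range c[digest] {
		if l == loc {
			return
		}
	}
	c[digest] = append(c[digest], loc)
}

// candidate returns a known location on dest's registry, other than dest itself.
func (c cache) candidate(digest string, dest location) (location, bool) {
	for _, l := range c[digest] {
		if l.Registry == dest.Registry && l.Repository != dest.Repository {
			return l, true
		}
	}
	return location{}, false
}

// copyBlob models the current order of operations.
func copyBlob(c cache, digest string, src, dest location) string {
	if from, ok := c.candidate(digest, dest); ok { // mount attempt: cache lookup first
		c.record(digest, dest)
		return "mounted from " + from.Repository
	}
	c.record(digest, src)  // reading the source starts: too late to help this copy
	c.record(digest, dest) // upload finished
	return "uploaded"
}

func main() {
	c := cache{}
	src := location{"localhost:5000", "foo"}
	fmt.Println(copyBlob(c, "sha256:aaaa", src, location{"localhost:5000", "baz"})) // uploaded
	fmt.Println(copyBlob(c, "sha256:aaaa", src, location{"localhost:5000", "qux"})) // mounted from foo
}
```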

Handling the special case of intra-registry copies with no cache seems non-trivial (Would we call TryReusingBlob one last time in copyBlobFromStream? Only if GetBlob indicated that it is worth trying again? Only if the source/destination transport is the same? Do we need an extra “this is a second attempt” flag for TryReusingBlob to keep the layer indexes to be introduced by #611 in sync?).
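One of the options raised above, a last reuse attempt after the source has been opened, might look like the following sketch. Everything here is hypothetical and only mirrors the questions in the comment: `openSource` stands in for opening the source blob and reports whether it taught us a new location, and the retry is limited to copies where source and destination are on the same registry. It does not model the layer-index bookkeeping mentioned for #611.

```go
package main

import "fmt"

type location struct{ Registry, Repository string }

// cache maps a digest to the locations where the blob is known to exist.
type cache map[string][]location

// record adds loc for digest and reports whether it was not known before.
func (c cache) record(digest string, loc location) bool {
	for _, l := range c[digest] {
		if l == loc {
			return false
		}
	}
	c[digest] = append(c[digest], loc)
	return true
}

// candidate returns a known location on dest's registry, other than dest itself.
func (c cache) candidate(digest string, dest location) (location, bool) {
	for _, l := range c[digest] {
		if l.Registry == dest.Registry && l.Repository != dest.Repository {
			return l, true
		}
	}
	return location{}, false
}

// openSource stands in for opening the source blob: it records the source
// location and reports whether that location was new to the cache.
func openSource(c cache, digest string, src location) bool {
	return c.record(digest, src)
}

// copyBlobWithRetry adds a hypothetical second reuse attempt after opening the source.
func copyBlobWithRetry(c cache, digest string, src, dest location) string {
	if from, ok := c.candidate(digest, dest); ok {
		c.record(digest, dest)
		return "mounted from " + from.Repository
	}
	learnedNewLocation := openSource(c, digest, src)
	// Retry only if opening the source taught us something and a mount could apply.
	if learnedNewLocation && src.Registry == dest.Registry {
		if from, ok := c.candidate(digest, dest); ok {
			// A real implementation would also have to close the unread source stream.
			c.record(digest, dest)
			return "mounted from " + from.Repository + " (second attempt)"
		}
	}
	c.record(digest, dest) // upload finished
	return "uploaded"
}

func main() {
	c := cache{}
	fmt.Println(copyBlobWithRetry(c, "sha256:aaaa",
		location{"localhost:5000", "foo"}, location{"localhost:5000", "baz"})) // mounted from foo (second attempt)
	fmt.Println(copyBlobWithRetry(c, "sha256:bbbb",
		location{"registry-a.example", "foo"}, location{"registry-b.example", "baz"})) // uploaded
}
```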

Still, I think it’s worth keeping a low-priority issue open to perhaps account for this in some future redesign that could make this easier… somehow…

mtrmac commented 11 months ago

The OCI distribution spec has also added an API for mounting a blob without specifying the source repository, which might be relevant here.