Open chlunde opened 7 years ago
@chlunde @mtrmac @vrothberg Is this two-year-old issue still valid?
https://github.com/containers/image/pull/536 fixed that.
Not quite; with an empty blob info cache, skopeo copy docker://localhost:5000/{foo,baz} won’t use mounts, because the “known location” is recorded into the cache only after reading a blob starts or writing it finishes, and the attempt to mount happens before either of these.
i.e. we first decide that we know no existing location that could be used for mounting to baz/digest, and only then open foo/digest for reading and realize that there is a location that could possibly be used for mounting.
After one such read/write, the blob info cache for that registry does record at least one location of the blob, and future copies will use mounts.
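To make the ordering problem concrete, here is a toy model of it. This is only a sketch: blobCache and copyBlob are hypothetical names invented for illustration and are not the containers/image API. The only thing it is meant to show is that the cache lookup happens before the source location is recorded.

```go
package main

import "fmt"

// blobCache is a hypothetical stand-in for the blob info cache:
// it maps a blob digest to the repositories known to contain it.
type blobCache map[string][]string

// copyBlob models the ordering described above. The mount attempt consults
// the cache first; the source location is recorded only afterwards, once
// the blob has been opened for reading and written to the destination.
func copyBlob(cache blobCache, digest, srcRepo, dstRepo string) string {
	if len(cache[digest]) > 0 {
		// A known location exists, so a mount can be attempted.
		cache[digest] = append(cache[digest], dstRepo)
		return "mounted"
	}
	// No known location yet: fall back to a full read and upload,
	// and only now learn where the blob lives.
	cache[digest] = append(cache[digest], srcRepo, dstRepo)
	return "uploaded"
}

func main() {
	cache := blobCache{} // empty cache, as in the first skopeo copy
	fmt.Println(copyBlob(cache, "sha256:abc", "foo", "baz")) // uploaded
	fmt.Println(copyBlob(cache, "sha256:abc", "foo", "qux")) // mounted
}
```

The first copy misses the mount even though source and destination share a registry; any later copy of the same blob finds a candidate location in the cache.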
Handling the special case of intra-registry copies with no cache seems non-trivial (Would we call TryReusingBlob one last time in copyBlobFromStream? Only if GetBlob indicated that it is worth trying again? Only if the source/destination transport is the same? Do we need an extra “this is a second attempt” flag for TryReusingBlob to keep the layer indexes to be introduced by #611 in sync?).
Still, I think it’s worth keeping a low-priority issue open to perhaps account for this in some future redesign that could make this easier… somehow…
OCI distribution spec has also added a “mount blob without specifying the source repo” API, which also might be relevant here.
Registry v2 has an option for mounting a blob instead of re-uploading a blob:
This feature could be used when copying within the same registry, for example:
With skopeo, this downloads and uploads all the data. With docker, if I have ever pulled img1:tag, running docker tag img2 and docker push on img2 will only copy metadata.
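For reference, the registry v2 cross-repository mount is a POST to the target repository's upload endpoint with mount and from query parameters. The sketch below only builds that URL; mountURL is a hypothetical helper, and the registry host, repository names, digest and http scheme are placeholder assumptions.

```go
package main

import (
	"fmt"
	"net/url"
)

// mountURL builds the URL for a cross-repository blob mount request:
//   POST /v2/<targetRepo>/blobs/uploads/?mount=<digest>&from=<sourceRepo>
// The helper name and the plain-http scheme are assumptions for this sketch.
func mountURL(registry, targetRepo, digest, sourceRepo string) string {
	q := url.Values{}
	q.Set("mount", digest)
	q.Set("from", sourceRepo)
	u := url.URL{
		Scheme:   "http",
		Host:     registry,
		Path:     "/v2/" + targetRepo + "/blobs/uploads/",
		RawQuery: q.Encode(), // keys are sorted, values are query-escaped
	}
	return u.String()
}

func main() {
	// Placeholder values: mount a blob already present in img1 into img2.
	fmt.Println(mountURL("localhost:5000", "img2", "sha256:abc", "img1"))
}
```

As far as I understand the spec, a 201 Created response means the blob was mounted, while a 202 Accepted means the registry fell back to opening a normal upload session, so the client still has to be prepared to upload.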
It would also be nice if it could be used to avoid uploading other blobs when copying across registries.
Docker handles this by having a local cache of known digests pushed/pulled from this machine. Another option could be to query the registry to see what layers the user has access to, but for a small push I guess that might be slower than just pushing directly. However, it looks like at least for OCR no users have access to the /v2/_catalog service.