docker-archive / docker-registry

This is **DEPRECATED**! Please go to https://github.com/docker/distribution

Linear scan of a repository with lots (200+) of tags causes pull IO timeout #614

Open bshi opened 9 years ago

bshi commented 9 years ago

This was originally reported at https://github.com/GoogleCloudPlatform/docker-registry/issues/22

It seems that when performing "docker pull foo/bar:sometag", the registry performs a linear scan of ALL tags in "foo/bar". When backed by object storage systems like GCS, this can take a long time; it has broken image distribution for us.

bshi commented 9 years ago

As part of this investigation, I discovered that "docker pull foo/bar:sometag" will incur a hit to the API endpoint that lists all tags for 'foo/bar'. This seems a bit wasteful. @dmp42 - as I'm not too familiar with the details of 'docker pull', perhaps you know offhand whether this is indeed unnecessary work and whether it's worth filing a bug in docker?
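For reference, this is roughly the request involved; a minimal sketch using `requests` against the v1 API (the registry URL and repository name are placeholders, and the exact client behaviour is not reproduced here):

```python
import requests

# Hypothetical registry endpoint and repository, used only for illustration.
REGISTRY = "https://registry.example.com"
REPO = "foo/bar"

# v1 tag listing: returns a JSON object mapping every tag name to an image id.
# Pulling a single tag still triggers this full listing, which is what makes
# repositories with hundreds of tags slow on object-storage backends.
resp = requests.get("{0}/v1/repositories/{1}/tags".format(REGISTRY, REPO))
resp.raise_for_status()
all_tags = resp.json()

print("repository has %d tags, but we only wanted one" % len(all_tags))
print(all_tags.get("sometag"))
```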

dmp42 commented 9 years ago

@bshi I don't think reporting this is worth the effort - energy IMO is better focused on registry v2 development.

wking commented 9 years ago

On Tue, Oct 21, 2014 at 07:06:37PM -0700, Bo Shi wrote:

> As part of this investigation, I discovered that "docker pull foo/bar:sometag" will incur a hit to the API endpoint that lists all tags for 'foo/bar'.

I haven't looked at the client-side code (at least recently enough to remember), but I'd expect you'd need to do this for the new alias detection (docker/docker#8141). Unless you had a separate endpoint for "give me all aliases for $NAMESPACE/$REPO:$TAG". We could actually support something like that efficiently if we had something like refcount tracking (#606, #409), since I was recording the referring ([namespace, repository, tag], descendant_id) entry for each ancestor image id [1,2]. We'd just have to iterate through referrers and return tags matching $NAMESPACE/$REPO where descendant_id == image_id. With my new (proposed) atomic/streaming storage separation [3], accessing the referrers is probably a single hit to your atomic storage (to the image-references entry for the requested image).
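A rough sketch of the lookup described above, assuming a hypothetical in-memory referrer index keyed by image id (the `referrers` mapping and its layout are illustrative, not the actual storage schema):

```python
# Hypothetical referrer index: image_id -> list of
# ((namespace, repository, tag), descendant_id) entries, as described above.
referrers = {
    "abc123": [
        (("foo", "bar", "latest"), "abc123"),
        (("foo", "bar", "v1.0"), "abc123"),
        (("foo", "bar", "v2.0"), "def456"),
    ],
}

def aliases_for(namespace, repository, image_id):
    """Return all tags in namespace/repository whose descendant is image_id."""
    tags = []
    for (ns, repo, tag), descendant_id in referrers.get(image_id, []):
        if (ns, repo) == (namespace, repository) and descendant_id == image_id:
            tags.append(tag)
    return tags

print(aliases_for("foo", "bar", "abc123"))  # ['latest', 'v1.0']
```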

dmp42 commented 9 years ago

One of the terribly inefficient things right now is that we not only ls but also read the (tag) file contents. That second part is going away.

Drivers will still need to provide an efficient ls (and we can alleviate part of the pain by caching the result).
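To illustrate the difference, a minimal sketch against a stand-in storage driver; the method names (`list_directory`, `get_content`) and the `repositories/<namespace>/<repo>/tag_<name>` layout only approximate the v1 driver interface and on-disk layout, and are assumptions here:

```python
class FakeDriver(object):
    """Stand-in for a storage driver; method names are assumptions."""
    def __init__(self, files):
        self.files = files  # path -> contents

    def list_directory(self, path):
        return [p for p in self.files if p.startswith(path + "/")]

    def get_content(self, path):
        return self.files[path]


def list_tags_slow(driver, repo_path):
    # Today: one ls, then one read per tag file. On S3/GCS each
    # get_content() is a separate round trip, so this is O(tags) requests.
    tags = {}
    for path in driver.list_directory(repo_path):
        name = path.rsplit("/", 1)[-1].replace("tag_", "")
        tags[name] = driver.get_content(path)  # the file holds the image id
    return tags


def list_tags_fast(driver, repo_path):
    # Proposed: the ls alone is enough to enumerate tag names.
    return [p.rsplit("/", 1)[-1].replace("tag_", "")
            for p in driver.list_directory(repo_path)]


driver = FakeDriver({
    "repositories/foo/bar/tag_latest": "abc123",
    "repositories/foo/bar/tag_v1.0": "def456",
})
print(list_tags_slow(driver, "repositories/foo/bar"))
print(list_tags_fast(driver, "repositories/foo/bar"))
```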

dmp42 commented 9 years ago

@bshi (and other gcs people?) - the new go drivers API #643 is going final and will be merged soon. The time is right to voice concerns :-)

bshi commented 9 years ago

Skimmed the discussion in #643 - it seems like you guys are aware of and thinking about the issue of unbounded looping over driver interface methods. One other concern is the underlying storage consistency model and what the registry expects of the drivers. S3 doesn't even have a consistent (:P) consistency model across S3 regions.

dmp42 commented 9 years ago

Consistency... we think about it, a lot :-) cc @stevvooe

stevvooe commented 9 years ago

@bshi Consistency and coordination are definitely something we are thinking about. Unfortunately, many of the storage backends lack consistency and don't have any kind of transactional coordination. The new registry will likely require some sort of coordination layer to mitigate that. Watch out for upcoming proposals, as your input will be appreciated.

wking commented 9 years ago

On Thu, Nov 06, 2014 at 10:48:58AM -0800, Stephen Day wrote:

> The new registry will likely require some sort of coordination layer to mitigate that.

I'd just keep the mutable (i.e. not content-addressable) stuff in a storage backend that does support transactions [1]. Then you can offload the coordination to that storage engine (e.g. let Redis handle the transaction implementation and just use MULTI/EXEC/DISCARD). Then you don't need to figure out a way to handle transaction locking between multiple registries that are sharing the same backing storage.
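For example, a minimal sketch with redis-py, where the mutable tag -> image id mapping lives in Redis and updates go through MULTI/EXEC; the key layout is made up for illustration and is not how the registry actually stores tags:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def set_tag(namespace, repository, tag, image_id):
    # Hypothetical key layout; the point is only that the mutable mapping
    # lives in a backend with transactions, not in S3/GCS.
    key = "tags:%s/%s" % (namespace, repository)
    pipe = r.pipeline(transaction=True)  # MULTI ... EXEC
    pipe.hset(key, tag, image_id)
    pipe.hset("images:%s" % image_id, "last_tagged", tag)
    pipe.execute()  # both writes commit atomically, or neither does

set_tag("foo", "bar", "sometag", "abc123")
print(r.hgetall("tags:foo/bar"))
```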

rlister commented 9 years ago

Just as a fun data point, I started to run into this problem right around 1700 tags for a repo, backed with S3. We are doing continuous builds, so I am able to work around it by periodically deleting large repos and re-pushing as needed.

duyanghao commented 8 years ago

I have a similar problem @bshi @dmp42. My storage backend is S3 (Ceph). Has this problem been solved yet? It's urgent!

I guess the problem is the "/v1/repositories/repo/tags" API: it uses Python gevent to pull every tag file from the storage backend, read it, and return the result to docker, which takes too much time. Maybe there is a way to implement that API more efficiently; I am trying to do that.
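One common mitigation (not something this project ships) is to cache the assembled tag listing so repeated pulls don't re-read every tag file; a minimal sketch of a time-based in-process cache, where `fetch_tags` stands in for whatever slow function currently builds the listing:

```python
import time

_TAG_CACHE = {}   # repo -> (expires_at, tags dict)
CACHE_TTL = 60    # seconds; illustrative value

def cached_tags(repo, fetch_tags):
    """fetch_tags(repo) is the existing (slow) call that reads every tag file."""
    now = time.time()
    entry = _TAG_CACHE.get(repo)
    if entry and entry[0] > now:
        return entry[1]          # still fresh: skip the storage round trips
    tags = fetch_tags(repo)      # slow path: hits the backend
    _TAG_CACHE[repo] = (now + CACHE_TTL, tags)
    return tags
```

The trade-off is staleness: a freshly pushed tag may not show up until the TTL expires, so this only helps if slightly stale listings are acceptable.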