docker-archive / docker-registry

This is **DEPRECATED**! Please go to https://github.com/docker/distribution
Apache License 2.0

Frequent EOF issues when pulling files from the registry #971

Open dsw88 opened 9 years ago

dsw88 commented 9 years ago

We're running docker-registry v0.9.1 internally in our organization, using the provided container. We have 6 EC2 hosts with one registry container per host.

We use S3 as the backend store, and we have a Redis LRU cache configured to cache the smaller files. We have enabled STORAGE_REDIRECT as suggested at https://github.com/docker/docker-registry/issues/540#issuecomment-83218793.

We use an AWS Elastic Load Balancer to distribute traffic to the various registry hosts. We are pointing the load balancer directly at the registry container. In other words, there's no reverse proxy like Nginx running on each of the hosts.

We frequently get EOF errors when pulling images. This used to happen all the time until we implemented STORAGE_REDIRECT, and now we don't see it when pulling image layers. However, we still see this periodically when pulling the smaller metadata files associated with the image layers such as the "ancestry" file.

Here's an example when we're trying to pull an image:

$ docker pull <host_name>/<image_name>:238
Pulling repository <host_name>/<image_name>
2015/03/18 20:44:16 Error pulling image (<tag_name>) from <host_name>/<image_name>, Get http://<host_name>/v1/images/8a39dc87bd3e270444da2b7316ffcc8f7c2e65f5d91e5a3c3d2bcf17b905a7f6/ancestry: EOF

When I look in the registry logs, they show that the registry received that request and even returned a 200, so presumably it thought it returned the file correctly:

[18/Mar/2015:20:44:16 +0000] "GET /v1/images/8a39dc87bd3e270444da2b7316ffcc8f7c2e65f5d91e5a3c3d2bcf17b905a7f6/ancestry HTTP/1.1" 200 3196 "-" "docker/1.1.2 go/go1.2.1 git-commit/d84a070 kernel/3.14.0-0.bpo.2-amd64 os/linux arch/amd64"
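
To check whether every failed pull lines up with a 200 on the registry side, the status and byte count can be extracted from log lines like the one above. The following is a minimal Python sketch. The field layout is assumed from that single line, and `parse_access_line` is a hypothetical helper name, not part of docker-registry.

```python
import re

# Layout assumed from the one log line quoted above:
# [timestamp] "METHOD path protocol" status bytes "referer" "user-agent"
ACCESS_LINE = re.compile(
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_line(line):
    """Return the fields of one access-log line as a dict, or None if it does not match."""
    match = ACCESS_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # "-" means no body size was logged.
    fields["size"] = None if fields["size"] == "-" else int(fields["size"])
    return fields
```

For the line above this gives a status of 200 and a size of 3196, so the registry believes it sent the full 3196-byte response even though the client saw EOF.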

This issue is a pretty big deal for us, as it's breaking our automated deploy process when pulling from these registries. It ends up causing 20-40% deploy failures, which is really coloring my organization's view of Docker. I'd love to fix this ASAP so we can convince people around here to keep using Docker!

Any ideas what's causing this? I thought it might be a timeout on the load balancer, but we have its timeout set to an hour, so that shouldn't be the cause. Also, the EOF comes back almost immediately after the initial call.
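
One way to narrow this down without involving the Docker client would be to request the same small file many times, once through the load balancer and once directly against a single registry host, and compare how often the request fails. The Python sketch below assumes the endpoint is reachable over plain HTTP as in the error above. `fetch_length` and `probe` are hypothetical names, and the exception types caught are the ones Python's standard library raises for dropped or truncated responses, which may not map one-to-one onto the Go client's EOF.

```python
import http.client
import urllib.request


def fetch_length(url, timeout=10.0):
    """Fetch url once and return the number of body bytes actually read."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return len(response.read())


def probe(url, attempts, fetch=fetch_length):
    """Request url `attempts` times and return (attempt index, error name) for each failure."""
    failures = []
    for attempt in range(attempts):
        try:
            fetch(url)
        except (OSError, http.client.HTTPException) as exc:
            # Covers connection resets, early disconnects and truncated bodies.
            failures.append((attempt, type(exc).__name__))
    return failures
```

Calling `probe("http://<host_name>/v1/images/<image_id>/ancestry", 100)` against both targets, with the placeholders filled in, would show whether the failures only appear when the load balancer is in the path.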

dsw88 commented 9 years ago

Oh, also: on the other issue you suggested upgrading the version of Docker used to pull the image. I'll try that first.

sergeyevstifeev commented 9 years ago

Experiencing the same issue. @dsw88, did upgrading Docker help?

dmp42 commented 9 years ago

Guys, I need your docker version and docker info. Additionally, running docker in debug mode and getting the docker logs usually helps. Also, the registry logs if possible.
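
A small helper can gather the first two items in one paste-ready block. This is only a sketch: `collect` is a hypothetical function, it assumes the `docker` binary is on the PATH, and it does not cover the debug-mode daemon logs or the registry logs, which have to be collected separately.

```python
import subprocess

# The two commands requested above.
COMMANDS = (["docker", "version"], ["docker", "info"])


def collect(commands=COMMANDS, runner=subprocess.run):
    """Run each command and return one text report with all outputs."""
    sections = []
    for command in commands:
        try:
            result = runner(command, capture_output=True, text=True, timeout=30)
            # docker info prints its warnings on stderr, so keep both streams.
            output = (result.stdout + result.stderr).strip()
        except (OSError, subprocess.SubprocessError) as exc:
            output = f"failed to run: {exc}"
        sections.append("$ " + " ".join(command) + "\n" + output)
    return "\n\n".join(sections)
```

Running `print(collect())` on both the registry host and a failing client would produce output in the same shape as the reports posted in this thread.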

dcharbonnier commented 9 years ago

Same problem here:

#:~$ docker -v
Docker version 1.3.3, build d344625
#:~$ docker info
Containers: 1
Images: 15
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Dirs: 17
Execution Driver: native-0.2
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
WARNING: No memory limit support
WARNING: No swap limit support
#:~$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS                      NAMES
8b16ffd795e9        registry:latest     "docker-registry"   2 weeks ago         Up 2 weeks          127.0.0.1:5000->5000/tcp   docker-registry

dcharbonnier commented 9 years ago

Client:

#:~$ docker version
Client version: 1.3.3
Client API version: 1.15
Go version (client): go1.3.3
Git commit (client): d344625
OS/Arch (client): linux/amd64
Server version: 1.3.3
Server API version: 1.15
Go version (server): go1.3.3
Git commit (server): d344625
#:~$ docker info
Containers: 3
Images: 67
Storage Driver: aufs
 Root Dir: /data/docker/aufs
 Dirs: 73
Execution Driver: native-0.2
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
WARNING: No memory limit support
WARNING: No swap limit support

sergeyevstifeev commented 9 years ago

I'm using a CloudFormation setup very similar to https://github.com/mbabineau/cloudformation-docker-registry, which ends up with the following Docker version on the registry host (1.4.1):

$> docker version
sudo: unable to resolve host ip-10-225-15-169
Client version: 1.4.1
Client API version: 1.16
Go version (client): go1.3.3
Git commit (client): 5bc2ff8
OS/Arch (client): linux/amd64
Server version: 1.4.1
Server API version: 1.16
Go version (server): go1.3.3
Git commit (server): 5bc2ff8

And the clients have (1.5.0):

$> docker version
Client version: 1.5.0
Client API version: 1.17
Go version (client): go1.3.3
Git commit (client): a8a31ef/1.5.0
OS/Arch (client): linux/amd64
Server version: 1.5.0
Server API version: 1.17
Go version (server): go1.3.3
Git commit (server): a8a31ef/1.5.0

dsw88 commented 9 years ago

Here's the Docker version on the registry host:

Client version: 1.3.0
Client API version: 1.15
Go version (client): go1.3.3
Git commit (client): c78088f
OS/Arch (client): linux/amd64
Server version: 1.3.0
Server API version: 1.15
Go version (server): go1.3.3
Git commit (server): c78088f

Here's the docker info for the registry:

Containers: 1
Images: 15
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Dirs: 17
Execution Driver: native-0.2
Kernel Version: 3.16.0-0.bpo.4-amd64
Operating System: Debian GNU/Linux 7 (wheezy)
WARNING: No memory limit support
WARNING: No swap limit support

Here's the docker version for one of the docker daemons that had a failed pull:

Client version: 1.2.0
Client API version: 1.14
Go version (client): go1.2
Git commit (client): fa7b24f/1.2.0
OS/Arch (client): linux/amd64
Server version: 1.2.0
Server API version: 1.14
Go version (server): go1.2
Git commit (server): fa7b24f/1.2.0

And here's the docker info for that client:

Containers: 0
Images: 13
Storage Driver: devicemapper
 Pool Name: docker-202:1-263625-pool
 Pool Blocksize: 64 Kb
 Data file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 1129.6 Mb
 Data Space Total: 102400.0 Mb
 Metadata Space Used: 1.5 Mb
 Metadata Space Total: 2048.0 Mb
Execution Driver: native-0.2
Kernel Version: 3.14.20-20.44.amzn1.x86_64
Operating System: Amazon Linux AMI 2014.09

This one is old because we run on Elastic Beanstalk, which doesn't yet support configuring insecure registries. With 1.3 and above, pulls fail because Docker complains about an insecure registry. It looks like others on this issue are seeing the same failures even with up-to-date versions of Docker.

dmp42 commented 9 years ago

Thanks a lot for the info.

Everyone:

That will definitely help (EOF on pull can be the result of a number of very different things).

Thanks!