docker-archive / docker-registry

This is **DEPRECATED**! Please go to https://github.com/docker/distribution

504 Errors on Push #900

Open wozz opened 9 years ago

wozz commented 9 years ago

I get this error on the client side when doing a push:

4b06ea88509c: Pushing
4b06ea88509c: Buffering to disk
time="2015-01-14T20:46:38Z" level="fatal" msg="HTTP code 504 while uploading metadata: invalid character '<' looking for beginning of value"

I'm running the docker-registry container behind NGINX.

Logs from the registry:

[2015-01-14 20:32:05 +0000] [53] [INFO] Autorestarting worker after current request.
172.17.42.1 - - [14/Jan/2015:20:32:06 +0000] "GET /v1/images/4b06ea88509c2377f94a9d0f4f260247dfa6b195e716fd3a14c1c52b17a01bf1/json HTTP/1.0" 404 28 "-" "docker/1.4.1 go/go1.3.3 git-commit/5bc2ff8 kernel/3.13.0-36-generic os/linux arch/amd64"
[2015-01-14 20:32:06 +0000] [53] [INFO] Worker exiting (pid: 53)
[2015-01-14 20:32:06 +0000] [61] [INFO] Booting worker with pid: 61
172.17.42.1 - - [14/Jan/2015:20:32:06 +0000] "PUT /v1/images/4b06ea88509c2377f94a9d0f4f260247dfa6b195e716fd3a14c1c52b17a01bf1/json HTTP/1.0" 200 4 "-" "docker/1.4.1 go/go1.3.3 git-commit/5bc2ff8 kernel/3.13.0-36-generic os/linux arch/amd64"
[2015-01-14 20:32:09 +0000] [56] [INFO] Autorestarting worker after current request.
172.17.42.1 - - [14/Jan/2015:20:32:09 +0000] "PUT /v1/images/4b06ea88509c2377f94a9d0f4f260247dfa6b195e716fd3a14c1c52b17a01bf1/layer HTTP/1.0" 200 4 "-" "docker/1.4.1 go/go1.3.3 git-commit/5bc2ff8 kernel/3.13.0-36-generic os/linux arch/amd64"
[2015-01-14 20:32:10 +0000] [56] [INFO] Worker exiting (pid: 56)
[2015-01-14 20:32:10 +0000] [62] [INFO] Booting worker with pid: 62

These errors happen every 4-5 image pushes, after which I restart the docker-registry. I've been hitting these issues frequently on both stable and recent master releases.

dmp42 commented 9 years ago

504 indicates a gateway timeout: nginx gave up waiting for the registry and served its own HTML error page, which is why the client chokes on the leading '<' while trying to parse JSON. Can you copy both your nginx and registry logs from when this happens?

Thanks.

wozz commented 9 years ago

Registry logs are above, followed by this (2-line overlap):

[2015-01-14 20:32:10 +0000] [56] [INFO] Worker exiting (pid: 56)
[2015-01-14 20:32:10 +0000] [62] [INFO] Booting worker with pid: 62
172.17.42.1 - - [14/Jan/2015:20:41:04 +0000] "GET /v1/_ping HTTP/1.0" 200 2 "-" "Go 1.1 package http"
172.17.42.1 - - [14/Jan/2015:20:41:04 +0000] "GET /v1/_ping HTTP/1.0" 200 2 "-" "Go 1.1 package http"
172.17.42.1 - - [14/Jan/2015:20:41:04 +0000] "GET /v1/users/ HTTP/1.0" 200 4 "-" "docker/1.4.1 go/go1.3.3 git-commit/5bc2ff8 kernel/3.13.0-36-generic os/linux arch/amd64"
172.17.42.1 - - [14/Jan/2015:20:41:04 +0000] "GET /v1/_ping HTTP/1.0" 200 2 "-" "Go 1.1 package http"
172.17.42.1 - - [14/Jan/2015:20:41:04 +0000] "GET /v1/_ping HTTP/1.0" 200 2 "-" "Go 1.1 package http"
172.17.42.1 - - [14/Jan/2015:20:56:13 +0000] "GET /v1/_ping HTTP/1.0" 200 2 "-" "Go 1.1 package http"
172.17.42.1 - - [14/Jan/2015:20:56:13 +0000] "GET /v1/_ping HTTP/1.0" 200 2 "-" "Go 1.1 package http"

The corresponding NGINX log is indeed a timeout:

2015/01/14 20:48:49 [error] 17048#0: *14004 upstream timed out (110: Connection timed out) while reading response header from upstream, client: [ip], server: registry.[domain].com, request: "PUT /v1/images/4b06ea88509c2377f94a9d0f4f260247dfa6b195e716fd3a14c1c52b17a01bf1/checksum HTTP/1.1", upstream: "http://127.0.0.1:5000/v1/images/4b06ea88509c2377f94a9d0f4f260247dfa6b195e716fd3a14c1c52b17a01bf1/checksum", host: "registry.[domain].com"

To me, it looks like the registry crashed and stopped responding. I can try turning on more verbose debug logging, although I don't know how to do that for docker-registry.

dmp42 commented 9 years ago

If there is no stack trace in your registry log, there was no crash.

I can think of a couple of possible culprits: gunicorn recycling a worker mid-request (the "Autorestarting worker after current request" lines in your log), or nginx timing out on a slow upstream response.

Can you:

- post your nginx configuration and the exact command you use to run the registry
- run the registry with DEBUG enabled and capture the logs from a failing push

That would certainly help pinpoint your issue.

Thanks!

wozz commented 9 years ago

Here's my NGINX config:

upstream dkrreg {
    server 127.0.0.1:5000 max_fails=0;
    keepalive 512;
}
server {
    listen 80;
    return 301 https://$host$request_uri;
}
server {
    listen 443 ssl;
    server_name domain;
    ssl_certificate     /etc/ssl/certs/domain.crt;
    ssl_certificate_key /etc/ssl/private/domain.key;
    ssl_session_cache shared:SSL:30m;
    ssl_session_timeout 30m;
    ssl_protocols SSLv3 TLSv1 TLSv1.1 TLSv1.2;
    ssl_prefer_server_ciphers on;
    ssl_ciphers ECDHE-RSA-AES128-SHA:AES256-GCM-SHA256:ECDHE-RSA-AES256-SHA256:HIGH:!aNULL:!MD5:-LOW:-SSLv2:-EXP;
    client_max_body_size 900M;
    location /_ping {
        auth_basic off;
        proxy_pass http://dkrreg;
    }
    location /v1/_ping {
        auth_basic off;
        proxy_pass http://dkrreg;
    }
    location / {
        auth_basic              "Restricted";
        auth_basic_user_file    docker-registry.htpasswd;
        proxy_pass http://dkrreg;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
        real_ip_header X-Forwarded-For;
        real_ip_recursive on;
        proxy_read_timeout 1000;
    }
}

`curl localhost:5000/_ping` returns `{}`

This is how I run the docker container:

docker run \
    -e SETTINGS_FLAVOR=s3 \
    -e AWS_BUCKET=bucketname \
    -e AWS_REGION=us-east-1 \
    -e STORAGE_PATH=/registry \
    -e SEARCH_BACKEND= \
    -e DEBUG=true \
    -e GUNICORN_OPTS='[--preload]' \
    -p 5000:5000 \
    registry-master

(I get the same errors with and without the search backend, but it's currently turned off, and DEBUG was just added after you suggested it.)

wozz commented 9 years ago

Hmmm... maybe I need to increase proxy_read_timeout
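
Something like this in the location / block, I assume (the values are guesses on my part, sized for slow S3-backed uploads; proxy_send_timeout and proxy_connect_timeout are companion directives I'd raise at the same time, not something suggested above):

    location / {
        # ... keep the existing auth_basic and proxy_set_header directives ...
        proxy_connect_timeout 60s;
        proxy_send_timeout    1200s;
        proxy_read_timeout    1200s;
        proxy_pass http://dkrreg;
    }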

dmp42 commented 9 years ago

I also don't see chunked_transfer_encoding in your config (whether you need to set it depends on your nginx version). See the nginx documentation for the chunked_transfer_encoding directive.
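
A minimal sketch of setting it explicitly, assuming nginx 1.3.9 or newer (if I recall correctly, that's the first release to accept chunked request bodies, which docker uses for uploads; the directive itself defaults to on, so this is mainly a safeguard):

    server {
        # allow chunked transfer encoding for proxied traffic
        # (on by default in modern nginx; shown explicitly here)
        chunked_transfer_encoding on;
    }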

trinitronx commented 9 years ago

Just to add a data point: after enabling the SEARCH_BACKEND and GUNICORN_OPTS='[--preload]' options on the docker registry, we ran into the stability issues described in docker/docker-registry#540.

After disabling both of these options, the stability issues went away and the errors became far less frequent, although they still occurred occasionally.
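
Based on wozz's command above, a sketch of a run with both options left out (the bucket name and image tag are just his placeholders):

    docker run \
        -e SETTINGS_FLAVOR=s3 \
        -e AWS_BUCKET=bucketname \
        -e AWS_REGION=us-east-1 \
        -e STORAGE_PATH=/registry \
        -p 5000:5000 \
        registry-master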

anshuljoshi commented 9 years ago

I was facing the same problem while pushing an image with Docker version 1.7.0, build 0baf609. Simply running "sudo docker push my/test" resolved it, so I suppose sudo was needed because of some permission issue.

smiller171 commented 8 years ago

I just ran into this myself after updating my registry. It used to work, but now it's broken.

trinitronx commented 8 years ago

sudo for docker CLI commands has always been needed unless your user is in the docker group (or whichever group is configured to have rw access to the docker daemon's unix socket). That behavior is unrelated to this bug.
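
For anyone landing here from search: assuming a standard install where the daemon socket is group-owned by docker, the usual way to avoid sudo is:

    # add the current user to the docker group; takes effect at next login
    sudo usermod -aG docker $USER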

smiller171 commented 8 years ago

@trinitronx working with sudo was just luck. I'm using registry v2, and after hours of fighting with it, all I could determine is that there's some kind of bug in the S3 storage backend that makes pushes fail 90% of the time; every once in a while one gets through, and it working when I used sudo was coincidence. It's unrelated to this issue anyway, since I'm on v2.

dmp42 commented 8 years ago

@smiller171 Please report this to https://github.com/docker/distribution

trinitronx commented 8 years ago

@smiller171: Registry v2 lives in the docker/distribution project, as @dmp42 helpfully pointed out. This repo is the legacy v1 Python-based registry.

smiller171 commented 8 years ago

Thanks @dmp42 @trinitronx! I stumbled here from Google while researching my problem.