deis / registry-proxy

Deis Workflow Registry Proxy
https://deis.com
MIT License

Connection Refused on 2 e2e tests #4

Closed helgi closed 8 years ago

helgi commented 8 years ago

scheduler.KubeException: Error while pulling image: Get http://localhost:5555/v1/repositories/test-265842571/images: dial tcp 127.0.0.1:5555: connection refused

Happened on 2 tests:

https://ci.deis.io/job/workflow-test-pr/4353/artifact/4353/logs/deis-controller-2807243241-ssx08.log

https://ci.deis.io/job/workflow-test-pr/4353/

Going to run the suite again just to see how flaky it is.

helgi commented 8 years ago

https://ci.deis.io/job/workflow-test-pr/4500/console

00:09:52.057 Error: Unknown Error (503): {"detail":"test-653227208-run-0i1ak (run): Error while pulling image: Get http://localhost:5555/v1/repositories/test-653227208/images: dial tcp 127.0.0.1:5555: connection refused"}

bacongobbler commented 8 years ago

In https://ci.deis.io/job/workflow-test-pr/4353/artifact/4353/logs/deis-registry-proxy-uy0l8.log I see a lot of log lines such as the following:

2016/07/29 17:04:14 [warn] 9#9: *5350 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000200, client: 10.48.5.1, server: localhost, request: "PATCH /v2/test-946443332/blobs/uploads/6c9d03f8-fc37-4b4a-

Those seem to be logged only for blob uploads, not pulls. Nothing else in the log suggests that nginx is explicitly refusing the connection, though it could indicate that there are not enough worker processes to handle the requests.

bacongobbler commented 8 years ago

Closing due to #6, but please re-open if this persists.

helgi commented 8 years ago

https://ci.deis.io/job/workflow-test-pr/4533/console

00:08:29.796 Creating build... ...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...oError: Unknown Error (400): {"detail":"test-228168247-cmd (app::deploy): Error while pulling image: Get http://localhost:5555/v1/repositories/test-228168247/images: dial tcp 127.0.0.1:5555: connection refused"}

bacongobbler commented 8 years ago

This seems to be a registry issue. From https://ci.deis.io/job/workflow-test-pr/4533/artifact/4533/logs/deis-describe.log

Events:
  FirstSeen LastSeen    Count   From                        SubobjectPath           Type        Reason      Message
  --------- --------    -----   ----                        -------------           --------    ------      -------
  22s       22s     1   {default-scheduler }                                Normal      Scheduled   Successfully assigned deis-registry-3473008981-z93yw to gke-mumm-default-pool-d5c79aa9-25sw
  20s       20s     1   {kubelet gke-mumm-default-pool-d5c79aa9-25sw}   spec.containers{deis-registry}  Normal      Created     Created container with docker id a2badd4f672f
  19s       19s     1   {kubelet gke-mumm-default-pool-d5c79aa9-25sw}   spec.containers{deis-registry}  Normal      Started     Started container with docker id a2badd4f672f
  21s       2s      2   {kubelet gke-mumm-default-pool-d5c79aa9-25sw}   spec.containers{deis-registry}  Normal      Pulling     pulling image "quay.io/deisci/registry:canary"
  13s       2s      3   {kubelet gke-mumm-default-pool-d5c79aa9-25sw}   spec.containers{deis-registry}  Warning     Unhealthy   Liveness probe failed: Get http://10.12.2.4:5000/v2/: dial tcp 10.12.2.4:5000: connection refused
  12s       2s      2   {kubelet gke-mumm-default-pool-d5c79aa9-25sw}   spec.containers{deis-registry}  Warning     Unhealthy   Readiness probe failed: Get http://10.12.2.4:5000/v2/: dial tcp 10.12.2.4:5000: connection refused
  2s        2s      1   {kubelet gke-mumm-default-pool-d5c79aa9-25sw}   spec.containers{deis-registry}  Normal      Killing     Killing container with docker id a2badd4f672f: pod "deis-registry-3473008981-z93yw_deis(6c0b2f9b-58ed-11e6-99c2-42010a800098)" container "deis-registry" is unhealthy, it will be killed and re-created.
  20s       0s      2   {kubelet gke-mumm-default-pool-d5c79aa9-25sw}   spec.containers{deis-registry}  Normal      Pulled      Successfully pulled image "quay.io/deisci/registry:canary"

helgi commented 8 years ago

Nothing obvious in the registry log? I think we can close this one.

bacongobbler commented 8 years ago

Nothing that I can see from parsing the logs, unfortunately. I think it's just the registry getting bogged down, though we should also do our own due diligence and check if there are any open issues at docker/distribution.
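If the registry really is just slow to come up, one way to check is to poll the same /v2/ endpoint that the liveness and readiness probes hit. The following is only a rough sketch in Python: the function name wait_for_registry and the attempts and delay parameters are made up for illustration and are not part of Deis or the registry.

```python
import time
import urllib.error
import urllib.request


def wait_for_registry(url: str, attempts: int = 5, delay: float = 1.0) -> bool:
    """Poll url until something answers over HTTP, or give up.

    Hypothetical helper for illustration only. Returns True as soon as
    any HTTP response arrives, False if every attempt fails to connect.
    """
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2.0):
                return True
        except urllib.error.HTTPError:
            # An HTTP error status (for example 401) still proves that
            # a server is listening, so treat it as reachable.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, DNS failure and similar.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

Calling wait_for_registry("http://10.12.2.4:5000/v2/") from a machine that can reach the pod IP would show whether the registry ever starts accepting connections. It does not explain why the probes fail in the first place.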

vdice commented 8 years ago

Also encountered during https://ci.deis.io/job/workflow-test-pr/4638/console

10:44:07 remote: {"errorDetail":{"message":"Post http://localhost:5555/v2/test-570398804/blobs/uploads/: dial tcp 127.0.0.1:5555: connection refused"},"error":"Post http://localhost:5555/v2/test-570398804/blobs/uploads/: dial tcp 127.0.0.1:5555: connection refused"}

bacongobbler commented 8 years ago

We should probably open an issue on deis/registry, since the problem seems to originate there rather than here.

blurrcat commented 8 years ago

I'm having a similar issue:

Pushing to registry
{"errorDetail":{"message":"Put http://localhost:5555/v1/repositories/api-staging/: dial tcp 127.0.0.1:5555: getsockopt: connection refused"},"error":"Put http://localhost:5555/v1/repositories/api-staging/: dial tcp 127.0.0.1:5555: getsockopt: connection refused"}

The on-cluster registry and registry-proxy are both running OK (the log outputs "starting registry-proxy..."). I can access registry-proxy-pod-ip:80, but localhost:5555 cannot be accessed on the node. This is really strange, since the proxy daemonset clearly sets HostPort: 5555.

System info:

bacongobbler commented 8 years ago

@blurrcat I see this issue for k8s: https://github.com/kubernetes/kubernetes/issues/34625

Perhaps it is relevant to your issue?
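To narrow down the symptom described above (the pod IP answers on port 80 but localhost:5555 does not), it can help to tell a refused connection apart from a timeout when run on the node itself. Here is a minimal sketch in Python, assuming nothing about the cluster beyond the host and port quoted in the errors. The helper name probe_tcp is hypothetical.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify a TCP connection attempt.

    Hypothetical helper for illustration only. Returns one of
    "open", "refused", "timeout" or "error".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Nothing is listening on that port (or a firewall sent a reset).
        return "refused"
    except socket.timeout:
        # Packets are being dropped, or the peer is too slow to answer.
        return "timeout"
    except OSError:
        # Anything else, for example an unreachable network.
        return "error"
```

Running probe_tcp("127.0.0.1", 5555) on the node and getting "refused" while the pod IP on port 80 reports "open" would suggest the host port mapping is not in place, rather than the proxy itself being down.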

bacongobbler commented 8 years ago

The root issue here has long since been resolved (it was related to the end-to-end tests, not a general networking issue). Please open another ticket and we'll look into it. Thanks! :)