Closed helgi closed 8 years ago
https://ci.deis.io/job/workflow-test-pr/4500/console
00:09:52.057 Error: Unknown Error (503): {"detail":"test-653227208-run-0i1ak (run): Error while pulling image: Get http://localhost:5555/v1/repositories/test-653227208/images: dial tcp 127.0.0.1:5555: connection refused"}
According to https://ci.deis.io/job/workflow-test-pr/4353/artifact/4353/logs/deis-registry-proxy-uy0l8.log I see a lot of log lines such as the following:
2016/07/29 17:04:14 [warn] 9#9: *5350 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000200, client: 10.48.5.1, server: localhost, request: "PATCH /v2/test-946443332/blobs/uploads/6c9d03f8-fc37-4b4a-
Those only seem to be for blob uploads, not necessarily pulls. Nothing else suggests to me that nginx is explicitly refusing the connection; however, the warnings could indicate that there are not enough worker processes to handle the requests.
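To check the "only blob uploads" observation across a whole proxy log rather than by eye, something like the following could tally the warnings. This is a rough sketch: the regular expression is written against the single log line quoted above, and the function name and the "blob upload" / "other" buckets are made up for illustration, not part of any Deis tooling.

```python
import re
from collections import Counter

# Pattern based on the one nginx warning quoted in this thread (an assumption:
# other nginx versions or log formats may word this differently).
BUFFERED = re.compile(
    r'\[warn\].*a client request body is buffered to a temporary file'
    r'.*request: "(?P<method>[A-Z]+) (?P<path>\S+)'
)


def count_buffered_requests(lines):
    """Tally buffered-body warnings by (HTTP method, kind of request)."""
    counts = Counter()
    for line in lines:
        match = BUFFERED.search(line)
        if match is None:
            continue  # not a buffered-body warning
        # Registry v2 blob uploads go through /v2/<name>/blobs/uploads/...
        kind = "blob upload" if "/blobs/uploads/" in match.group("path") else "other"
        counts[(match.group("method"), kind)] += 1
    return counts
```

Feeding it the lines of a downloaded registry-proxy log (for example via `open(path)`) would show whether any of the warnings come from requests other than uploads.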
closing due to #6, but please re-open if this persists.
https://ci.deis.io/job/workflow-test-pr/4533/console
00:08:29.796 Creating build... ...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...o...oError: Unknown Error (400): {"detail":"test-228168247-cmd (app::deploy): Error while pulling image: Get http://localhost:5555/v1/repositories/test-228168247/images: dial tcp 127.0.0.1:5555: connection refused"}
This seems to be a registry issue. From https://ci.deis.io/job/workflow-test-pr/4533/artifact/4533/logs/deis-describe.log
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
22s 22s 1 {default-scheduler } Normal Scheduled Successfully assigned deis-registry-3473008981-z93yw to gke-mumm-default-pool-d5c79aa9-25sw
20s 20s 1 {kubelet gke-mumm-default-pool-d5c79aa9-25sw} spec.containers{deis-registry} Normal Created Created container with docker id a2badd4f672f
19s 19s 1 {kubelet gke-mumm-default-pool-d5c79aa9-25sw} spec.containers{deis-registry} Normal Started Started container with docker id a2badd4f672f
21s 2s 2 {kubelet gke-mumm-default-pool-d5c79aa9-25sw} spec.containers{deis-registry} Normal Pulling pulling image "quay.io/deisci/registry:canary"
13s 2s 3 {kubelet gke-mumm-default-pool-d5c79aa9-25sw} spec.containers{deis-registry} Warning Unhealthy Liveness probe failed: Get http://10.12.2.4:5000/v2/: dial tcp 10.12.2.4:5000: connection refused
12s 2s 2 {kubelet gke-mumm-default-pool-d5c79aa9-25sw} spec.containers{deis-registry} Warning Unhealthy Readiness probe failed: Get http://10.12.2.4:5000/v2/: dial tcp 10.12.2.4:5000: connection refused
2s 2s 1 {kubelet gke-mumm-default-pool-d5c79aa9-25sw} spec.containers{deis-registry} Normal Killing Killing container with docker id a2badd4f672f: pod "deis-registry-3473008981-z93yw_deis(6c0b2f9b-58ed-11e6-99c2-42010a800098)" container "deis-registry" is unhealthy, it will be killed and re-created.
20s 0s 2 {kubelet gke-mumm-default-pool-d5c79aa9-25sw} spec.containers{deis-registry} Normal Pulled Successfully pulled image "quay.io/deisci/registry:canary"
Nothing obvious in the registry log? We can close this one I think
Nothing that I can see from parsing the logs, unfortunately. I think it's just the registry getting bogged down, though we should also do our own due diligence and check if there are any open issues at docker/distribution.
Also encountered during https://ci.deis.io/job/workflow-test-pr/4638/console
10:44:07 remote: {"errorDetail":{"message":"Post http://localhost:5555/v2/test-570398804/blobs/uploads/: dial tcp 127.0.0.1:5555: connection refused"},"error":"Post http://localhost:5555/v2/test-570398804/blobs/uploads/: dial tcp 127.0.0.1:5555: connection refused"}
We should probably open an issue on deis/registry, since the problem seems to originate there rather than here.
I'm having a similar issue:
Pushing to registry
{"errorDetail":{"message":"Put http://localhost:5555/v1/repositories/api-staging/: dial tcp 127.0.0.1:5555: getsockopt: connection refused"},"error":"Put http://localhost:5555/v1/repositories/api-staging/: dial tcp 127.0.0.1:5555: getsockopt: connection refused"}
The on-cluster registry and registry-proxy are both running OK (the log outputs starting registry-proxy...). I can access registry-proxy-pod-ip:80, but localhost:5555 cannot be accessed on the node. This is really strange since the proxy daemonset clearly sets HostPort: 5555.
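For anyone wanting to reproduce that check on a node, a plain TCP probe is enough to tell "connection refused" apart from a registry-level error. The snippet below is a minimal sketch; `can_connect` is a hypothetical helper, and the addresses to probe (127.0.0.1:5555 on the node, the registry-proxy pod IP on port 80) are the ones mentioned in the comment above.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection performs the same "dial tcp" step that fails in the errors above.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Running `can_connect("127.0.0.1", 5555)` on the node and `can_connect(pod_ip, 80)` with the proxy pod's IP would show whether only the host port mapping is failing.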
System info:
@blurrcat I see this issue for k8s: https://github.com/kubernetes/kubernetes/issues/34625
Perhaps it's relevant to your issue?
The root issue here has long since been resolved (it was end-to-end related, not a general networking issue). Please open another ticket and we'll look into it. Thanks! :)
Happened on 2 tests:
https://ci.deis.io/job/workflow-test-pr/4353/artifact/4353/logs/deis-controller-2807243241-ssx08.log
https://ci.deis.io/job/workflow-test-pr/4353/
Going to run the suite again just to see how flaky that is.