DavidSie closed this issue 7 years ago.
From the pod list it looks like the registry-proxy component is missing, which is what proxies requests to the registry. Can you confirm with kubectl --namespace=deis get daemonsets?
There are registry proxies. I attached one above, but there are 3 identical ones (I am using 1 master + 2 minions).
kubectl --namespace=deis get daemonsets
NAME DESIRED CURRENT NODE-SELECTOR AGE
deis-logger-fluentd 3 3 <none> 1d
deis-monitor-telegraf 3 3 <none> 1d
deis-registry-proxy 3 3 <none> 1d
Okay, so if you do indeed have registry proxies then you're probably hitting the same issue as https://github.com/deis/registry/issues/62, since your app relies on the ruby image, which is relatively large. I would take a look into that issue and see if you find similar behaviour.
According to Docker Hub (https://hub.docker.com/r/library/ruby/tags/) it's only 313 MB; I would say that's average.
Are you sure that this address makes sense: localhost:5555? The deis-registry service is 10.3.0.188 <none> 80/TCP, and the deis-registry-3758253254-3gtjo pod is listening on port 5000.
Yes, that address is correct. The request goes through the registry-proxy, which (as the name suggests) proxies the request to the real registry. It's a workaround for the --insecure-registry flag. See https://github.com/deis/registry-proxy#about
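As a quick sanity check (a sketch, assuming the default deis namespace and the proxy's default hostPort of 5555), you can confirm that the daemonset actually publishes the host port and that the registry answers through it:
# confirm the daemonset requests hostPort 5555
kubectl --namespace=deis describe daemonset deis-registry-proxy | grep -i port
# from any node (or anything sharing the host network), the registry v2 API should answer through the proxy
curl -sv http://localhost:5555/v2/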
Coming back to the original problem, I'd inspect both your registry and minio to ensure that there are no problems with either backend. From reports, slightly larger-than-normal images built via Dockerfile (>100 MB) seem to be causing these issues.
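For reference, a minimal way to look at both backends (the pod names below are placeholders; substitute the ones from kubectl --namespace=deis get pods):
kubectl --namespace=deis get pods | grep -E 'registry|minio'
kubectl --namespace=deis logs deis-registry-<pod-id>
kubectl --namespace=deis logs deis-minio-<pod-id>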
This is not a container-size issue (alpine is only 2 MB: https://hub.docker.com/r/library/alpine/tags/):
git push deis master
Counting objects: 48, done.
Compressing objects: 100% (47/47), done.
Writing objects: 100% (48/48), 6.35 KiB, done.
Total 48 (delta 14), reused 0 (delta 0)
Starting build... but first, coffee!
...
Step 1 : FROM alpine
---> 4e38e38c8ce0
Step 2 : ENV GOPATH /go
---> Using cache
---> bd4d962b7a6e
Step 3 : ENV GOROOT /usr/local/go
---> Using cache
---> 346b304d9d9d
Step 4 : ENV PATH $PATH:/usr/local/go/bin:/go/bin
---> Using cache
---> bfd14db2b7e7
Step 5 : EXPOSE 80
---> Using cache
---> a019f2dadbcc
Step 6 : ENTRYPOINT while true; do echo hello world; sleep 1; done
---> Using cache
---> d500b7d348cb
Successfully built d500b7d348cb
Pushing to registry
{"errorDetail":{"message":"Put http://localhost:5555/v1/repositories/gaslit-gladness/: dial tcp 127.0.0.1:5555: getsockopt: connection refused"},"error":"Put http://localhost:5555/v1/repositories/gaslit-gladnessremote: tcp 127.0.0.1:5555: getsockopt: connection refused"}
remote: 2016/08/25 07:18:46 Error running git receive hook [Build pod exited with code 1, stopping build.]
To ssh://git@deis-builder.10.63.11.83.nip.io:2222/gaslit-gladness.git
! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'ssh://git@deis-builder.10.63.11.83.nip.io:2222/gaslit-gladness.git'
Which container should listen on port 5555?
(This is a different cluster but from the same script)
Which container should listen on port 5555?
The registry-proxy listens on port 5555.
Can you please provide the following information so we can try to reproduce this?
kubectl version
I recall that there are internal networking issues when using CoreOS with Calico: https://github.com/deis/workflow/issues/442
From inside the container:
root@deis-registry-proxy-jzf3h:/# telnet localhost 5555
Trying ::1...
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
root@deis-registry-proxy-jzf3h:/# netstat -lntpu
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 1/nginx: master pro
kubectl version:
kubectl version
Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.4", GitCommit:"dd6b458ef8dbf24aff55795baa68f83383c9b3a9", GitTreeState:"clean", BuildDate:"2016-08-01T16:45:16Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.4+coreos.0", GitCommit:"be9bf3e842a90537e48361aded2872e389e902e7", GitTreeState:"clean", BuildDate:"2016-08-02T00:54:53Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}
To provision the Kubernetes cluster I used this tutorial: https://coreos.com/kubernetes/docs/latest/getting-started.html
The fun thing about not being able to connect to localhost:5555 within the container is that it's expected. We actually mount the host's docker socket, so any command we perform assumes the host's network. Therefore, localhost:5555 on the host belongs to the registry-proxy.
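So the meaningful test is on the node itself rather than inside the registry-proxy container. Something along these lines (a sketch; the node address is a placeholder):
# on a node, check that something is bound to 5555 on the host and that the registry answers
ssh core@<node-ip>
sudo ss -lntp | grep 5555
curl -sv http://localhost:5555/v2/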
When you provisioned kubernetes, where did you deploy your cluster? AWS, GKE, Vagrant?
I did it on OpenStack.
We had the exact same problem, with:
coreos-kubernetes (from github repo #1876aac with kubernetes 1.3.4)
deis 2.4.0
vagrant 1.8.5
To create our Kubernetes cluster we followed the tutorial here: https://coreos.com/kubernetes/docs/latest/kubernetes-on-vagrant.html
After quite a bit of struggling (turning off Calico, changing hostPort from 5555 to 80, etc.; nothing changed) we resolved it by using the plain version of Kubernetes, from the main Deis tutorial here: https://deis.com/docs/workflow/quickstart/provider/vagrant/boot/
with the notable change of the Vagrant version, downgrading to 1.8.3, since 1.8.5 has this bug: https://github.com/mitchellh/vagrant/issues/5186 (it's marked as closed but there's a regression in 1.8.5).
So, for us, the problem was in the CoreOS package. We haven't tried the very latest commit, though.
EDIT: we also tried the latest commit from the CoreOS repository (commit #bdfe006) with Deis 2.4.1; nothing changed.
@think01 So you think that the kubelet-wrapper provided with CoreOS may be a cause of this problem, right?
@DavidSie well, I cannot say the problem is in that component, but we solved it by avoiding the coreos-kubernetes package and going with plain Kubernetes on Vagrant (which creates some Fedora boxes).
Why do you mention kubelet-wrapper?
Because I saw that CoreOS ships with the script /usr/lib/coreos/kubelet-wrapper, but from what I can see it only starts hyperkube on rkt.
ping @DavidSie, were you able to identify the root cause of your issue here?
I am experiencing what I think is a similar issue. My image is 385.9M (so it's >100M as mentioned by @bacongobbler). Regarding "inspecting" the backend, I cannot figure out how to get helpful logging out of the minio pod. I've tried the --debug switch in various permutations, then found https://github.com/minio/minio/pull/820, which seems to indicate that it's no longer valid because it's not needed. I've tried setting MINIO_TRACE=1 per some code fragments I found. However, kubectl --namespace=deis logs deis-minio-123xyz only ever shows what I assume is the minio startup output; there's no debug log, no trace log, nothing to indicate the behavior of minio during operation.
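For what it's worth, if MINIO_TRACE is honored at all, one way to try it against a running cluster (a sketch; I'm assuming minio runs as a deployment named deis-minio with an app=deis-minio label, which may not match your chart) would be:
# add MINIO_TRACE=1 to the minio container's env, then let the deployment roll a new pod
kubectl --namespace=deis edit deployment deis-minio
#   env:
#   - name: MINIO_TRACE
#     value: "1"
kubectl --namespace=deis get pods -l app=deis-minio
kubectl --namespace=deis logs -f deis-minio-<new-pod-id>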
The first time: deis pull
2016-09-21 08:28:43
rbellamy@eanna i ~/Development/Terradatum/aergo/aergo-server feature/docker % deis pull 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT -a aergo-server
Creating build... Error: Unknown Error (400): {"detail":"dial tcp 10.11.28.91:9000: i/o timeout"}
zsh: exit 1 deis pull 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT -a aergo-server
controller logs
INFO [aergo-server]: build aergo-server-11b3c2a created
INFO [aergo-server]: rbellamy deployed 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Pulling Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Tagging Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT as localhost:5555/aergo-server:v2
INFO Pushing Docker image localhost:5555/aergo-server:v2
INFO Pushing Docker image localhost:5555/aergo-server:v2
INFO Pushing Docker image localhost:5555/aergo-server:v2
INFO [aergo-server]: dial tcp 10.11.28.91:9000: i/o timeout
ERROR:root:dial tcp 10.11.28.91:9000: i/o timeout
Traceback (most recent call last):
File "/app/api/models/release.py", line 88, in new
release.publish()
File "/app/api/models/release.py", line 135, in publish
publish_release(source_image, self.image, deis_registry, self.get_registry_auth())
File "/app/registry/dockerclient.py", line 199, in publish_release
return DockerClient().publish_release(source, target, deis_registry, creds)
File "/app/registry/dockerclient.py", line 117, in publish_release
self.push("{}/{}".format(self.registry, name), tag)
File "/usr/local/lib/python3.5/dist-packages/backoff.py", line 286, in retry
ret = target(*args, **kwargs)
File "/app/registry/dockerclient.py", line 135, in push
log_output(stream, 'push', repo, tag)
File "/app/registry/dockerclient.py", line 178, in log_output
stream_error(chunk, operation, repo, tag)
File "/app/registry/dockerclient.py", line 195, in stream_error
raise RegistryException(message)
registry.dockerclient.RegistryException: dial tcp 10.11.28.91:9000: i/o timeout
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/api/models/build.py", line 62, in create
source_version=self.version
File "/app/api/models/release.py", line 95, in new
raise DeisException(str(e)) from e
api.exceptions.DeisException: dial tcp 10.11.28.91:9000: i/o timeout
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 471, in dispatch
response = handler(request, *args, **kwargs)
File "/app/api/views.py", line 181, in create
return super(AppResourceViewSet, self).create(request, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
self.perform_create(serializer)
File "/app/api/viewsets.py", line 21, in perform_create
self.post_save(obj)
File "/app/api/views.py", line 258, in post_save
self.release = build.create(self.request.user)
File "/app/api/models/build.py", line 71, in create
raise DeisException(str(e)) from e
api.exceptions.DeisException: dial tcp 10.11.28.91:9000: i/o timeout
10.10.2.8 "POST /v2/apps/aergo-server/builds/ HTTP/1.1" 400 51 "Deis Client v2.5.1"
Then immediately, I try again: deis pull
2016-09-21 08:42:27
rbellamy@eanna i ~/Development/Terradatum/aergo/aergo-server feature/docker % deis pull 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT -a aergo-server
Creating build... Error: Unknown Error (502): <html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.11.2</center>
</body>
</html>
zsh: exit 1 deis pull 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT -a aergo-server
controller logs
INFO [aergo-server]: build aergo-server-c09bb9b created
INFO [aergo-server]: rbellamy deployed 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Pulling Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Tagging Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT as localhost:5555/aergo-server:v4
INFO Pushing Docker image localhost:5555/aergo-server:v4
INFO Pushing Docker image localhost:5555/aergo-server:v4
10.10.2.8 "GET /v2/apps/aergo-server/logs HTTP/1.1" 200 1284 "Deis Client v2.5.1"
INFO Pushing Docker image localhost:5555/aergo-server:v4
[2016-09-21 16:05:50 +0000] [24] [CRITICAL] WORKER TIMEOUT (pid:37)
[2016-09-21 16:05:50 +0000] [37] [WARNING] worker aborted
File "/usr/local/bin/gunicorn", line 11, in <module>
sys.exit(run())
File "/usr/local/lib/python3.5/dist-packages/gunicorn/app/wsgiapp.py", line 74, in run
WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
File "/usr/local/lib/python3.5/dist-packages/gunicorn/app/base.py", line 192, in run
super(Application, self).run()
File "/usr/local/lib/python3.5/dist-packages/gunicorn/app/base.py", line 72, in run
Arbiter(self).run()
File "/usr/local/lib/python3.5/dist-packages/gunicorn/arbiter.py", line 189, in run
self.manage_workers()
File "/usr/local/lib/python3.5/dist-packages/gunicorn/arbiter.py", line 524, in manage_workers
self.spawn_workers()
File "/usr/local/lib/python3.5/dist-packages/gunicorn/arbiter.py", line 590, in spawn_workers
self.spawn_worker()
File "/usr/local/lib/python3.5/dist-packages/gunicorn/arbiter.py", line 557, in spawn_worker
worker.init_process()
File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/base.py", line 132, in init_process
self.run()
File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/sync.py", line 124, in run
self.run_for_one(timeout)
File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/sync.py", line 68, in run_for_one
self.accept(listener)
File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/sync.py", line 30, in accept
self.handle(listener, client, addr)
File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/sync.py", line 135, in handle
self.handle_request(listener, req, client, addr)
File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/sync.py", line 176, in handle_request
respiter = self.wsgi(environ, resp.start_response)
File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/wsgi.py", line 170, in __call__
response = self.get_response(request)
File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/base.py", line 124, in get_response
response = self._middleware_chain(request)
File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
response = get_response(request)
File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
response = self.get_response(request)
File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
response = get_response(request)
File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
response = self.get_response(request)
File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
response = get_response(request)
File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
response = self.get_response(request)
File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
response = get_response(request)
File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
response = self.get_response(request)
File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
response = get_response(request)
File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
response = self.get_response(request)
File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
response = get_response(request)
File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
response = self.get_response(request)
File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
response = get_response(request)
File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
response = self.get_response(request)
File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
response = get_response(request)
File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
response = self.get_response(request)
File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
response = get_response(request)
File "/app/api/middleware.py", line 22, in __call__
response = self.get_response(request)
File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
response = get_response(request)
File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/base.py", line 185, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/usr/local/lib/python3.5/dist-packages/django/views/decorators/csrf.py", line 58, in wrapped_view
return view_func(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/rest_framework/viewsets.py", line 87, in view
return self.dispatch(request, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 471, in dispatch
response = handler(request, *args, **kwargs)
File "/app/api/views.py", line 181, in create
return super(AppResourceViewSet, self).create(request, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
self.perform_create(serializer)
File "/app/api/viewsets.py", line 21, in perform_create
self.post_save(obj)
File "/app/api/views.py", line 258, in post_save
self.release = build.create(self.request.user)
File "/app/api/models/build.py", line 62, in create
source_version=self.version
File "/app/api/models/release.py", line 88, in new
release.publish()
File "/app/api/models/release.py", line 135, in publish
publish_release(source_image, self.image, deis_registry, self.get_registry_auth())
File "/app/registry/dockerclient.py", line 199, in publish_release
return DockerClient().publish_release(source, target, deis_registry, creds)
File "/app/registry/dockerclient.py", line 117, in publish_release
self.push("{}/{}".format(self.registry, name), tag)
File "/usr/local/lib/python3.5/dist-packages/backoff.py", line 286, in retry
ret = target(*args, **kwargs)
File "/app/registry/dockerclient.py", line 135, in push
log_output(stream, 'push', repo, tag)
File "/app/registry/dockerclient.py", line 175, in log_output
for chunk in stream:
File "/usr/local/lib/python3.5/dist-packages/docker/client.py", line 245, in _stream_helper
data = reader.read(1)
File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/response.py", line 314, in read
data = self._fp.read(amt)
File "/usr/lib/python3.5/http/client.py", line 448, in read
n = self.readinto(b)
File "/usr/lib/python3.5/http/client.py", line 478, in readinto
return self._readinto_chunked(b)
File "/usr/lib/python3.5/http/client.py", line 573, in _readinto_chunked
chunk_left = self._get_chunk_left()
File "/usr/lib/python3.5/http/client.py", line 541, in _get_chunk_left
chunk_left = self._read_next_chunk_size()
File "/usr/lib/python3.5/http/client.py", line 501, in _read_next_chunk_size
line = self.fp.readline(_MAXLINE + 1)
File "/usr/lib/python3.5/socket.py", line 575, in readinto
return self._sock.recv_into(b)
File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/base.py", line 191, in handle_abort
self.cfg.worker_abort(self)
File "/app/deis/gunicorn/config.py", line 36, in worker_abort
traceback.print_stack()
@rbellamy can you post the registry logs in a gist? That will likely give us more information about why the registry is failing to communicate with minio.
@bacongobbler will do.
Also, may be related to https://github.com/minio/minio/issues/2743.
Here's my setup, using Alpha channel of CoreOS and libvirt:
export KUBERNETES_PROVIDER=libvirt-coreos && export NUM_NODES=4
./cluster/kube-up.sh
# wait for etcd to settle
helmc install workflow-v2.5.0
# wait for kubernetes cluster to all be ready
deis pull 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT -a aergo-server
Worked with @harshavardhana from the minio crew to try to troubleshoot this.
For whatever reason, during our teleconsole session, I was able to successfully push the image into the deis-registry-proxy, but then saw the same dial i/o timeout in a different context. This time, it was while pulling the image from the proxy, during the app:deploy phase.
NOTE: you can ignore the 404 below; v4 of the aergo-server doesn't exist since I've restarted the minio pod several times during troubleshooting. The v5 release is definitely stored in minio, as can be seen in the mc ls output at the bottom of this post.
INFO [aergo-server]: build aergo-server-49c7405 created
INFO [aergo-server]: rbellamy deployed 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Pulling Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Tagging Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT as localhost:5555/aergo-server:v5
INFO Pushing Docker image localhost:5555/aergo-server:v5
INFO Pulling Docker image localhost:5555/aergo-server:v5
INFO [aergo-server]: adding 5s on to the original 120s timeout to account for the initial delay specified in the liveness / readiness probe
INFO [aergo-server]: This deployments overall timeout is 125s - batch timout is 125s and there are 1 batches to deploy with a total of 1 pods
INFO [aergo-server]: waited 10s and 1 pods are in service
INFO [aergo-server]: waited 20s and 1 pods are in service
INFO [aergo-server]: waited 30s and 1 pods are in service
INFO [aergo-server]: waited 40s and 1 pods are in service
ERROR [aergo-server]: There was a problem deploying v5. Rolling back process types to release v4.
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
ERROR [aergo-server]: (app::deploy): image aergo-server:v4 not found
ERROR:root:(app::deploy): image aergo-server:v4 not found
Traceback (most recent call last):
File "/app/scheduler/__init__.py", line 168, in deploy
deployment = self.deployment.get(namespace, name).json()
File "/app/scheduler/resources/deployment.py", line 29, in get
raise KubeHTTPException(response, message, *args)
scheduler.exceptions.KubeHTTPException: ('failed to get Deployment "aergo-server-cmd" in Namespace "aergo-server": 404 Not Found', 'aergo-server-cmd', 'aergo-server')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/api/models/app.py", line 578, in deploy
async_run(tasks)
File "/app/api/utils.py", line 169, in async_run
raise error
File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step
result = coro.throw(exc)
File "/app/api/utils.py", line 182, in async_task
yield from loop.run_in_executor(None, params)
File "/usr/lib/python3.5/asyncio/futures.py", line 361, in __iter__
yield self # This tells Task to wait for completion.
File "/usr/lib/python3.5/asyncio/tasks.py", line 296, in _wakeup
future.result()
File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
raise self._exception
File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
result = self.fn(*self.args, **self.kwargs)
File "/app/scheduler/__init__.py", line 175, in deploy
namespace, name, image, entrypoint, command, **kwargs
File "/app/scheduler/resources/deployment.py", line 123, in create
self.wait_until_ready(namespace, name, **kwargs)
File "/app/scheduler/resources/deployment.py", line 338, in wait_until_ready
additional_timeout = self.pod._handle_pending_pods(namespace, labels)
File "/app/scheduler/resources/pod.py", line 552, in _handle_pending_pods
self._handle_pod_errors(pod, reason, message)
File "/app/scheduler/resources/pod.py", line 491, in _handle_pod_errors
raise KubeException(message)
scheduler.exceptions.KubeException: error pulling image configuration: Get http://10.11.28.91:9000/registry/docker/registry/v2/blobs/sha256/59/5905a7c362fbff9626d517a6ba0d8930fba34a321ba4c7bb718144d80cfaf29b/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=8TZRY2JRWMPT6UMXR6I5%2F20160921%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20160921T194800Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=314c92bb84dbd4dd41f9bc572e625201a32ce300394d34e8516a57382fd2ec52: dial tcp 10.11.28.91:9000: i/o timeout
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/api/models/release.py", line 168, in get_port
port = docker_get_port(self.image, deis_registry, creds)
File "/app/registry/dockerclient.py", line 203, in get_port
return DockerClient().get_port(target, deis_registry, creds)
File "/app/registry/dockerclient.py", line 79, in get_port
info = self.inspect_image(target)
File "/usr/local/lib/python3.5/dist-packages/backoff.py", line 286, in retry
ret = target(*args, **kwargs)
File "/app/registry/dockerclient.py", line 156, in inspect_image
self.pull(repo, tag=tag)
File "/usr/local/lib/python3.5/dist-packages/backoff.py", line 286, in retry
ret = target(*args, **kwargs)
File "/app/registry/dockerclient.py", line 128, in pull
log_output(stream, 'pull', repo, tag)
File "/app/registry/dockerclient.py", line 178, in log_output
stream_error(chunk, operation, repo, tag)
File "/app/registry/dockerclient.py", line 195, in stream_error
raise RegistryException(message)
registry.dockerclient.RegistryException: image aergo-server:v4 not found
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/api/models/app.py", line 585, in deploy
self.deploy(release.previous(), force_deploy=True, rollback_on_failure=False)
File "/app/api/models/app.py", line 526, in deploy
port = release.get_port()
File "/app/api/models/release.py", line 176, in get_port
raise DeisException(str(e)) from e
api.exceptions.DeisException: image aergo-server:v4 not found
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/api/models/build.py", line 64, in create
self.app.deploy(new_release)
File "/app/api/models/app.py", line 595, in deploy
raise ServiceUnavailable(err) from e
api.exceptions.ServiceUnavailable: (app::deploy): image aergo-server:v4 not found
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 471, in dispatch
response = handler(request, *args, **kwargs)
File "/app/api/views.py", line 181, in create
return super(AppResourceViewSet, self).create(request, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
self.perform_create(serializer)
File "/app/api/viewsets.py", line 21, in perform_create
self.post_save(obj)
File "/app/api/views.py", line 258, in post_save
self.release = build.create(self.request.user)
File "/app/api/models/build.py", line 71, in create
raise DeisException(str(e)) from e
api.exceptions.DeisException: (app::deploy): image aergo-server:v4 not found
10.10.2.8 "POST /v2/apps/aergo-server/builds/ HTTP/1.1" 400 59 "Deis Client v2.5.1"
And as you can see, the minio store definitely contains the image, and the proxy can communicate with the minio backend:
root@deis-registry-proxy-ccf4u:~# mc ls myminio/registry -r
[2016-09-21 19:47:36 UTC] 1.5KiB docker/registry/v2/blobs/sha256/2f/2fc6d0a3ec447743456f6fe782622ede8095b662bb39cb10c50b2a795829e51f/data
[2016-09-21 19:46:45 UTC] 112B docker/registry/v2/blobs/sha256/53/5345ff73e9fcf7b6c7d2d7eca2b0338ab274560ff988b8f63e60f73dfe0297ec/data
[2016-09-21 19:47:36 UTC] 5.0KiB docker/registry/v2/blobs/sha256/59/5905a7c362fbff9626d517a6ba0d8930fba34a321ba4c7bb718144d80cfaf29b/data
[2016-09-21 19:46:45 UTC] 232B docker/registry/v2/blobs/sha256/a6/a696cba1f6e865421664a7bf9bf585bcfaa924d56b7d2a112a799e00a7433791/data
[2016-09-21 19:47:14 UTC] 94MiB docker/registry/v2/blobs/sha256/b4/b419440b08d223eabe64f26d5f8556ee8d3f4c0bcafb8dd64ec525cc4eea7f6e/data
[2016-09-21 19:47:19 UTC] 94MiB docker/registry/v2/blobs/sha256/c0/c0963e676944ab20c36e857c33d76a6ba2166aaa6a0d3961d6cf20fae965efd0/data
[2016-09-21 19:47:14 UTC] 47MiB docker/registry/v2/blobs/sha256/d0/d0f0d61cd0d229546b1e33b0c92036ad3f35b42dd2c9a945aeaf67f84684ce26/data
[2016-09-21 19:46:59 UTC] 2.2MiB docker/registry/v2/blobs/sha256/e1/e110a4a1794126ef308a49f2d65785af2f25538f06700721aad8283b81fdfa58/data
[2016-09-21 19:46:45 UTC] 71B docker/registry/v2/repositories/aergo-server/_layers/sha256/5345ff73e9fcf7b6c7d2d7eca2b0338ab274560ff988b8f63e60f73dfe0297ec/link
[2016-09-21 19:47:36 UTC] 71B docker/registry/v2/repositories/aergo-server/_layers/sha256/5905a7c362fbff9626d517a6ba0d8930fba34a321ba4c7bb718144d80cfaf29b/link
[2016-09-21 19:46:45 UTC] 71B docker/registry/v2/repositories/aergo-server/_layers/sha256/a696cba1f6e865421664a7bf9bf585bcfaa924d56b7d2a112a799e00a7433791/link
[2016-09-21 19:47:18 UTC] 71B docker/registry/v2/repositories/aergo-server/_layers/sha256/b419440b08d223eabe64f26d5f8556ee8d3f4c0bcafb8dd64ec525cc4eea7f6e/link
[2016-09-21 19:47:19 UTC] 71B docker/registry/v2/repositories/aergo-server/_layers/sha256/c0963e676944ab20c36e857c33d76a6ba2166aaa6a0d3961d6cf20fae965efd0/link
[2016-09-21 19:47:18 UTC] 71B docker/registry/v2/repositories/aergo-server/_layers/sha256/d0f0d61cd0d229546b1e33b0c92036ad3f35b42dd2c9a945aeaf67f84684ce26/link
[2016-09-21 19:46:59 UTC] 71B docker/registry/v2/repositories/aergo-server/_layers/sha256/e110a4a1794126ef308a49f2d65785af2f25538f06700721aad8283b81fdfa58/link
[2016-09-21 19:47:36 UTC] 71B docker/registry/v2/repositories/aergo-server/_manifests/revisions/sha256/2fc6d0a3ec447743456f6fe782622ede8095b662bb39cb10c50b2a795829e51f/link
[2016-09-21 19:47:36 UTC] 71B docker/registry/v2/repositories/aergo-server/_manifests/tags/v5/current/link
[2016-09-21 19:47:36 UTC] 71B docker/registry/v2/repositories/aergo-server/_manifests/tags/v5/index/sha256/2fc6d0a3ec447743456f6fe782622ede8095b662bb39cb10c50b2a795829e51f/link
@bacongobbler - if you have a setup locally we can work on this together and see what is causing the problem. I do not have a Kubernetes setup locally. The i/o timeout seems to be related to a network problem between the registry and the minio server. We need to see whether the server itself is not responding properly; I couldn't see it with mc, though.
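One way to see whether the registry can reach minio at all (a sketch; the pod name is a placeholder, the 10.11.28.91:9000 address is taken from the logs above, and I'm assuming the registry image ships busybox wget):
# from the registry pod, poke the exact minio address the push is timing out on
kubectl --namespace=deis exec -it deis-registry-<pod-id> -- wget -S -O- http://10.11.28.91:9000/ 2>&1 | head -n 20
# an immediate HTTP response (even a 403 from minio) means the network path is fine;
# a hang ending in a timeout points at pod-to-pod networking, matching the i/o timeout above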
So, from my registry log gist: https://gist.github.com/rbellamy/c0db447ed47c364ae396b5d0c9852a02#file-deis-issue-64-registry-proxy-logs-L1242
@harshavardhana unfortunately we do not have any clusters reproducing this issue locally nor can we reproduce it ourselves, other than for the calico networking issue.
@rbellamy if you can supply information about how you set up your cluster, including your KUBERNETES_PROVIDER envvar when using kube-up.sh and what version of Workflow you're running, we can try to reproduce there. As far as e2e is concerned we aren't seeing this issue in master or in recent releases. http://ci.deis.io
@bacongobbler I included that information in a comment in this issue: https://github.com/deis/registry/issues/64#issuecomment-248700404
Thank you! From what others have voiced earlier it sounds like this is related to a CoreOS issue, as seen earlier in https://github.com/deis/registry/issues/64#issuecomment-243107833. I'd recommend trying a different provider first and seeing if that resolves your issue.
I'm not sure how diagnostic this is, given I'm testing within a single libvirt host. However, it should be noted that the host is running 2 x 12 AMD Opteron CPUs on a Supermicro MB with 128G RAM and all SSDs, and each VM is provisioned with 4G and 2 CPUs, so I find it hard to believe that the issue at hand is related to an overloaded VM host or guest.
From what @bacongobbler has said, deis hasn't seen this in their e2e test runner on k8s. I'd be interested to know what the test matrix looks like WRT other providers/hosts.
Maybe this is a CoreOS-related problem? Given https://github.com/coreos/bugs/issues/1554 it doesn't seem outside the realm of possibility.
Kubernetes on CoreOS (using libvirt-coreos provider and ./kube-up.sh script)
master with 3 nodes
INFO [aergo-server]: build aergo-server-6972f5f created
INFO [aergo-server]: rbellamy deployed 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Pulling Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Tagging Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT as localhost:5555/aergo-server:v2
INFO Pushing Docker image localhost:5555/aergo-server:v2
INFO Pushing Docker image localhost:5555/aergo-server:v2
INFO Pushing Docker image localhost:5555/aergo-server:v2
INFO [aergo-server]: Put http://localhost:5555/v1/repositories/aergo-server/: read tcp 127.0.0.1:49384->127.0.0.1:5555: read: connection reset by peer
ERROR:root:Put http://localhost:5555/v1/repositories/aergo-server/: read tcp 127.0.0.1:49384->127.0.0.1:5555: read: connection reset by peer
Traceback (most recent call last):
File "/app/api/models/release.py", line 88, in new
release.publish()
File "/app/api/models/release.py", line 135, in publish
publish_release(source_image, self.image, deis_registry, self.get_registry_auth())
File "/app/registry/dockerclient.py", line 199, in publish_release
return DockerClient().publish_release(source, target, deis_registry, creds)
File "/app/registry/dockerclient.py", line 117, in publish_release
self.push("{}/{}".format(self.registry, name), tag)
File "/usr/local/lib/python3.5/dist-packages/backoff.py", line 286, in retry
ret = target(*args, **kwargs)
File "/app/registry/dockerclient.py", line 135, in push
log_output(stream, 'push', repo, tag)
File "/app/registry/dockerclient.py", line 178, in log_output
stream_error(chunk, operation, repo, tag)
File "/app/registry/dockerclient.py", line 195, in stream_error
raise RegistryException(message)
registry.dockerclient.RegistryException: Put http://localhost:5555/v1/repositories/aergo-server/: read tcp 127.0.0.1:49384->127.0.0.1:5555: read: connection reset by peer
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/api/models/build.py", line 62, in create
source_version=self.version
File "/app/api/models/release.py", line 95, in new
raise DeisException(str(e)) from e
api.exceptions.DeisException: Put http://localhost:5555/v1/repositories/aergo-server/: read tcp 127.0.0.1:49384->127.0.0.1:5555: read: connection reset by peer
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 471, in dispatch
response = handler(request, *args, **kwargs)
File "/app/api/views.py", line 181, in create
return super(AppResourceViewSet, self).create(request, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
self.perform_create(serializer)
File "/app/api/viewsets.py", line 21, in perform_create
self.post_save(obj)
File "/app/api/views.py", line 258, in post_save
self.release = build.create(self.request.user)
File "/app/api/models/build.py", line 71, in create
raise DeisException(str(e)) from e
api.exceptions.DeisException: Put http://localhost:5555/v1/repositories/aergo-server/: read tcp 127.0.0.1:49384->127.0.0.1:5555: read: connection reset by peer
10.10.1.5 "POST /v2/apps/aergo-server/builds/ HTTP/1.1" 400 142 "Deis Client v2.5.1"
Maybe this is a CoreOS-related problem? Given coreos/bugs#1554 it doesn't seem outside the realm of possibility.
Yes, I do believe this is a CoreOS-related problem, as I mentioned in my previous comment. If you can try provisioning a cluster with a different provider, that can help narrow down the issue.
@bacongobbler I've used corectl and Kube-Solo with success.
@DavidSie after reading the logs just a little more closely, I realized that your docker daemon appears to be trying to push to a v1 registry endpoint.
Put http://localhost:5555/v1/repositories/spree/: dial tcp 127.0.0.1:5555: getsockopt: connection refused"
Notice the v1 in there. Since this is directly related to dockerbuilder (buildpack deploys work fine for you), I wonder if it's due to the docker python library auto-detecting the client version: https://github.com/deis/dockerbuilder/blob/28c31d45a17a97473e83c451b0d2e743678620c0/rootfs/deploy.py#L106
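As an aside, a quick way to see what the proxy is actually serving (a sketch; run it on the node where the build pod landed): docker pings the v2 endpoint first and only falls back to the v1 API when /v2/ is unreachable, so a clean answer here would point away from a protocol problem and back at plain connectivity.
# the v2 ping; a healthy registry behind the proxy answers 200 (or 401) here
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:5555/v2/
# the legacy v1 ping the error message refers to; a 404 here is expected and harmless
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:5555/v1/_ping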
@rbellamy can you please open a separate issue? Yours doesn't look to be the same, as the original error from your report is about minio:
error pulling image configuration: Get http://10.11.28.91:9000/registry/docker/registry/v2/blobs/sha256/59/5905a7c362fbff9626d517a6ba0d8930fba34a321ba4c7bb718144d80cfaf29b/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=8TZRY2JRWMPT6UMXR6I5%2F20160921%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20160921T194800Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=314c92bb84dbd4dd41f9bc572e625201a32ce300394d34e8516a57382fd2ec52: dial tcp 10.11.28.91:9000: i/o timeout
error pulling image configuration: Get http://10.11.28.91:9000/registry/docker/registry/v2/blobs/sha256/59/5905a7c362fbff9626d517a6ba0d8930fba34a321ba4c7bb718144d80cfaf29b/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=8TZRY2JRWMPT6UMXR6I5%2F20160921%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20160921T194800Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=314c92bb84dbd4dd41f9bc572e625201a32ce300394d34e8516a57382fd2ec52: dial tcp 10.11.28.91:9000: i/o timeout
Is this still the network issue we were talking about previously? @bacongobbler - let me know how I can help here.
Yes. @rbellamy believes he has nailed it down as a symptom of coreos/bugs#1554. Thank you for the offer, though!
@bacongobbler Do you know how I can fix this issue? Simply update Deis (I'm currently using 2.3.0)?
I'm not sure how this could be fixed; however, using 2.5.0 would never hurt.
I ran into this exact problem when setting up using the CoreOS tool as well. It's too bad that the CoreOS aws-cli has this problem, because the CoreOS tool works really well with CloudFormation, which makes teardown a snap after trying out Deis. kube-up does not use CloudFormation and leaves crap all over your AWS account after you're done with it.
@dblackdblack even after using ./cluster/kube-down.sh? I've always found that script tears down all the AWS resources it created.
So after debugging with both @jdumars and @felixbuenemann, both clusters seem to be showing the same symptom. The problem? Requesting a hostPort on some providers - like Rancher and CoreOS - does not work. @kmala pointed me towards https://github.com/kubernetes/kubernetes/issues/23920 so it looks like we found our smoking gun.
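If anyone wants to check whether their provider is affected, a quick test (a sketch; the image, port 8888, and node IP are arbitrary choices) is to run a throwaway pod that requests a hostPort and see whether that port is actually reachable on the node:
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: hostport-test
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
      hostPort: 8888
EOF
# then, against the node the pod was scheduled on:
curl -v http://<node-ip>:8888/
kubectl delete pod hostport-test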
And for anyone who wants to take a crack at trying a patch, they can run through the following instructions to patch workflow-v2.7.0, removing registry-proxy and making the controller and builder connect directly to the registry. This requires the old --insecure-registry flag to be enabled so the docker daemon can talk to the registry. Here are the commands and the patch to run on a fresh cluster that shows this symptom:
git clone https://github.com/deis/charts
cd charts
curl https://gist.githubusercontent.com/bacongobbler/0b5f2c4fe6f067ddb775d53d635cc74d/raw/992a95edb8430ebcddba526fb1c48d9d0fcc1166/remove-registry-proxy.patch | git apply -
kubectl delete namespace deis
# also delete any app namespaces so you have a fresh cluster
rm -rf ~/.helmc/workspace/charts/workflow-v2.7.0
cp -R workflow-v2.7.0 ~/.helmc/workspace/charts/
helmc generate workflow-v2.7.0
helmc install workflow-v2.7.0
Note that this will purge your cluster entirely of Workflow.
There is currently no workaround for this as far as I'm aware, but if users want to bring this issue to light they can try to contribute patches upstream to kubernetes! :)
In case anyone wants to patch workflow-dev, you can use this gist with @bacongobbler's instructions above.
@zinuzoid the instructions above use that exact patch :)
EDIT: I missed the one-line change you made in your patch and the fact that it's for workflow-dev. Nice catch!
@bacongobbler plus one line in workflow-dev/tpl/storage.sh for me to make it work :)
I'm going to close this issue as there is nothing we can do here to work around this issue in Workflow other than with the patch I provided. This is an upstream issue and patches should be applied upstream. Until then please feel free to run with the patch provided here for production deployments that rely on CNI networking. Thanks!
When applying the patch I got a "corrupt patch at line 6" message:
mbr-31107:charts jwalters$ curl https://gist.githubusercontent.com/bacongobbler/0b5f2c4fe6f067ddb775d53d635cc74d/raw/32a86cc4ddfa0a7cb173b1184ac3e288dedb5a84/remove-registry-proxy.patch | git apply -
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3080  100  3080    0     0   3557      0 --:--:-- --:--:-- --:--:--  3556
fatal: corrupt patch at line 6
@jwalters-gpsw try again. I just fixed the patch.
curl https://gist.githubusercontent.com/bacongobbler/0b5f2c4fe6f067ddb775d53d635cc74d/raw/992a95edb8430ebcddba526fb1c48d9d0fcc1166/remove-registry-proxy.patch | git apply -
v2.8.0 patch:
curl https://gist.githubusercontent.com/bacongobbler/0b5f2c4fe6f067ddb775d53d635cc74d/raw/248a052dd0575419d5890abaedec3a7940f3ada6/remove-registry-proxy-v2.8.0.patch | git apply -
Thanks for the updated patch. I'm running CoreOS on AWS. Is there a way for me to restart the docker daemons with the --insecure-registry option, or would I need to redeploy the cluster?
It's easier to re-deploy the cluster if you're just getting set up. Otherwise you'll have to manually SSH into each node, modify the daemon startup flags and reboot docker on every node.
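For anyone doing that by hand, the change is roughly the following on each CoreOS worker (a sketch; the drop-in path, the DOCKER_OPTS variable, and the registry address/CIDR are assumptions that depend on how your docker.service and service network are set up):
# on each worker node
sudo mkdir -p /etc/systemd/system/docker.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/docker.service.d/50-insecure-registry.conf
[Service]
Environment='DOCKER_OPTS=--insecure-registry=10.3.0.0/24'
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker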
Thanks, I will give that a try. I'm also thinking about doing a Deis upgrade to the same version per the upgrade instructions, but setting the registry to an off-cluster registry.
I manually updated the worker nodes' docker config, applied your changes, and it's working fine now.
Sorry for reviving this old thread, but could you please explain how to apply this patch to 2.9, which is deployed via Helm and not Helm Classic?
When I build an app with a buildpack it works, but when I want to build a container I cannot upload it to the registry.
I know that there are environment variables that point to this address:
but I don't understand why, since none of the pods and none of the services are listening on 5555
services
pods