deis / registry

Docker registry for Deis Workflow.
https://deis.com
MIT License

cannot upload docker container to registry #64

Closed DavidSie closed 7 years ago

DavidSie commented 8 years ago

When I build an app with a buildpack it works, but when I build a container from a Dockerfile I cannot upload it to the registry.

 kubectl version
Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.4", GitCommit:"dd6b458ef8dbf24aff55795baa68f83383c9b3a9", GitTreeState:"clean", BuildDate:"2016-08-01T16:45:16Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.4+coreos.0", GitCommit:"be9bf3e842a90537e48361aded2872e389e902e7", GitTreeState:"clean", BuildDate:"2016-08-02T00:54:53Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}
deis version
v2.4.0
     git push deis master
Counting objects: 589, done.
Compressing objects: 100% (416/416), done.
Writing objects: 100% (589/589), 2.47 MiB, done.
Total 589 (delta 46), reused 581 (delta 42)
Starting build... but first, coffee!
Step 1 : FROM ruby:2.0.0-p576
---> a137b6df82e8
Step 2 : COPY . /app
---> Using cache
---> a7107ea0f79a
Step 3 : WORKDIR /app
---> Using cache
---> ba2d0c3222ec
Step 4 : EXPOSE 3000
---> Using cache
---> 18f7fb188ed3
Step 5 : CMD while true; do echo hello world; sleep 1; done
---> Using cache
---> 4e22b0487484
Successfully built 4e22b0487484
Pushing to registry
{"errorDetail":{"message":"Put http://localhost:5555/v1/repositories/spree/: dial tcp 127.0.0.1:5555: getsockopt: connection refused"},"error":"Put http://localhost:5555/v1/repositories/spree/: dial tcp 127.0.0.remote: getsockopt: connection refused"}

I know there are environment variables that point to this address:

     Environment Variables:
      DEIS_REGISTRY_SERVICE_HOST:   localhost
      DEIS_REGISTRY_SERVICE_PORT:   5555

but I don't understand why, since none of the pods or services is listening on port 5555.

services

kubectl get services --namespace=deis
NAME                     CLUSTER-IP   EXTERNAL-IP   PORT(S)                            AGE
deis-builder             10.3.0.233   <none>        2222/TCP                           1d
deis-controller          10.3.0.23    <none>        80/TCP                             1d
deis-database            10.3.0.253   <none>        5432/TCP                           1d
deis-logger              10.3.0.221   <none>        80/TCP                             1d
deis-logger-redis        10.3.0.148   <none>        6379/TCP                           1d
deis-minio               10.3.0.232   <none>        9000/TCP                           1d
deis-monitor-grafana     10.3.0.113   <none>        80/TCP                             1d
deis-monitor-influxapi   10.3.0.234   <none>        80/TCP                             1d
deis-monitor-influxui    10.3.0.141   <none>        80/TCP                             1d
deis-nsqd                10.3.0.82    <none>        4151/TCP,4150/TCP                  1d
deis-registry            10.3.0.188   <none>        80/TCP                             1d
deis-router              10.3.0.133   <pending>     80/TCP,443/TCP,2222/TCP,9090/TCP   1d
deis-workflow-manager    10.3.0.34    <none>        80/TCP                             1d

pods


kubectl describe  pods deis-registry-3758253254-3gtjo   --namespace=deis 
Name:       deis-registry-3758253254-3gtjo
Namespace:  deis
Node:       10.63.11.75/10.63.11.75
Start Time: Mon, 22 Aug 2016 10:36:12 +0000
Labels:     app=deis-registry
        pod-template-hash=3758253254
Status:     Running
IP:     10.2.12.12
Controllers:    ReplicaSet/deis-registry-3758253254
Containers:
  deis-registry:
    Container ID:   docker://78d6d569eefac3766e4b921f21b7847d36866a266ae76424d7d6e572bb2f5979
    Image:      quay.io/deis/registry:v2.2.0
    Image ID:       docker://sha256:0eb83b180d1aa993fcdd715e4b919b4867051d4f35a813a56eec04ae0705d3d1
    Port:       5000/TCP
    State:      Running
      Started:      Mon, 22 Aug 2016 10:43:05 +0000
    Ready:      True
    Restart Count:  0
    Liveness:       http-get http://:5000/v2/ delay=1s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:5000/v2/ delay=1s timeout=1s period=10s #success=1 #failure=3
    Environment Variables:
      REGISTRY_STORAGE_DELETE_ENABLED:  true
      REGISTRY_LOG_LEVEL:       info
      REGISTRY_STORAGE:         minio
Conditions:
  Type      Status
  Initialized   True 
  Ready     True 
  PodScheduled  True 
Volumes:
  registry-storage:
    Type:   EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium: 
  registry-creds:
    Type:   Secret (a volume populated by a Secret)
    SecretName: objectstorage-keyfile
  deis-registry-token-inyyj:
    Type:   Secret (a volume populated by a Secret)
    SecretName: deis-registry-token-inyyj
QoS Tier:   BestEffort
No events.

kubectl describe  pods deis-registry-proxy-cpu68    --namespace=deis 
Name:       deis-registry-proxy-cpu68
Namespace:  deis
Node:       10.63.11.76/10.63.11.76
Start Time: Mon, 22 Aug 2016 10:36:31 +0000
Labels:     app=deis-registry-proxy
        heritage=deis
Status:     Running
IP:     10.2.63.4
Controllers:    DaemonSet/deis-registry-proxy
Containers:
  deis-registry-proxy:
    Container ID:   docker://dc29ab400a06ae5dc1407c7f1fb0880d4257720170eded6a7f8cde5431fa9570
    Image:      quay.io/deis/registry-proxy:v1.0.0
    Image ID:       docker://sha256:fde297ec95aa244e5be48f438de39a13dae16a1593b3792d8c10cd1d7011f8d1
    Port:       80/TCP
    Limits:
      cpu:  100m
      memory:   50Mi
    Requests:
      cpu:      100m
      memory:       50Mi
    State:      Running
      Started:      Mon, 22 Aug 2016 10:38:32 +0000
    Ready:      True
    Restart Count:  0
    Environment Variables:
      REGISTRY_HOST:    $(DEIS_REGISTRY_SERVICE_HOST)
      REGISTRY_PORT:    $(DEIS_REGISTRY_SERVICE_PORT)
Conditions:
  Type      Status
  Initialized   True 
  Ready     True 
  PodScheduled  True 
Volumes:
  default-token-tk993:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-tk993
QoS Tier:   Guaranteed
No events.
bacongobbler commented 8 years ago

From the pod list it looks like the registry-proxy component is missing, which is what proxies requests to the registry. Can you confirm with kubectl --namespace=deis get daemonsets?

DavidSie commented 8 years ago

There are registry proxies. I attached one above, but there are three of the same (I'm using 1 master + 2 minions).

kubectl --namespace=deis get daemonsets
NAME                    DESIRED   CURRENT   NODE-SELECTOR   AGE
deis-logger-fluentd     3         3         <none>          1d
deis-monitor-telegraf   3         3         <none>          1d
deis-registry-proxy     3         3         <none>          1d
bacongobbler commented 8 years ago

Okay, so if you do indeed have registry proxies then you're probably hitting the same issue as https://github.com/deis/registry/issues/62, since your app relies on the ruby image which is relatively large. I would take a look into that issue and see if you find similar behaviour.

DavidSie commented 8 years ago

According to Docker Hub (https://hub.docker.com/r/library/ruby/tags/) it's only 313 MB, which I would call average. Are you sure that address (localhost:5555) makes sense, since the deis-registry service is 10.3.0.188 <none> 80/TCP and the deis-registry-3758253254-3gtjo pod is listening on port 5000?

bacongobbler commented 8 years ago

Yes, that address is correct. The request goes through the registry-proxy, which (as the name suggests) proxies the request to the real registry. It's a workaround for the --insecure-registry flag. See https://github.com/deis/registry-proxy#about
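To make the chain concrete: the controller and builder push to localhost:5555, which is the registry-proxy published on every node, and the proxy just forwards to the deis-registry ClusterIP service. A rough way to see this from one of the nodes (not from inside a pod; the ClusterIP below is the one from your service listing, so adjust for your cluster):

 # the proxy's hostPort should answer the registry v2 ping with an empty JSON object
 curl -s http://localhost:5555/v2/ ; echo
 # {}
 # and it simply forwards to the deis-registry service on port 80
 curl -s http://10.3.0.188/v2/ ; echo
 # {}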

bacongobbler commented 8 years ago

Coming back to the original problem, I'd inspect both your registry and minio to ensure there are no problems with either backend. From reports, images built via Dockerfile that are slightly larger than normal (>100 MB) seem to be causing these issues.
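To inspect them, something along these lines should do (pod names differ per cluster; the registry pod name is the one from your describe output above, the minio one is a placeholder):

 kubectl --namespace=deis get pods
 kubectl --namespace=deis logs deis-registry-3758253254-3gtjo
 kubectl --namespace=deis logs deis-minio-<pod-id>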

DavidSie commented 8 years ago

This is not a container-size issue (alpine is 2 MB: https://hub.docker.com/r/library/alpine/tags/ ):

 git push deis master
Counting objects: 48, done.
Compressing objects: 100% (47/47), done.
Writing objects: 100% (48/48), 6.35 KiB, done.
Total 48 (delta 14), reused 0 (delta 0)
Starting build... but first, coffee!
...
Step 1 : FROM alpine
---> 4e38e38c8ce0
Step 2 : ENV GOPATH /go
---> Using cache
---> bd4d962b7a6e
Step 3 : ENV GOROOT /usr/local/go
---> Using cache
---> 346b304d9d9d
Step 4 : ENV PATH $PATH:/usr/local/go/bin:/go/bin
---> Using cache
---> bfd14db2b7e7
Step 5 : EXPOSE 80
---> Using cache
---> a019f2dadbcc
Step 6 : ENTRYPOINT while true; do echo hello world; sleep 1; done
---> Using cache
---> d500b7d348cb
Successfully built d500b7d348cb
Pushing to registry
{"errorDetail":{"message":"Put http://localhost:5555/v1/repositories/gaslit-gladness/: dial tcp 127.0.0.1:5555: getsockopt: connection refused"},"error":"Put http://localhost:5555/v1/repositories/gaslit-gladnessremote: tcp 127.0.0.1:5555: getsockopt: connection refused"}

remote: 2016/08/25 07:18:46 Error running git receive hook [Build pod exited with code 1, stopping build.]
To ssh://git@deis-builder.10.63.11.83.nip.io:2222/gaslit-gladness.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'ssh://git@deis-builder.10.63.11.83.nip.io:2222/gaslit-gladness.git'

Which container should listen on port 5555?

(This is a different cluster but from the same script)

bacongobbler commented 8 years ago

Which container should listen on port 5555?

The registry-proxy listens on port 5555.

Can you please provide some more information about your setup and environment so we can try to reproduce this?

I recall that there are internal networking issues when using CoreOS with Calico: https://github.com/deis/workflow/issues/442

DavidSie commented 8 years ago

From inside the container:

root@deis-registry-proxy-jzf3h:/# telnet localhost 5555
Trying ::1...
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused

root@deis-registry-proxy-jzf3h:/# netstat  -lntpu 
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      1/nginx: master pro

kubectl version:

kubectl version
Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.4", GitCommit:"dd6b458ef8dbf24aff55795baa68f83383c9b3a9", GitTreeState:"clean", BuildDate:"2016-08-01T16:45:16Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.4+coreos.0", GitCommit:"be9bf3e842a90537e48361aded2872e389e902e7", GitTreeState:"clean", BuildDate:"2016-08-02T00:54:53Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}

To provision the Kubernetes cluster I used this tutorial: https://coreos.com/kubernetes/docs/latest/getting-started.html

bacongobbler commented 8 years ago

Not being able to connect to localhost:5555 from within the container is actually expected. We mount the host's docker socket, so any docker command we run assumes the host's network. Therefore, localhost:5555 on the host belongs to the registry-proxy.
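If you want to verify that, check the host's network namespace rather than the container's (a quick sketch; the exact listener name depends on how hostPorts are wired up on your provider):

 # run on the node itself, not inside the registry-proxy container
 netstat -lntp | grep 5555
 # under the default docker networking you'd expect a docker-proxy (or nginx) listener here;
 # if nothing is listening, the hostPort never got bound on that node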

When you provisioned kubernetes, where did you deploy your cluster? AWS, GKE, Vagrant?

DavidSie commented 8 years ago

I did it on OpenStack.

fdasoghe commented 8 years ago

We had the exact same problem, with:

coreos-kubernetes (from github repo #1876aac with kubernetes 1.3.4)
deis 2.4.0
vagrant 1.8.5

To create our Kubernetes cluster we followed the tutorial here: https://coreos.com/kubernetes/docs/latest/kubernetes-on-vagrant.html

After quite a bit of struggling (turning off Calico, changing the hostPort from 5555 to 80, etc. - nothing changed) we resolved it by using the plain version of Kubernetes, from the main Deis tutorial here: https://deis.com/docs/workflow/quickstart/provider/vagrant/boot/

with the notable change of the Vagrant version, downgrading to 1.8.3, since 1.8.5 has this bug: https://github.com/mitchellh/vagrant/issues/5186 (it's marked as closed but there's a regression in 1.8.5).

So, for us, the problem was in the CoreOS package. We haven't tried the very latest commit though.

EDIT: we also tried the latest commit from the CoreOS repository (commit #bdfe006) with Deis 2.4.1; nothing changed.

DavidSie commented 8 years ago

@think01 So you think the kubelet-wrapper provided with CoreOS may be a cause of this problem, right?

fdasoghe commented 8 years ago

@DavidSie Well, I cannot say the problem is in that component, but we solved it by avoiding the coreos-kubernetes package and going with plain Kubernetes on Vagrant (which creates some Fedora boxes).

Why do you mention kubelet-wrapper?

DavidSie commented 8 years ago

Because I saw that CoreOS ships with the script /usr/lib/coreos/kubelet-wrapper, but as far as I can see it only starts hyperkube on rkt.

bacongobbler commented 7 years ago

ping @DavidSie, were you able to identify the root cause of your issue here?

rbellamy commented 7 years ago

I am experiencing what I think is a similar issue. My image is 385.9 MB (so it's >100 MB, as mentioned by @bacongobbler). Regarding "inspecting" the backend - I cannot figure out how to get helpful logging out of the minio pod. I've tried the --debug switch in various permutations, then found https://github.com/minio/minio/pull/820, which seems to indicate that the switch is no longer valid because it's no longer needed. I've tried setting MINIO_TRACE=1 per some code fragments I found. However, kubectl --namespace=deis logs deis-minio-123xyz only ever shows what I assume is the minio startup output - there's no debug log, no trace log, nothing to indicate the behavior of minio during operation.
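For reference, the way I tried to set MINIO_TRACE was roughly this (assuming the minio pod is managed by a Deployment named deis-minio; adjust if yours is managed by something else):

 kubectl --namespace=deis edit deployment deis-minio
 # then, under the minio container's env: section, add:
 #   - name: MINIO_TRACE
 #     value: "1"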

The first time: deis pull

2016-09-21 08:28:43
rbellamy@eanna i ~/Development/Terradatum/aergo/aergo-server feature/docker % deis pull 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT -a aergo-server
Creating build... Error: Unknown Error (400): {"detail":"dial tcp 10.11.28.91:9000: i/o timeout"}
zsh: exit 1     deis pull 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT -a aergo-server

controller logs

INFO [aergo-server]: build aergo-server-11b3c2a created
INFO [aergo-server]: rbellamy deployed 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Pulling Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Tagging Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT as localhost:5555/aergo-server:v2
INFO Pushing Docker image localhost:5555/aergo-server:v2
INFO Pushing Docker image localhost:5555/aergo-server:v2
INFO Pushing Docker image localhost:5555/aergo-server:v2
INFO [aergo-server]: dial tcp 10.11.28.91:9000: i/o timeout
ERROR:root:dial tcp 10.11.28.91:9000: i/o timeout
Traceback (most recent call last):
  File "/app/api/models/release.py", line 88, in new
    release.publish()
  File "/app/api/models/release.py", line 135, in publish
    publish_release(source_image, self.image, deis_registry, self.get_registry_auth())
  File "/app/registry/dockerclient.py", line 199, in publish_release
    return DockerClient().publish_release(source, target, deis_registry, creds)
  File "/app/registry/dockerclient.py", line 117, in publish_release
    self.push("{}/{}".format(self.registry, name), tag)
  File "/usr/local/lib/python3.5/dist-packages/backoff.py", line 286, in retry
    ret = target(*args, **kwargs)
  File "/app/registry/dockerclient.py", line 135, in push
    log_output(stream, 'push', repo, tag)
  File "/app/registry/dockerclient.py", line 178, in log_output
    stream_error(chunk, operation, repo, tag)
  File "/app/registry/dockerclient.py", line 195, in stream_error
    raise RegistryException(message)
registry.dockerclient.RegistryException: dial tcp 10.11.28.91:9000: i/o timeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/api/models/build.py", line 62, in create
    source_version=self.version
  File "/app/api/models/release.py", line 95, in new
    raise DeisException(str(e)) from e
api.exceptions.DeisException: dial tcp 10.11.28.91:9000: i/o timeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 471, in dispatch
    response = handler(request, *args, **kwargs)
  File "/app/api/views.py", line 181, in create
    return super(AppResourceViewSet, self).create(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
    self.perform_create(serializer)
  File "/app/api/viewsets.py", line 21, in perform_create
    self.post_save(obj)
  File "/app/api/views.py", line 258, in post_save
    self.release = build.create(self.request.user)
  File "/app/api/models/build.py", line 71, in create
    raise DeisException(str(e)) from e
api.exceptions.DeisException: dial tcp 10.11.28.91:9000: i/o timeout
10.10.2.8 "POST /v2/apps/aergo-server/builds/ HTTP/1.1" 400 51 "Deis Client v2.5.1"

Then immediately, I try again: deis pull

2016-09-21 08:42:27
rbellamy@eanna i ~/Development/Terradatum/aergo/aergo-server feature/docker % deis pull 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT -a aergo-server
Creating build... Error: Unknown Error (502): <html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.11.2</center>
</body>
</html>

zsh: exit 1     deis pull 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT -a aergo-server

controller logs

INFO [aergo-server]: build aergo-server-c09bb9b created
INFO [aergo-server]: rbellamy deployed 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Pulling Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Tagging Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT as localhost:5555/aergo-server:v4
INFO Pushing Docker image localhost:5555/aergo-server:v4
INFO Pushing Docker image localhost:5555/aergo-server:v4
10.10.2.8 "GET /v2/apps/aergo-server/logs HTTP/1.1" 200 1284 "Deis Client v2.5.1"
INFO Pushing Docker image localhost:5555/aergo-server:v4
[2016-09-21 16:05:50 +0000] [24] [CRITICAL] WORKER TIMEOUT (pid:37)
[2016-09-21 16:05:50 +0000] [37] [WARNING] worker aborted
  File "/usr/local/bin/gunicorn", line 11, in <module>
    sys.exit(run())
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/app/wsgiapp.py", line 74, in run
    WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/app/base.py", line 192, in run
    super(Application, self).run()
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/app/base.py", line 72, in run
    Arbiter(self).run()
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/arbiter.py", line 189, in run
    self.manage_workers()
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/arbiter.py", line 524, in manage_workers
    self.spawn_workers()
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/arbiter.py", line 590, in spawn_workers
    self.spawn_worker()
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/arbiter.py", line 557, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/base.py", line 132, in init_process
    self.run()
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/sync.py", line 124, in run
    self.run_for_one(timeout)
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/sync.py", line 68, in run_for_one
    self.accept(listener)
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/sync.py", line 30, in accept
    self.handle(listener, client, addr)
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/sync.py", line 135, in handle
    self.handle_request(listener, req, client, addr)
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/sync.py", line 176, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/wsgi.py", line 170, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/base.py", line 124, in get_response
    response = self._middleware_chain(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/app/api/middleware.py", line 22, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/django/views/decorators/csrf.py", line 58, in wrapped_view
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/viewsets.py", line 87, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 471, in dispatch
    response = handler(request, *args, **kwargs)
  File "/app/api/views.py", line 181, in create
    return super(AppResourceViewSet, self).create(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
    self.perform_create(serializer)
  File "/app/api/viewsets.py", line 21, in perform_create
    self.post_save(obj)
  File "/app/api/views.py", line 258, in post_save
    self.release = build.create(self.request.user)
  File "/app/api/models/build.py", line 62, in create
    source_version=self.version
  File "/app/api/models/release.py", line 88, in new
    release.publish()
  File "/app/api/models/release.py", line 135, in publish
    publish_release(source_image, self.image, deis_registry, self.get_registry_auth())
  File "/app/registry/dockerclient.py", line 199, in publish_release
    return DockerClient().publish_release(source, target, deis_registry, creds)
  File "/app/registry/dockerclient.py", line 117, in publish_release
    self.push("{}/{}".format(self.registry, name), tag)
  File "/usr/local/lib/python3.5/dist-packages/backoff.py", line 286, in retry
    ret = target(*args, **kwargs)
  File "/app/registry/dockerclient.py", line 135, in push
    log_output(stream, 'push', repo, tag)
  File "/app/registry/dockerclient.py", line 175, in log_output
    for chunk in stream:
  File "/usr/local/lib/python3.5/dist-packages/docker/client.py", line 245, in _stream_helper
    data = reader.read(1)
  File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/response.py", line 314, in read
    data = self._fp.read(amt)
  File "/usr/lib/python3.5/http/client.py", line 448, in read
    n = self.readinto(b)
  File "/usr/lib/python3.5/http/client.py", line 478, in readinto
    return self._readinto_chunked(b)
  File "/usr/lib/python3.5/http/client.py", line 573, in _readinto_chunked
    chunk_left = self._get_chunk_left()
  File "/usr/lib/python3.5/http/client.py", line 541, in _get_chunk_left
    chunk_left = self._read_next_chunk_size()
  File "/usr/lib/python3.5/http/client.py", line 501, in _read_next_chunk_size
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib/python3.5/socket.py", line 575, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/base.py", line 191, in handle_abort
    self.cfg.worker_abort(self)
  File "/app/deis/gunicorn/config.py", line 36, in worker_abort
    traceback.print_stack()
bacongobbler commented 7 years ago

@rbellamy can you post registry logs in a gist? That will likely give us more information why the registry is failing to communicate with minio.

rbellamy commented 7 years ago

@bacongobbler will do.

Also, may be related to https://github.com/minio/minio/issues/2743.

rbellamy commented 7 years ago

Registry logs: https://gist.github.com/rbellamy/c0db447ed47c364ae396b5d0c9852a02

rbellamy commented 7 years ago

Here's my setup, using Alpha channel of CoreOS and libvirt:

export KUBERNETES_PROVIDER=libvirt-coreos && export NUM_NODES=4
./cluster/kube-up.sh
# wait for etcd to settle
helmc install workflow-v2.5.0
# wait for kubernetes cluster to all be ready
deis pull 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT -a aergo-server
rbellamy commented 7 years ago

Worked with @harshavardhana from the minio crew to try to troubleshoot this.

For whatever reason, during our teleconsole session I was able to successfully push the image to the deis-registry-proxy - but then saw the same dial i/o timeout in a different context: this time while pulling the image from the proxy, during the app:deploy phase.

NOTE: you can ignore the 404 below - v4 of the aergo-server doesn't exist since I've restarted the minio pod several times during troubleshooting. The v5 release is definitely stored in minio, as can be seen in the mc ls command at the bottom of this post.

INFO [aergo-server]: build aergo-server-49c7405 created
INFO [aergo-server]: rbellamy deployed 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Pulling Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Tagging Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT as localhost:5555/aergo-server:v5
INFO Pushing Docker image localhost:5555/aergo-server:v5
INFO Pulling Docker image localhost:5555/aergo-server:v5
INFO [aergo-server]: adding 5s on to the original 120s timeout to account for the initial delay specified in the liveness / readiness probe
INFO [aergo-server]: This deployments overall timeout is 125s - batch timout is 125s and there are 1 batches to deploy with a total of 1 pods
INFO [aergo-server]: waited 10s and 1 pods are in service
INFO [aergo-server]: waited 20s and 1 pods are in service
INFO [aergo-server]: waited 30s and 1 pods are in service
INFO [aergo-server]: waited 40s and 1 pods are in service
ERROR [aergo-server]: There was a problem deploying v5. Rolling back process types to release v4.
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
ERROR [aergo-server]: (app::deploy): image aergo-server:v4 not found
ERROR:root:(app::deploy): image aergo-server:v4 not found
Traceback (most recent call last):
  File "/app/scheduler/__init__.py", line 168, in deploy
    deployment = self.deployment.get(namespace, name).json()
  File "/app/scheduler/resources/deployment.py", line 29, in get
    raise KubeHTTPException(response, message, *args)
scheduler.exceptions.KubeHTTPException: ('failed to get Deployment "aergo-server-cmd" in Namespace "aergo-server": 404 Not Found', 'aergo-server-cmd', 'aergo-server')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/api/models/app.py", line 578, in deploy
    async_run(tasks)
  File "/app/api/utils.py", line 169, in async_run
    raise error
  File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step
    result = coro.throw(exc)
  File "/app/api/utils.py", line 182, in async_task
    yield from loop.run_in_executor(None, params)
  File "/usr/lib/python3.5/asyncio/futures.py", line 361, in __iter__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib/python3.5/asyncio/tasks.py", line 296, in _wakeup
    future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/app/scheduler/__init__.py", line 175, in deploy
    namespace, name, image, entrypoint, command, **kwargs
  File "/app/scheduler/resources/deployment.py", line 123, in create
    self.wait_until_ready(namespace, name, **kwargs)
  File "/app/scheduler/resources/deployment.py", line 338, in wait_until_ready
    additional_timeout = self.pod._handle_pending_pods(namespace, labels)
  File "/app/scheduler/resources/pod.py", line 552, in _handle_pending_pods
    self._handle_pod_errors(pod, reason, message)
  File "/app/scheduler/resources/pod.py", line 491, in _handle_pod_errors
    raise KubeException(message)
scheduler.exceptions.KubeException: error pulling image configuration: Get http://10.11.28.91:9000/registry/docker/registry/v2/blobs/sha256/59/5905a7c362fbff9626d517a6ba0d8930fba34a321ba4c7bb718144d80cfaf29b/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=8TZRY2JRWMPT6UMXR6I5%2F20160921%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20160921T194800Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=314c92bb84dbd4dd41f9bc572e625201a32ce300394d34e8516a57382fd2ec52: dial tcp 10.11.28.91:9000: i/o timeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/api/models/release.py", line 168, in get_port
    port = docker_get_port(self.image, deis_registry, creds)
  File "/app/registry/dockerclient.py", line 203, in get_port
    return DockerClient().get_port(target, deis_registry, creds)
  File "/app/registry/dockerclient.py", line 79, in get_port
    info = self.inspect_image(target)
  File "/usr/local/lib/python3.5/dist-packages/backoff.py", line 286, in retry
    ret = target(*args, **kwargs)
  File "/app/registry/dockerclient.py", line 156, in inspect_image
    self.pull(repo, tag=tag)
  File "/usr/local/lib/python3.5/dist-packages/backoff.py", line 286, in retry
    ret = target(*args, **kwargs)
  File "/app/registry/dockerclient.py", line 128, in pull
    log_output(stream, 'pull', repo, tag)
  File "/app/registry/dockerclient.py", line 178, in log_output
    stream_error(chunk, operation, repo, tag)
  File "/app/registry/dockerclient.py", line 195, in stream_error
    raise RegistryException(message)
registry.dockerclient.RegistryException: image aergo-server:v4 not found

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/api/models/app.py", line 585, in deploy
    self.deploy(release.previous(), force_deploy=True, rollback_on_failure=False)
  File "/app/api/models/app.py", line 526, in deploy
    port = release.get_port()
  File "/app/api/models/release.py", line 176, in get_port
    raise DeisException(str(e)) from e
api.exceptions.DeisException: image aergo-server:v4 not found

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/api/models/build.py", line 64, in create
    self.app.deploy(new_release)
  File "/app/api/models/app.py", line 595, in deploy
    raise ServiceUnavailable(err) from e
api.exceptions.ServiceUnavailable: (app::deploy): image aergo-server:v4 not found

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 471, in dispatch
    response = handler(request, *args, **kwargs)
  File "/app/api/views.py", line 181, in create
    return super(AppResourceViewSet, self).create(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
    self.perform_create(serializer)
  File "/app/api/viewsets.py", line 21, in perform_create
    self.post_save(obj)
  File "/app/api/views.py", line 258, in post_save
    self.release = build.create(self.request.user)
  File "/app/api/models/build.py", line 71, in create
    raise DeisException(str(e)) from e
api.exceptions.DeisException: (app::deploy): image aergo-server:v4 not found
10.10.2.8 "POST /v2/apps/aergo-server/builds/ HTTP/1.1" 400 59 "Deis Client v2.5.1"

And as you can see, the minio store definitely contains the image, and the proxy can communicate with the minio backend:

root@deis-registry-proxy-ccf4u:~# mc ls myminio/registry -r
[2016-09-21 19:47:36 UTC] 1.5KiB docker/registry/v2/blobs/sha256/2f/2fc6d0a3ec447743456f6fe782622ede8095b662bb39cb10c50b2a795829e51f/data
[2016-09-21 19:46:45 UTC]   112B docker/registry/v2/blobs/sha256/53/5345ff73e9fcf7b6c7d2d7eca2b0338ab274560ff988b8f63e60f73dfe0297ec/data
[2016-09-21 19:47:36 UTC] 5.0KiB docker/registry/v2/blobs/sha256/59/5905a7c362fbff9626d517a6ba0d8930fba34a321ba4c7bb718144d80cfaf29b/data
[2016-09-21 19:46:45 UTC]   232B docker/registry/v2/blobs/sha256/a6/a696cba1f6e865421664a7bf9bf585bcfaa924d56b7d2a112a799e00a7433791/data
[2016-09-21 19:47:14 UTC]  94MiB docker/registry/v2/blobs/sha256/b4/b419440b08d223eabe64f26d5f8556ee8d3f4c0bcafb8dd64ec525cc4eea7f6e/data
[2016-09-21 19:47:19 UTC]  94MiB docker/registry/v2/blobs/sha256/c0/c0963e676944ab20c36e857c33d76a6ba2166aaa6a0d3961d6cf20fae965efd0/data
[2016-09-21 19:47:14 UTC]  47MiB docker/registry/v2/blobs/sha256/d0/d0f0d61cd0d229546b1e33b0c92036ad3f35b42dd2c9a945aeaf67f84684ce26/data
[2016-09-21 19:46:59 UTC] 2.2MiB docker/registry/v2/blobs/sha256/e1/e110a4a1794126ef308a49f2d65785af2f25538f06700721aad8283b81fdfa58/data
[2016-09-21 19:46:45 UTC]    71B docker/registry/v2/repositories/aergo-server/_layers/sha256/5345ff73e9fcf7b6c7d2d7eca2b0338ab274560ff988b8f63e60f73dfe0297ec/link
[2016-09-21 19:47:36 UTC]    71B docker/registry/v2/repositories/aergo-server/_layers/sha256/5905a7c362fbff9626d517a6ba0d8930fba34a321ba4c7bb718144d80cfaf29b/link
[2016-09-21 19:46:45 UTC]    71B docker/registry/v2/repositories/aergo-server/_layers/sha256/a696cba1f6e865421664a7bf9bf585bcfaa924d56b7d2a112a799e00a7433791/link
[2016-09-21 19:47:18 UTC]    71B docker/registry/v2/repositories/aergo-server/_layers/sha256/b419440b08d223eabe64f26d5f8556ee8d3f4c0bcafb8dd64ec525cc4eea7f6e/link
[2016-09-21 19:47:19 UTC]    71B docker/registry/v2/repositories/aergo-server/_layers/sha256/c0963e676944ab20c36e857c33d76a6ba2166aaa6a0d3961d6cf20fae965efd0/link
[2016-09-21 19:47:18 UTC]    71B docker/registry/v2/repositories/aergo-server/_layers/sha256/d0f0d61cd0d229546b1e33b0c92036ad3f35b42dd2c9a945aeaf67f84684ce26/link
[2016-09-21 19:46:59 UTC]    71B docker/registry/v2/repositories/aergo-server/_layers/sha256/e110a4a1794126ef308a49f2d65785af2f25538f06700721aad8283b81fdfa58/link
[2016-09-21 19:47:36 UTC]    71B docker/registry/v2/repositories/aergo-server/_manifests/revisions/sha256/2fc6d0a3ec447743456f6fe782622ede8095b662bb39cb10c50b2a795829e51f/link
[2016-09-21 19:47:36 UTC]    71B docker/registry/v2/repositories/aergo-server/_manifests/tags/v5/current/link
[2016-09-21 19:47:36 UTC]    71B docker/registry/v2/repositories/aergo-server/_manifests/tags/v5/index/sha256/2fc6d0a3ec447743456f6fe782622ede8095b662bb39cb10c50b2a795829e51f/link
harshavardhana commented 7 years ago

@bacongobbler - if you have a setup locally we can work on this and see what is causing the problem. I don't have a Kubernetes setup locally. The i/o timeout seems to be related to a network problem between the registry and the minio server. We need to see whether the server itself is not responding properly; I couldn't see anything with mc, though.
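One quick way to check that network path from inside the cluster would be something like this (a throwaway busybox pod; 10.11.28.91:9000 is the minio pod IP from the logs above):

 kubectl --namespace=deis run minio-test --image=busybox --restart=Never -- wget -qO- http://10.11.28.91:9000/
 kubectl --namespace=deis logs minio-test
 kubectl --namespace=deis delete pod minio-test
 # any HTTP response at all (even an S3 access-denied error) means the network path is fine;
 # a hang is the same i/o timeout the registry is hitting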

rbellamy commented 7 years ago

So, from my registry log gist: https://gist.github.com/rbellamy/c0db447ed47c364ae396b5d0c9852a02#file-deis-issue-64-registry-proxy-logs-L1242

bacongobbler commented 7 years ago

@harshavardhana unfortunately we do not have any clusters reproducing this issue locally, nor can we reproduce it ourselves, other than the calico networking issue.

@rbellamy if you can supply information about how you set up your cluster, including your KUBERNETES_PROVIDER envvar when using kube-up.sh and which version of Workflow you're running, we can try to reproduce it there. As far as e2e is concerned, we aren't seeing this issue in master or in recent releases: http://ci.deis.io

rbellamy commented 7 years ago

@bacongobbler I included that information in a comment in this issue: https://github.com/deis/registry/issues/64#issuecomment-248700404

bacongobbler commented 7 years ago

Thank you! From what others have voiced earlier, this sounds related to a CoreOS issue, as seen in https://github.com/deis/registry/issues/64#issuecomment-243107833. I'd recommend trying a different provider first to see if that resolves your issue.

rbellamy commented 7 years ago

I'm not sure how diagnostic this is, given I'm testing within a single libvirt host - however, it should be noted that the host is running 2 x 12 AMD Opteron CPUs on a Supermicro motherboard with 128 GB of RAM and all SSDs, and each VM is provisioned with 4 GB and 2 CPUs, so I find it hard to believe the issue at hand is related to an overloaded VM host or guest.

From what @bacongobbler has said, deis hasn't seen this in their e2e test runner on k8s. I'd be interested to know what the test matrix looks like WRT other providers/hosts.

Maybe this is a CoreOS-related problem? Given https://github.com/coreos/bugs/issues/1554 it doesn't seem outside the realm of possibility.

Kubernetes on CoreOS (using libvirt-coreos provider and ./kube-up.sh script)

master with 3 nodes

INFO [aergo-server]: build aergo-server-6972f5f created
INFO [aergo-server]: rbellamy deployed 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Pulling Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Tagging Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT as localhost:5555/aergo-server:v2
INFO Pushing Docker image localhost:5555/aergo-server:v2
INFO Pushing Docker image localhost:5555/aergo-server:v2
INFO Pushing Docker image localhost:5555/aergo-server:v2
INFO [aergo-server]: Put http://localhost:5555/v1/repositories/aergo-server/: read tcp 127.0.0.1:49384->127.0.0.1:5555: read: connection reset by peer
ERROR:root:Put http://localhost:5555/v1/repositories/aergo-server/: read tcp 127.0.0.1:49384->127.0.0.1:5555: read: connection reset by peer
Traceback (most recent call last):
  File "/app/api/models/release.py", line 88, in new
    release.publish()
  File "/app/api/models/release.py", line 135, in publish
    publish_release(source_image, self.image, deis_registry, self.get_registry_auth())
  File "/app/registry/dockerclient.py", line 199, in publish_release
    return DockerClient().publish_release(source, target, deis_registry, creds)
  File "/app/registry/dockerclient.py", line 117, in publish_release
    self.push("{}/{}".format(self.registry, name), tag)
  File "/usr/local/lib/python3.5/dist-packages/backoff.py", line 286, in retry
    ret = target(*args, **kwargs)
  File "/app/registry/dockerclient.py", line 135, in push
    log_output(stream, 'push', repo, tag)
  File "/app/registry/dockerclient.py", line 178, in log_output
    stream_error(chunk, operation, repo, tag)
  File "/app/registry/dockerclient.py", line 195, in stream_error
    raise RegistryException(message)
registry.dockerclient.RegistryException: Put http://localhost:5555/v1/repositories/aergo-server/: read tcp 127.0.0.1:49384->127.0.0.1:5555: read: connection reset by peer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/api/models/build.py", line 62, in create
    source_version=self.version
  File "/app/api/models/release.py", line 95, in new
    raise DeisException(str(e)) from e
api.exceptions.DeisException: Put http://localhost:5555/v1/repositories/aergo-server/: read tcp 127.0.0.1:49384->127.0.0.1:5555: read: connection reset by peer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 471, in dispatch
    response = handler(request, *args, **kwargs)
  File "/app/api/views.py", line 181, in create
    return super(AppResourceViewSet, self).create(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
    self.perform_create(serializer)
  File "/app/api/viewsets.py", line 21, in perform_create
    self.post_save(obj)
  File "/app/api/views.py", line 258, in post_save
    self.release = build.create(self.request.user)
  File "/app/api/models/build.py", line 71, in create
    raise DeisException(str(e)) from e
api.exceptions.DeisException: Put http://localhost:5555/v1/repositories/aergo-server/: read tcp 127.0.0.1:49384->127.0.0.1:5555: read: connection reset by peer
10.10.1.5 "POST /v2/apps/aergo-server/builds/ HTTP/1.1" 400 142 "Deis Client v2.5.1"
bacongobbler commented 7 years ago

Maybe this is a CoreOS-related problem? Given coreos/bugs#1554 it doesn't seem outside the realm of possibility.

Yes, I do believe this is a CoreOS-related problem, as I mentioned in my previous comment. If you can try provisioning a cluster with a different provider, that will help narrow down the issue.

rbellamy commented 7 years ago

@bacongobbler I've used corectl and Kube-Solo with success.

bacongobbler commented 7 years ago

@DavidSie after reading the logs a little more closely, I realized that your docker daemon appears to be trying to push to a v1 registry endpoint.

Put http://localhost:5555/v1/repositories/spree/: dial tcp 127.0.0.1:5555: getsockopt: connection refused"

Notice the v1 in there. Since this is specific to dockerbuilder (buildpack deploys work fine for you), I wonder if it's due to the docker python library auto-detecting the client version: https://github.com/deis/dockerbuilder/blob/28c31d45a17a97473e83c451b0d2e743678620c0/rootfs/deploy.py#L106

@rbellamy can you please open a separate issue? Yours doesn't look to be the same, as the original error from your report is about minio:

error pulling image configuration: Get http://10.11.28.91:9000/registry/docker/registry/v2/blobs/sha256/59/5905a7c362fbff9626d517a6ba0d8930fba34a321ba4c7bb718144d80cfaf29b/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=8TZRY2JRWMPT6UMXR6I5%2F20160921%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20160921T194800Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=314c92bb84dbd4dd41f9bc572e625201a32ce300394d34e8516a57382fd2ec52: dial tcp 10.11.28.91:9000: i/o timeout
harshavardhana commented 7 years ago

error pulling image configuration: Get http://10.11.28.91:9000/registry/docker/registry/v2/blobs/sha256/59/5905a7c362fbff9626d517a6ba0d8930fba34a321ba4c7bb718144d80cfaf29b/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=8TZRY2JRWMPT6UMXR6I5%2F20160921%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20160921T194800Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=314c92bb84dbd4dd41f9bc572e625201a32ce300394d34e8516a57382fd2ec52: dial tcp 10.11.28.91:9000: i/o timeout

Is this still the network issue we were talking about previously? @bacongobbler - let me know how I can help here.

bacongobbler commented 7 years ago

Yes. @rbellamy believes he has nailed it down as a symptom of coreos/bugs#1554. Thank you for the offer, though!

DavidSie commented 7 years ago

@bacongobbler Do you know how I can fix this issue? Simply update Deis (I'm on 2.3.0 now)?

bacongobbler commented 7 years ago

I'm not sure how this could be fixed; however, upgrading to 2.5.0 would never hurt.

dblackdblack commented 7 years ago

I ran into this exact problem when setting up with the CoreOS tool as well. It's too bad the CoreOS tool has this problem, because it works really well with CloudFormation, which makes teardown a snap after trying out Deis. kube-up does not use CloudFormation and leaves crap all over your AWS account after you're done with it.

bacongobbler commented 7 years ago

@dblackdblack even after using ./cluster/kube-down.sh? I've always found that script tears down all the AWS resources it created.

bacongobbler commented 7 years ago

After debugging with both @jdumars and @felixbuenemann, their clusters seem to be showing the same symptom. The problem? Requesting a hostPort on some providers - like Rancher and CoreOS - does not work. @kmala pointed me towards https://github.com/kubernetes/kubernetes/issues/23920 so it looks like we've found our smoking gun.
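In other words, the registry-proxy depends on a hostPort mapping that on CNI-based providers never actually gets bound on the node. You can see the mapping in the DaemonSet (output trimmed; roughly what you should see):

 kubectl --namespace=deis get daemonset deis-registry-proxy -o yaml | grep -A 3 'ports:'
 #        ports:
 #        - containerPort: 80
 #          hostPort: 5555
 #          protocol: TCP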

bacongobbler commented 7 years ago

And for anyone who wants to take a crack at a patch, you can run through the following instructions to patch workflow-v2.7.0, removing the registry-proxy and making the controller and builder connect directly to the registry. This requires the old --insecure-registry flag to be enabled so the docker daemon can talk to the registry. Here are the commands and the patch, run against a fresh cluster that shows this symptom:

git clone https://github.com/deis/charts
cd charts
curl https://gist.githubusercontent.com/bacongobbler/0b5f2c4fe6f067ddb775d53d635cc74d/raw/992a95edb8430ebcddba526fb1c48d9d0fcc1166/remove-registry-proxy.patch | git apply -
kubectl delete namespace deis
# also delete any app namespaces so you have a fresh cluster
rm -rf ~/.helmc/workspace/charts/workflow-v2.7.0
cp -R workflow-v2.7.0 ~/.helmc/workspace/charts/
helmc generate workflow-v2.7.0
helmc install workflow-v2.7.0

Note that this will purge your cluster entirely of Workflow.

There is currently no workaround for this as far as I'm aware, but if users want to bring this issue to light they can try to contribute patches upstream to kubernetes! :)

zinuzoid commented 7 years ago

In case anyone wants to patch workflow-dev, you can use this gist with @bacongobbler's instructions above.

https://gist.githubusercontent.com/zinuzoid/621dc94e848f8f390c787c307e848ed2/raw/7c7d98a7cd0986ed6a99a67b675932e29ca0ed7d/remove-registry-proxy.patch

bacongobbler commented 7 years ago

@zinuzoid the instructions above use that exact patch :)

EDIT: I missed the one line change you made in your patch and the fact it's for workflow-dev. Nice catch!

zinuzoid commented 7 years ago

@bacongobbler plus a one-line change in workflow-dev/tpl/storage.sh to make it work for me :)

bacongobbler commented 7 years ago

I'm going to close this issue, as there is nothing we can do to work around it in Workflow other than with the patch I provided. This is an upstream issue and patches should be applied upstream. Until then, please feel free to run with the patch provided here for production deployments that rely on CNI networking. Thanks!

jwalters-gpsw commented 7 years ago

When applying the patch I got a "corrupt patch at line 6" message:

mbr-31107:charts jwalters$ curl https://gist.githubusercontent.com/bacongobbler/0b5f2c4fe6f067ddb775d53d635cc74d/raw/32a86cc4ddfa0a7cb173b1184ac3e288dedb5a84/remove-registry-proxy.patch | git apply -
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3080  100  3080    0     0   3557      0 --:--:-- --:--:-- --:--:--  3556
fatal: corrupt patch at line 6

bacongobbler commented 7 years ago

@jwalters-gpsw try again. I just fixed the patch.

curl https://gist.githubusercontent.com/bacongobbler/0b5f2c4fe6f067ddb775d53d635cc74d/raw/992a95edb8430ebcddba526fb1c48d9d0fcc1166/remove-registry-proxy.patch | git apply -
bacongobbler commented 7 years ago

v2.8.0 patch:

curl https://gist.githubusercontent.com/bacongobbler/0b5f2c4fe6f067ddb775d53d635cc74d/raw/248a052dd0575419d5890abaedec3a7940f3ada6/remove-registry-proxy-v2.8.0.patch | git apply -
bacongobbler commented 7 years ago

Thanks for the updated patch. I'm running coreos on AWS. Is there a way for me to restart the docker daemons with the insecure registry option? Or would I need to redeploy the cluster?

It's easier to re-deploy the cluster if you're just getting set up. Otherwise you'll have to manually SSH into each node, modify the daemon startup flags, and restart docker on every node.
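On CoreOS, one common way to do that is a systemd drop-in rather than editing the docker unit itself (a sketch only; pick a CIDR that actually covers your cluster's service network):

 sudo mkdir -p /etc/systemd/system/docker.service.d
 cat <<'EOF' | sudo tee /etc/systemd/system/docker.service.d/50-insecure-registry.conf
 [Service]
 Environment='DOCKER_OPTS=--insecure-registry=10.0.0.0/8'
 EOF
 sudo systemctl daemon-reload
 sudo systemctl restart docker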

jwalters-gpsw commented 7 years ago

Thanks, I will give that a try. I'm also thinking about doing a Deis upgrade to the same version per the upgrade instructions, but setting the registry to an off-cluster registry.

jwalters-gpsw commented 7 years ago

I manually updated the worker nodes' docker config, applied your changes, and it's working fine now.

ineu commented 7 years ago

Sorry for raising this old thread, but could you please explain how to apply this patch to 2.9, which is deployed via Helm and not Helm Classic?