deis / workflow

The open source PaaS for Kubernetes.
https://deis.com/workflow/
MIT License

Upgrade Issue with bringing in kube-registry-proxy #766

Open vdice opened 7 years ago

vdice commented 7 years ago

In order to switch over from our in-house registry-proxy to the official/upstream kube-registry-proxy (as the original PR https://github.com/deis/workflow/pull/734 proposed), we will need to sort out the following issue that appears when upgrading.

Testing of the v2.12.0 release candidate showed the following: starting from a Workflow install that uses the in-house variant of deis-registry-proxy (say, v2.11.0) and then upgrading (helm upgrade luminous-hummingbird workflow-staging/workflow --version v2.12.0), the deis-registry-proxy pod appears to have been removed, but the new luminous-hummingbird-kube-registry-proxy sometimes does not appear due to a host port conflict:

$ helm ls
NAME                    REVISION    UPDATED                     STATUS      CHART               NAMESPACE
luminous-hummingbird    4           Wed Mar  8 14:01:02 2017    DEPLOYED    workflow-v2.12.0    deis

$ kd get po,ds
NAME                                        READY     STATUS    RESTARTS   AGE
po/deis-builder-574483744-qnf44             1/1       Running   0          24m
po/deis-controller-3953262871-jqkmd         1/1       Running   2          24m
po/deis-database-83844344-m5x4x             1/1       Running   0          24m
po/deis-logger-176328999-d7fxc              1/1       Running   9          1h
po/deis-logger-fluentd-0hqfs                1/1       Running   0          1h
po/deis-logger-fluentd-drfh6                1/1       Running   0          1h
po/deis-logger-redis-304849759-nbrdp        1/1       Running   0          1h
po/deis-minio-676004970-g2bj9               1/1       Running   0          1h
po/deis-monitor-grafana-432627134-87b1z     1/1       Running   0          24m
po/deis-monitor-influxdb-2729788615-q67f9   1/1       Running   0          25m
po/deis-monitor-telegraf-6q562              1/1       Running   0          1h
po/deis-monitor-telegraf-rzwnv              1/1       Running   6          1h
po/deis-nsqd-3597503299-94nhx               1/1       Running   0          1h
po/deis-registry-756475849-v0rmw            1/1       Running   0          24m
po/deis-router-1001573613-mk07g             1/1       Running   0          13m
po/deis-workflow-manager-1013677227-kh5vt   1/1       Running   0          25m

NAME                                          DESIRED   CURRENT   READY     NODE-SELECTOR   AGE
ds/deis-logger-fluentd                        2         2         2         <none>          1h
ds/deis-monitor-telegraf                      2         2         2         <none>          1h
ds/luminous-hummingbird-kube-registry-proxy   0         0         0         <none>          24m

$ kd describe ds luminous-hummingbird-kube-registry-proxy
Name:       luminous-hummingbird-kube-registry-proxy
Image(s):   gcr.io/google_containers/kube-registry-proxy:0.4
Selector:   app=luminous-hummingbird-kube-registry-proxy
Node-Selector:  <none>
Labels:     chart=kube-registry-proxy-0.1.0
        heritage=Tiller
        release=luminous-hummingbird
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Misscheduled: 0
Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Events:
  FirstSeen LastSeen    Count   From            SubObjectPath   Type        Reason      Message
  --------- --------    -----   ----            -------------   --------    ------      -------
  25m       25m     2   {daemonset-controller }         Normal      FailedPlacement failed to place pod on "k8s-agent-fbf26383-0": host port conflict
  25m       25m     2   {daemonset-controller }         Normal      FailedPlacement failed to place pod on "k8s-master-fbf26383-0": host port conflict
bacongobbler commented 7 years ago

Let's see if we can distill this into a base case, so we can hopefully ship a PR and a functional test upstream to Helm.
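Something like the following might do as a starting point for that base case (a sketch only; the names, the pause image, and hostPort 5555 are placeholders rather than values taken from either chart):

$ cat > proxy-ds.yaml <<EOF
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: old-proxy
spec:
  template:
    metadata:
      labels:
        app: old-proxy
    spec:
      containers:
      - name: proxy
        # the image is just a placeholder; only the declared hostPort matters for placement
        image: gcr.io/google_containers/pause-amd64:3.0
        ports:
        - containerPort: 80
          hostPort: 5555
EOF
$ kubectl create -f proxy-ds.yaml

# delete the first DaemonSet and, once its pods appear to be gone, create a second
# DaemonSet that asks for the same host port (mirroring the upgrade scenario above)
$ kubectl delete ds old-proxy
$ sed 's/old-proxy/new-proxy/' proxy-ds.yaml | kubectl create -f -

# if the behaviour reproduces, this shows the same FailedPlacement / host port conflict events
$ kubectl describe ds new-proxy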

vdice commented 7 years ago

It's possible that this is due to a k8s regression (I've been running v1.5.x in my testing); perhaps related: https://github.com/kubernetes/kubernetes/issues/23013
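One way to tell a port that is genuinely still in use apart from stale controller state (again a sketch; port 5555 is a placeholder and ssh access to the node is assumed):

# nothing should be listening on the proxy's host port once the old pod is gone
$ ssh k8s-agent-fbf26383-0 'sudo ss -tlnp | grep 5555'

# if that comes back empty but the DaemonSet controller still reports a conflict,
# it points at stale controller/scheduler state rather than a real port collision
$ kubectl --namespace=deis describe ds luminous-hummingbird-kube-registry-proxy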

vdice commented 7 years ago

Adding this to the v2.15 milestone. We'll want to retry this on a v1.6.x cluster. As it stands, we've added deis/registry-proxy back into CI as features have come in during the Workflow v2.14 milestone.

Cryptophobia commented 6 years ago

This issue was moved to teamhephy/workflow#27