capactio / hub-manifests

Holds Capact Hub manifests
Apache License 2.0

Cannot upgrade Capact on cluster with 1.20 K8s #14

Closed mszostok closed 3 years ago

mszostok commented 3 years ago

Description

Since Kubernetes 1.20, evicted Pods stay in the cluster in a failed state. For example:

cert-manager-webhook-c654fc8b-bqrnd                      1/1     Running     0          109m
cert-manager-webhook-c654fc8b-kqgwk                      0/1     Shutdown    0          20h
cert-manager-webhook-c654fc8b-st6kd                      0/1     Shutdown    0          16h
cert-manager-webhook-c654fc8b-x5kkq                      0/1     Shutdown    0          168m
ingress-nginx-controller-bf5464b58-g4ls8                 0/1     Shutdown    0          16h
ingress-nginx-controller-bf5464b58-rp2k2                 1/1     Running     0          109m

In our Capact upgrade Action we wait for cert-manager and ingress: https://github.com/capactio/hub-manifests/blob/32e1423df7afdfb2060dab99c972750159e1474c/manifests/implementation/capactio/capact/upgrade.yaml#L266-L281

Unfortunately, we use kubectl wait --for=condition=ready pod -n capact-system --selector=... which takes all matching Pods into account, including evicted Pods that remain in a failed state. Those Pods never become ready, so the wait cannot succeed.

A better approach is to use a dedicated command that understands a given workload, e.g.: kubectl rollout status sts/neo4j-neo4j-core --watch
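
The same idea could be applied to the cert-manager and ingress wait step. Below is a minimal sketch, not the actual change made in upgrade.yaml. The helper name wait_for_rollouts, the KUBECTL and ROLLOUT_TIMEOUT variables, and the 300s default are assumptions introduced here for illustration.

```shell
#!/bin/sh
# Sketch: wait on workload controllers instead of individual Pods, so that
# leftover evicted Pods do not block the wait.
# KUBECTL may be overridden (hypothetical knob, e.g. for a dry run).
KUBECTL="${KUBECTL:-kubectl}"

# wait_for_rollouts NAMESPACE RESOURCE...
# Runs `kubectl rollout status` for each resource in turn and returns
# non-zero as soon as one of them fails or times out.
wait_for_rollouts() {
  namespace="$1"
  shift
  for resource in "$@"; do
    "$KUBECTL" rollout status "$resource" -n "$namespace" \
      --timeout="${ROLLOUT_TIMEOUT:-300s}" || return 1
  done
}
```

Usage might look like: wait_for_rollouts capact-system deploy/cert-manager-webhook deploy/ingress-nginx-controller. The Deployment names are inferred from the Pod names listed above and the namespace from the kubectl wait command, so both should be checked against the actual cluster.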

Expected behavior

Ignore failed Pods that don't impact a given workload.