digitalocean / clusterlint

A best practices checker for Kubernetes clusters. 🤠
Apache License 2.0

Cluster upgrade issue with cert manager #100

Closed: simonkotwicz closed this issue 3 years ago

simonkotwicz commented 3 years ago

Anyone currently using cert-manager will get this error when upgrading their cluster:

There are issues that will cause your pods to stop working. We recommend you fix them before upgrading this cluster. Validating webhook is configured in such a way that it may be problematic during upgrades. Mutating webhook is configured in such a way that it may be problematic during upgrades.

Should these really be marked as errors, given that the webhook rules specify API groups?
Mutating webhook: https://github.com/jetstack/cert-manager/blob/87989dbfe35bed99a9e031c71ad3a7d49030a8bf/deploy/charts/cert-manager/templates/webhook-mutating-webhook.yaml#L26-L28
Validating webhook: https://github.com/jetstack/cert-manager/blob/87989dbfe35bed99a9e031c71ad3a7d49030a8bf/deploy/charts/cert-manager/templates/webhook-validating-webhook.yaml#L36-L38
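The point of the question is that the linked rules restrict both webhooks to cert-manager's own API groups, so they should not intercept requests for core objects such as Pods, and an unavailable webhook during an upgrade would not stop replacement Pods from being created. The following is a minimal, hypothetical Python sketch of that reasoning. It is not clusterlint's implementation; `WebhookRule`, `webhook_can_block_pods` and the rule values are illustrative assumptions that only approximate the linked chart templates.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List

# The core API group (Pods, Services, ...) is identified by the empty string.
CORE_GROUP = ""


@dataclass
class WebhookRule:
    """Hypothetical, simplified stand-in for one entry in a webhook's `rules` list."""
    api_groups: List[str]
    resources: List[str]
    operations: List[str] = field(default_factory=lambda: ["*"])


def rule_can_intercept_pod_creation(rule: WebhookRule) -> bool:
    """True if the rule could match CREATE requests for core-group Pods."""
    group_ok = any(g in (CORE_GROUP, "*") for g in rule.api_groups)
    # "*" matches all resources; "*/*" matches all resources and subresources.
    resource_ok = any(r in ("pods", "*", "*/*") for r in rule.resources)
    op_ok = any(op in ("CREATE", "*") for op in rule.operations)
    return group_ok and resource_ok and op_ok


def webhook_can_block_pods(rules: List[WebhookRule]) -> bool:
    """A webhook can block Pod creation only if at least one rule matches Pods."""
    return any(rule_can_intercept_pod_creation(r) for r in rules)


# Assumed shape of the cert-manager rules: only cert-manager API groups.
cert_manager_rules = [
    WebhookRule(
        api_groups=["cert-manager.io", "acme.cert-manager.io"],
        resources=["*/*"],
        operations=["CREATE", "UPDATE"],
    )
]
# A catch-all webhook for comparison.
catch_all_rules = [WebhookRule(api_groups=["*"], resources=["*"])]

print(webhook_can_block_pods(cert_manager_rules))  # False
print(webhook_can_block_pods(catch_all_rules))  # True
```

Scoping by API group is what separates the two cases: the catch-all rule matches the core group, while the cert-manager rule never does, regardless of its resource wildcard.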

adamwg commented 3 years ago

We recently fixed this in clusterlint (#99), but that fix hasn't made it to DOKS yet, so you will still see the false-positive errors in the DO control panel. We're working on rolling that fix into the DOKS clusterlint integration right away, since it is a common source of confusion for users.

simonkotwicz commented 3 years ago

My bad, I should have checked recent commits. I guess this issue can be closed now, or left open until the fix has been rolled out to DOKS.

larshp commented 3 years ago

I have the same issue on DOKS.

devurandom commented 3 years ago

I also still run into this issue.

Following the advice in https://github.com/digitalocean/clusterlint/blob/master/checks.md#admission-controller-webhook-replacement and looking at https://github.com/jetstack/cert-manager/blob/v1.1.0/deploy/charts/cert-manager/templates/webhook-validating-webhook.yaml#L25-L30, I set the label cert-manager.io/disable-validation: "true" on my cert-manager namespace. Based on https://github.com/digitalocean/clusterlint/blob/95e7d57b51966863fbff40bce6adc5004d836b1e/checks/doks/admission_controller_webhook_replacement.go#L81-L94, I would expect clusterlint to stop complaining, but it does not.

In any case, that would still leave the mutating webhook (https://github.com/jetstack/cert-manager/blob/v1.1.0/deploy/charts/cert-manager/templates/webhook-mutating-webhook.yaml), which does not have such a namespaceSelector.
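Why the label helps with one webhook but not the other comes down to Kubernetes label-selector semantics: a NotIn requirement excludes namespaces carrying the listed value, while a missing or empty namespaceSelector matches every namespace. The following is a minimal Python sketch of those semantics, assuming the validating webhook's namespaceSelector contains a NotIn requirement on cert-manager.io/disable-validation (as the linked v1.1.0 template presumably does; any other requirements are omitted). The function names are hypothetical and this is not clusterlint's code.

```python
from typing import Dict, List, Optional


def requirement_matches(labels: Dict[str, str], key: str, operator: str, values: List[str]) -> bool:
    """Evaluate one matchExpressions requirement against a namespace's labels."""
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        # NotIn also matches when the label is absent entirely.
        return key not in labels or labels[key] not in values
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {operator}")


def selector_matches(selector: Optional[dict], labels: Dict[str, str]) -> bool:
    """Return True if the namespace labels satisfy the selector.

    A missing or empty namespaceSelector matches every namespace.
    """
    if not selector:
        return True
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    return all(
        requirement_matches(labels, req["key"], req["operator"], req.get("values", []))
        for req in selector.get("matchExpressions", [])
    )


# Assumed (simplified) namespaceSelector of the validating webhook.
validating_selector = {
    "matchExpressions": [
        {"key": "cert-manager.io/disable-validation", "operator": "NotIn", "values": ["true"]},
    ]
}
labeled_namespace = {"cert-manager.io/disable-validation": "true"}

print(selector_matches(validating_selector, labeled_namespace))  # False: excluded
print(selector_matches(validating_selector, {}))  # True: unlabeled namespaces still match
print(selector_matches(None, labeled_namespace))  # True: no namespaceSelector (mutating webhook)
```

The sketch models only namespace matching. It does not reproduce the rest of clusterlint's check, so it cannot by itself explain why the warning persisted.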

cert-manager tracks this at https://github.com/jetstack/cert-manager/issues/2971

devurandom commented 3 years ago

This issue appears to have been resolved. I just received an email that my cluster would be automatically updated and the error messages are gone from the "Version Upgrade" dialogue in the DigitalOcean UI.

varshavaradarajan commented 3 years ago

@devurandom - thanks, yes, we recently released a new version and vendored that on DOKS. That contains the fix: #99. Closing this.

larshp commented 3 years ago

I still have two errors left. Is this expected?


larshp commented 3 years ago

The comments in https://github.com/digitalocean/clusterlint/issues/101 helped: I upgraded cert-manager to the latest version via Helm, and now there are no errors.

Following the tutorial at https://www.digitalocean.com/community/tutorials/how-to-set-up-an-nginx-ingress-on-digitalocean-kubernetes-using-helm makes it difficult to upgrade the Kubernetes cluster, because it installs an old cert-manager version.