Unidata / science-gateway

Unidata Science Gateway on the NSF Jetstream2 Cloud
https://science-gateway.unidata.ucar.edu/
BSD 3-Clause "New" or "Revised" License

redo cert-manager #914

Closed julienchastang closed 6 months ago

julienchastang commented 6 months ago

cc @ana-v-espinoza @zonca This is motivated by the fact that we are seeing some letsencrypt certs not autorenewing. I believe this has something to do with the letsencrypt deployment patch not being needed for a while and then being needed again. Reapplying the deployment patch did not work. The only solution I have found is this "do over" workflow. The deployment patch I propose here is slightly different from what we have discussed before, but I am not sure that matters. I find this taints and tolerations business most confusing.
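For readers unfamiliar with this kind of patch, here is a minimal sketch of what a tolerations patch for a Deployment could look like. It is not the exact patch proposed in this issue. The taint key, the node label, and the `cert-manager` Deployment and namespace names are all assumptions chosen for illustration, and the helper function names are hypothetical.

```python
import json
import shlex


def build_toleration_patch(
    taint_key="node-role.kubernetes.io/control-plane",   # assumed taint on the main node
    node_label="node-role.kubernetes.io/control-plane",  # assumed label on the main node
):
    """Return a patch body that pins Pods to a labeled node and tolerates its taint."""
    return {
        "spec": {
            "template": {
                "spec": {
                    # nodeSelector asks the scheduler to place Pods only on nodes with this label.
                    "nodeSelector": {node_label: ""},
                    # The toleration allows Pods onto a node carrying a NoSchedule taint with this key.
                    "tolerations": [
                        {"key": taint_key, "operator": "Exists", "effect": "NoSchedule"}
                    ],
                }
            }
        }
    }


def kubectl_patch_command(patch, name="cert-manager", namespace="cert-manager"):
    """Build (but do not run) a kubectl command that applies the patch to a Deployment."""
    return [
        "kubectl", "patch", "deployment", name,
        "-n", namespace,
        "--patch", json.dumps(patch),
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before anyone runs it.
    print(shlex.join(kubectl_patch_command(build_toleration_patch())))
```

Roughly speaking, a taint keeps Pods off a node unless they carry a matching toleration, while the nodeSelector is what actually steers the Pods toward that node. A toleration alone permits scheduling there but does not require it.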

ana-v-espinoza commented 6 months ago

Thanks Julien!

Just speculating a bit about why this may have happened. When the Deployment was patched, the cert-manager Pods would have been destroyed and recreated on the main node of the cluster. That probably lost some kind of renewal data for the Certificate (or possibly the ClusterIssuer or CertificateRequest), which was not itself destroyed and recreated.

I think destroying and recreating those resources as well is the right play in this scenario, but I also imagine that in the cases where the Deployment is patched before any additional resources are requested (i.e. before running bash install_jhub.sh), no additional actions are needed.

Again, just speculation.
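The "destroy and recreate" idea above could be sketched roughly as follows. This is an illustration only, not the workflow from this PR: the `jhub` namespace is a hypothetical placeholder, the function name is made up, and the sketch only builds the delete commands without running them.

```python
import shlex


def build_cleanup_commands(namespace="jhub"):  # "jhub" is an assumed namespace
    """Return kubectl commands that delete the cert-manager resources mentioned above."""
    return [
        # CertificateRequest and Certificate are namespaced resources.
        ["kubectl", "delete", "certificaterequest", "--all", "-n", namespace],
        ["kubectl", "delete", "certificate", "--all", "-n", namespace],
        # ClusterIssuer is cluster-scoped, so no namespace flag is passed.
        ["kubectl", "delete", "clusterissuer", "--all"],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for command in build_cleanup_commands():
        print(shlex.join(command))
```

Deleting these resources does not recreate them. They would need to be reapplied afterwards in whatever way they were originally created.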

zonca commented 6 months ago

Thanks @julienchastang for notifying me about this. I agree with @ana-v-espinoza.