digitalocean / digitalocean-cloud-controller-manager

Kubernetes cloud-controller-manager for DigitalOcean (beta)
Apache License 2.0

Upgrading Helm charts with ingresses may delete the connected LoadBalancer #363

Closed: liarco closed this issue 3 years ago

liarco commented 3 years ago

I recently ran into a situation that caused me trouble:

Having kubernetes.digitalocean.com/load-balancer-id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx applied to the LoadBalancer resource didn't help.

A deleted LoadBalancer means that I cannot recover my public IP in any way, and clients with services running on my cluster have to update their DNS records once I assign a brand new LoadBalancer. This causes downtime and turns a "simple" upgrade task into a risky operation.

Is there any workaround for this? I thought I could use service.kubernetes.io/do-loadbalancer-disown: "true", but I would have to run helm upgrade ... to set that annotation too, so it may lead to the same result. Am I wrong?

Am I missing some best practice or is this something that DO should address in some way?

Thank you for your time.

timoreimann commented 3 years ago

👋 Is there a way to keep Helm from deleting the resource on an upgrade? It seems a bit odd to me that Helm is doing that. Many CCM implementations I'm aware of will do the same thing and delete the LB once the resource is deleted.

liarco commented 3 years ago

Thank you @timoreimann. That was not my first upgrade command, but I've never had that issue before. I guess it may be due to some breaking changes between the resources from different chart versions; in that case, recreating the resource may be the only option for Helm.

The only feature I'm aware of is "helm.sh/resource-policy": keep, but I'm not sure if it must be set inside the chart template or if I can set it as a chart value.

BTW...

The annotation "helm.sh/resource-policy": keep instructs Helm to skip deleting this resource when a helm operation (such as helm uninstall, helm upgrade or helm rollback) would result in its deletion. However, this resource becomes orphaned. Helm will no longer manage it in any way. This can lead to problems if using helm install --replace on a release that has already been uninstalled, but has kept resources.

So I guess this means that there's no way to relink the resource once helm thinks it has been deleted (even after an upgrade).
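For reference, a minimal sketch of where that annotation would sit if it were set in the chart template itself. The Service name, selector, and ports are hypothetical placeholders; only the annotation key and value come from the Helm documentation quoted above.

```yaml
# Hypothetical Service manifest, roughly as a chart template might render it.
apiVersion: v1
kind: Service
metadata:
  name: my-ingress-controller        # hypothetical name
  annotations:
    # Asks Helm to skip deleting this resource; per the quoted docs,
    # Helm no longer manages the resource once it has been kept this way.
    "helm.sh/resource-policy": keep
spec:
  type: LoadBalancer
  selector:
    app: my-ingress-controller       # hypothetical label
  ports:
    - name: http
      port: 80
      targetPort: 80
```

Whether a given chart lets you inject this annotation through a value depends on how its templates are written, so that part is an assumption to check per chart.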

timoreimann commented 3 years ago

You could set service.kubernetes.io/do-loadbalancer-disown: "true" which would stop the deletion from being processed during helm upgrade. However, in that case you'd also need to configure the Service correctly afterwards to make sure the LB gets re-attached to the resource, i.e., basically implement what is described here in a Helm-compatible way. I can't tell for sure if that's feasible since I'm not super familiar with Helm myself.

I wonder if you can take this to the Helm community to ask what the best way forward here is.
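To make the disown idea concrete, this is roughly what the annotations would look like on the Service. It is only a sketch: the Service name, selector, and ports are hypothetical, the load balancer ID is the same placeholder used earlier in this issue, and how a chart exposes Service annotations is an assumption that varies per chart.

```yaml
# Hypothetical Service that should leave its DigitalOcean load balancer alone.
apiVersion: v1
kind: Service
metadata:
  name: my-ingress-controller        # hypothetical name
  annotations:
    # Placeholder ID; identifies the existing load balancer.
    kubernetes.digitalocean.com/load-balancer-id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    # While this is "true", the CCM should not create, update, or delete the LB
    # on behalf of this Service.
    service.kubernetes.io/do-loadbalancer-disown: "true"
spec:
  type: LoadBalancer
  selector:
    app: my-ingress-controller       # hypothetical label
  ports:
    - name: http
      port: 80
      targetPort: 80
```

Re-attaching afterwards would presumably mean ending up with a Service that carries the same load-balancer-id annotation without the disown annotation set to "true".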

liarco commented 3 years ago

You are right, that's what I always tried to do (and I thought it worked), but unfortunately adding service.kubernetes.io/do-loadbalancer-disown: "true" involves running helm upgrade too.

Since you told me that the DOCCM behavior is the same as many others out there, I agree with you that I should find a solution from the "Helm side". I'm going to reach out to the Helm community for some help. I will update this issue as a reference for other users if I find a good solution.

Thank you for your help!

timoreimann commented 3 years ago

Thanks @liarco, would love to hear how this plays out eventually. Good luck. 🍀

timoreimann commented 3 years ago

Closing out the ticket. The reference for the upstream Helm ask is right above.