digitalocean / digitalocean-cloud-controller-manager

Kubernetes cloud-controller-manager for DigitalOcean (beta)
Apache License 2.0

Specify retry duration when DO LB is being created ("new" state) #647

Closed llDrLove closed 1 year ago

llDrLove commented 1 year ago

[Context] When an LB is being created by CCM, the controller waits for the operation to complete by continuously retrying the create. As long as the LB is still being provisioned, each retry is rejected by the LB API (with an error along the lines of "LB cannot be updated while create is pending"), which in turn causes CCM to back off with exponentially increasing wait times. As a result, by the time the LB has actually finished provisioning, CCM is so deep in back-off that it takes an unreasonably long time to recognize the LB as ready. Anecdotal data indicates that we lose up to several minutes before CCM can retrieve the LB IP address and allow the LB to be used in Kubernetes.

Kubernetes 1.28 added support for returning a RetryError type that specifies a custom retry period for CCM to wait. This PR uses the new error type in our CCM's EnsureLoadBalancer method implementation: when an LB is still in the "new" state, the method returns a RetryError with a relatively short retry period of 15 seconds between reconciliations.

llDrLove commented 1 year ago

Added the Changelog entry.