gardener / machine-controller-manager

Declarative way of managing machines for Kubernetes cluster
Apache License 2.0

Switch to exponential backoff while creating/deletion machines #483

Open hardikdr opened 4 years ago

hardikdr commented 4 years ago

What would you like to be added:

When a machine creation or deletion request fails, MCM keeps retrying the creation or deletion of the machine objects without backing off. This can put heavy load on the control cluster's API server and exhaust the cloud provider's API rate limits. We should back off exponentially when requests fail.

Why is this needed:

prashanth26 commented 4 years ago

/assign @hardikdr @prashanth26
/priority blocker

hardikdr commented 4 years ago

/priority normal

We implemented constant backoff in #525. We should consider a more sophisticated exponential backoff mechanism; a proposal would be nice. I mainly see two options:

  1. Backoff at the queue. An attempt for the machine-set queue: https://github.com/gardener/machine-controller-manager/pull/510
  2. Backoff inside the reconcile function.

cc @zuzzas

zuzzas commented 4 years ago

Thanks to https://github.com/gardener/machine-controller-manager/pull/525 we can now attach a RateLimitingInterface to the queue, and throttle Machines in CrashLoopBackoff.

  1. I'd take the backoff_manager concept from here.
  2. Create a throttling-by-CrashLoopBackoff function here.
  3. And attach the resulting RateLimitingInterface to the queue here.

Then it is a matter of replacing the Add calls with AddRateLimited, so that our new RateLimiter is actually triggered.

prashanth26 commented 3 years ago

/title Switch to exponential backoff while creating/deletion machines