atlassian / escalator

Escalator is a batch or job optimized horizontal autoscaler for Kubernetes
Apache License 2.0

Terminate orphaned instances from CreateFleet requests #194

Closed haugenj closed 3 years ago

haugenj commented 3 years ago

When using CreateFleet to provision new instances, there's a chance that instances are returned from the call but then never attached to the ASG. Escalator never terminates instances that are not attached to the ASG, so they stay running in the user's account indefinitely. This happens in two cases:

  1. When not all instances are in a "ready" state before the instance ready timeout is hit. In this case all of the instances returned by the call become orphaned.
  2. When a call to AttachInstances fails. In this case any instances that haven't yet been attached to the ASG become orphaned (attachment happens in batches of 20).

Here's a screenshot of this happening to me for this scenario:

This change handles both cases by calling TerminateInstances with the list of instances that haven't been attached to the ASG. To keep these failures from repeating indefinitely, for example when the timeout value is too low, a fatal is logged if this happens three times in a row.
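
For anyone skimming, here's a rough sketch of the flow described above. It is not the actual code in this change. `fleetAttacher`, its function fields, `attachOrTerminate`, and `errNotReady` are all hypothetical names, and the AWS calls are replaced by plain functions so the logic can be read on its own. The batch size of 20 and the limit of three consecutive failures come from the description; everything else is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	attachBatchSize        = 20 // AttachInstances batch size mentioned in case 2
	maxConsecutiveFailures = 3  // consecutive failures before a fatal is logged
)

// errNotReady models case 1: the instance ready timeout was hit.
var errNotReady = errors.New("instances not ready before timeout")

// fleetAttacher is a hypothetical helper. The function fields stand in for
// the AWS AttachInstances and TerminateInstances calls and a logger's Fatalf.
type fleetAttacher struct {
	attach    func(instanceIDs []string) error
	terminate func(instanceIDs []string) error
	fatalf    func(format string, args ...interface{})
	failures  int // consecutive failed attempts
}

// attachOrTerminate attaches ids to the ASG in batches. If the instances were
// not ready in time, or a batch fails, every instance that is still
// unattached is terminated instead of being left running.
func (f *fleetAttacher) attachOrTerminate(ids []string, allReady bool) error {
	orphans, err := f.tryAttach(ids, allReady)
	if err == nil {
		f.failures = 0 // a success breaks the run of failures
		return nil
	}
	f.failures++
	if len(orphans) > 0 {
		if termErr := f.terminate(orphans); termErr != nil {
			err = fmt.Errorf("%v; terminating orphans also failed: %v", err, termErr)
		}
	}
	if f.failures >= maxConsecutiveFailures {
		f.fatalf("attaching fleet instances failed %d times in a row: %v", f.failures, err)
	}
	return err
}

// tryAttach returns the instances that were NOT attached, plus the cause.
func (f *fleetAttacher) tryAttach(ids []string, allReady bool) ([]string, error) {
	if !allReady {
		return ids, errNotReady // case 1: the full set is orphaned
	}
	for start := 0; start < len(ids); start += attachBatchSize {
		end := start + attachBatchSize
		if end > len(ids) {
			end = len(ids)
		}
		if err := f.attach(ids[start:end]); err != nil {
			// case 2: the failed batch and everything after it is unattached
			return ids[start:], fmt.Errorf("attaching batch starting at index %d: %w", start, err)
		}
	}
	return nil, nil
}
```

With 45 instances and a failure on the second batch, the first 20 stay attached and the remaining 25 are passed to `terminate`. The counter resets on any successful attempt, so only three failures in a row reach `fatalf`.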