When using CreateFleet to provision new instances, there's a chance that instances are returned from the call but are never attached to the ASG. Instances that are not attached to the ASG will never be terminated by Escalator, so they'll stay running in the user's account indefinitely. This happens in two cases:
When not all instances reach a "ready" state before the instance ready timeout is hit. In this case every instance from the request becomes orphaned.
When a call to AttachInstances fails. In this case any instances that haven't yet been attached to the ASG become orphaned (attachment happens in batches of 20).
Here's a screenshot of this happening to me for this scenario:
1 instance was requested but failed to become ready before the timeout
On the next loop, 1 instance was requested, was ready in time, and was attached to the ASG
5 instances were requested but at least one wasn't ready in time
5 instances were requested and successfully attached to the ASG
When my job finished, half of the instances did not terminate
This change handles both cases by calling TerminateInstances with the list of instances that haven't been attached to the ASG. To keep these failures from repeating indefinitely (for example, when the timeout value is too low), a fatal is logged if this happens three times in a row.