Closed: Thr3d closed this pull request 2 years ago
Hi @Thr3d,
Thank you for this submission and feature.
AWS CLI v1 and v2 have an embedded retry mechanism that covers the RequestLimitExceeded handling you've included.
Isn't the AWS CLI throttling handler enough for this case? If so, I would suggest adding a sleep/timer between calls to prevent being throttled in the first place, and then relying on the AWS CLI to handle the throttling and retries.
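For example, something along these lines (a minimal sketch only; the retry mode, attempt count, 5-second interval, and `$INSTANCE_ID` are illustrative placeholders, not values from this resource agent):

```sh
# Tune the AWS CLI's built-in retry handling. "adaptive" adds client-side
# rate limiting on top of exponential backoff (AWS CLI v2 and recent v1).
export AWS_RETRY_MODE=adaptive
export AWS_MAX_ATTEMPTS=10

# Space out the status checks so the shared token bucket is not drained
# in the first place; the interval here is just an example value.
while true; do
    aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
        --query 'Reservations[].Instances[].State.Name' --output text
    sleep 5
done
```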
Thank you!
Somehow I missed that when originally looking into retries for this...thanks for pointing that out. It sounds like that should accomplish the same goal. Will test that out.
> I would suggest adding a sleep/timer between calls to prevent being throttled in the first place
The bucket is shared across the whole AWS account in that region rather than per instance, which means one can't avoid being throttled just by changing the cluster's behavior. The CLI's retry backoff should help handle this on its own too.
On a side note, should I make a new PR with just the `curl --silent --show-error` change to remove the status spam?
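For reference, that change amounts to something like this (the metadata URL below is only an illustrative target, not necessarily the exact request the agent makes):

```sh
# --silent drops curl's progress meter that was spamming the logs;
# --show-error keeps real failures visible on stderr despite --silent.
curl --silent --show-error http://169.254.169.254/latest/meta-data/instance-id
```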
The EC2 stonith resource uses the AWS API action DescribeInstances to check instance status. DescribeInstances has a maximum bucket size of 50 tokens and refills at 10 tokens per second, and that bucket is shared across the whole AWS account per Region. Because of this small bucket, the stonith resource can easily hit RequestLimitExceeded errors and fail.

This PR addresses that by allowing DescribeInstances operations to retry on RequestLimitExceeded until they succeed or the resource timeout is reached. It also removes the curl status spam from the logs while still logging any errors.
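Roughly, the retry behavior looks like this (a sketch only; the instance ID, timeout value, and sleep interval are placeholders rather than the agent's actual code):

```sh
#!/bin/sh
# Sketch only: retry DescribeInstances on RequestLimitExceeded until it
# succeeds or the resource timeout is used up.
INSTANCE_ID="i-0123456789abcdef0"   # placeholder instance
TIMEOUT=60                          # placeholder resource timeout in seconds
deadline=$(( $(date +%s) + TIMEOUT ))

while :; do
    output=$(aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
        --query 'Reservations[].Instances[].State.Name' --output text 2>&1)
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "$output"              # instance state, e.g. "running"
        break
    fi
    if printf '%s' "$output" | grep -q RequestLimitExceeded \
        && [ "$(date +%s)" -lt "$deadline" ]; then
        sleep 2                     # back off before retrying (example interval)
        continue
    fi
    printf '%s\n' "$output" >&2     # other failure or timeout: surface the error
    exit 1
done
```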
Current failure example: