Currently, if a large number of nodes need to hit the AWS endpoint for whatever reason (e.g. discovery), they can be impacted by [API throttling](http://docs.aws.amazon.com/AWSEC2/latest/APIReference/query-api-troubleshooting.html#api-request-rate).
We will retry on a failure, but if a large number of nodes all hit the API again at once, they can be throttled a second time. Rinse. Repeat. For larger clusters this could mean the nodes end up in a loop where they never get a response from the API and never actually form the cluster.
It'd be great if we could add jitter to the retry delay so the nodes don't all end up retrying at the same time.
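
To make the request concrete, here is a rough sketch of what a jittered retry could look like. It is not taken from the existing code: `ThrottledError`, `backoff_delay`, `call_with_jitter`, and the `base`/`cap`/`max_attempts` defaults are all hypothetical names and values picked for illustration. It uses exponential backoff with "full jitter", meaning each node sleeps for a random time between zero and the current backoff ceiling, so retries spread out instead of landing together.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for whatever error the API client raises when throttled."""


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Pick a random delay in [0, min(cap, base * 2**attempt)] seconds (full jitter)."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def call_with_jitter(request, max_attempts=8, base=0.5, cap=30.0,
                     sleep=time.sleep, rng=random):
    """Call request(), retrying on ThrottledError with a jittered, growing delay.

    The defaults are assumptions for illustration, not tuned values.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except ThrottledError:
            # Out of attempts: surface the throttling error to the caller.
            if attempt == max_attempts - 1:
                raise
            # Each node draws its own random delay, so retries do not line up.
            sleep(backoff_delay(attempt, base, cap, rng))
```

Because every node draws its delay independently, a burst of simultaneous failures turns into retries spread across the whole backoff window, and the window doubles on each attempt until it reaches `cap`.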