Open mkielar opened 5 years ago
Hi @mkielar,
There is a backoff in `tail_stack_events`, so `cfn-cli` won't be throttled when it's working on a reasonably sized "nested stack". Are you deploying 20 stacks in a single account? AWS applies a lot of throttling here and there at the account level. Even if the call to `DescribeStackEvents` is disabled, you are very likely to hit another transparent wall somewhere else (e.g. running out of instance type limits, or being throttled when creating too many DynamoDB tables too fast).
Still, I think 2 is a reasonable workaround, as disabling the stack events will greatly reduce calls to the CloudFormation APIs. But it may not work as expected, because the "wait until stack deployment completes" feature internally uses a Waiter, which also polls the CloudFormation API:
    def wait(self, **kwargs):
        ........
        while True:
            response = self._operation_method(**kwargs)
            num_attempts += 1
            .........
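The polling pattern in that excerpt can be sketched without botocore. This is only an illustration, not the actual Waiter implementation: `poll_until_complete` and the `describe` callable are names made up for this sketch. It shows that the number of API calls made while waiting depends mostly on the delay between polls. In boto3 the corresponding knobs are passed to `waiter.wait()` as `WaiterConfig={'Delay': ..., 'MaxAttempts': ...}`.

```python
import time


def poll_until_complete(describe, delay=30.0, max_attempts=120, sleep=time.sleep):
    """Call describe() until it reports a status that is no longer in progress.

    describe is a hypothetical zero-argument callable returning a stack
    status string such as "CREATE_IN_PROGRESS" or "CREATE_COMPLETE".
    Returns (final_status, number_of_calls_made).
    """
    num_attempts = 0
    while True:
        status = describe()  # one API request per loop iteration
        num_attempts += 1
        # In this sketch, any status not ending in _IN_PROGRESS is terminal.
        if not status.endswith("_IN_PROGRESS"):
            return status, num_attempts
        if num_attempts >= max_attempts:
            raise TimeoutError(f"still {status} after {num_attempts} polls")
        sleep(delay)


if __name__ == "__main__":
    # Fake stack that completes on the fourth poll; sleeping is stubbed out.
    statuses = iter(["CREATE_IN_PROGRESS"] * 3 + ["CREATE_COMPLETE"])
    print(poll_until_complete(lambda: next(statuses), sleep=lambda _: None))
```

With 20 pipelines polling in parallel, doubling `delay` roughly halves the request rate produced by waiting alone, at the cost of noticing completion later.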
I would also like to know what kind of resources you are creating in the template (e.g. DDB tables, EC2 instances, etc.).
Hi,
what I deploy is more or less:
This is one main stack that consists of 7 to 10 nested stacks depending on configuration (some nested stacks are only deployed on specific Conditions). This, times 20. As of now, the only throttling we observe is caused by the CloudFormation API. Once started, all the stacks deploy properly.
The number 20 will keep growing, because the platform I'm building is meant to allow standardized deployment of tools that serve different business logic but have the same, standard APIs. This means we're going to go from 20 to many more within some time.
What I'd appreciate, though, is pt.1 with either pt.2 or pt.3, as the first one actually gives me control over the number of retries, and the other two minimize the risk of throttling.
Alternatively, I could use the Jenkins `retry` step, if I could identify whether `cfn-cli` failed due to throttling when trying to execute the stack, or due to the CF stack itself failing. That should be possible if I could differentiate `cfn-cli` exit codes on those occasions. Do you have any docs on the exit codes of `cfn-cli`?
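Until such exit codes exist, one possible stopgap is to classify the failure from the command output. The sketch below is an assumption-laden illustration: the exit code values, the `classify` function, and the set of throttling error codes are all invented here and are not part of `cfn-cli`. It relies only on the way botocore formats client errors ("An error occurred (<Code>) when calling the <Operation> operation ...").

```python
import re

# Hypothetical exit codes for a wrapper script; cfn-cli does not define these.
EXIT_OK = 0
EXIT_STACK_FAILED = 1
EXIT_THROTTLED = 75  # arbitrary value chosen for this sketch

# botocore renders client errors as "An error occurred (<Code>) when calling ..."
_ERROR_CODE = re.compile(r"An error occurred \(([^)]+)\)")

# Error codes treated as throttling in this sketch (an assumption, not a complete list).
_THROTTLE_CODES = {"Throttling", "ThrottlingException", "RequestLimitExceeded"}


def classify(returncode, output):
    """Map a finished command's return code and captured output to a wrapper exit code."""
    if returncode == 0:
        return EXIT_OK
    codes = set(_ERROR_CODE.findall(output))
    if codes & _THROTTLE_CODES:
        return EXIT_THROTTLED
    return EXIT_STACK_FAILED


if __name__ == "__main__":
    # Illustrative message in botocore's format, not taken from a real log.
    sample = ("An error occurred (Throttling) when calling the DescribeStacks "
              "operation (reached max retries: 4): Rate exceeded")
    print(classify(1, sample))
```

A pipeline could then retry only when the wrapper exits with the throttling code and fail fast otherwise. Matching on output text is fragile, which is why dedicated exit codes in the tool itself would be preferable.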
This is an effect of #59. I have around 20 stacks that are currently run within a `parallel` Jenkins Pipeline. They have no cross-dependencies, so it's way faster to run them concurrently. This means they all start at more or less the same time, and they start failing because of throttling on the CloudFormation API. The example log:
It seems `cfn-cli` handles throttling issues when `DescribeStackEvents` is called for logging, but that's it. To make things worse, this exception is thrown by `botocore` after it has already attempted `max_retries` times, with an exponential delay handler (I think), and all of them failed.
There seems to be no proper way out of this, but I'd like you to consider three options:
1. Allow configuring the `max_retries` value (example).
2. Allow disabling `DescribeStackEvents` calls, while still waiting for the stack to finish (although I'm not sure if that would actually reduce the number of CF API requests).
3. Use SNS/SQS instead of `DescribeStackEvents`. This is a larger topic, but:
   3.1. `cfn-cli` could either use an SQS address provided by a parameter (assuming users set up everything themselves), or
   3.2. better, `cfn-cli` could actually provision SNS/SQS for itself and then use it (assuming it's run with a profile that allows for this).

What do you think?
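To make option 1 concrete, here is a minimal sketch of a retry loop with a configurable retry count and exponential backoff. It is not how `cfn-cli` or botocore implement retries: `ThrottledError`, `call_with_backoff`, and all parameter names are hypothetical. With botocore itself, the retry count can be set through `Config(retries={"max_attempts": N})` when creating a client; whether `cfn-cli` exposes that setting is exactly what this option asks for.

```python
import random
import time


class ThrottledError(Exception):
    """Stand-in for a throttling error from the CloudFormation API (hypothetical)."""


def call_with_backoff(operation, max_retries=4, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, rng=random.random):
    """Run operation(), retrying on ThrottledError with exponential backoff.

    Makes at most max_retries + 1 calls. After the k-th failed call (k starting
    at 0) it sleeps for rng() * min(max_delay, base_delay * 2**k), i.e. a random
    fraction of an exponentially growing cap ("full jitter").
    """
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        # Fails twice with a throttling error, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ThrottledError("Rate exceeded")
        return "ok"

    print(call_with_backoff(flaky, sleep=lambda _: None), calls["n"])
```

The random jitter matters in the scenario described here: around 20 pipelines that start at the same moment would otherwise retry at the same moments too, and keep colliding on the same rate limit.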