Closed aygold92 closed 4 years ago
Until the platform is capable of actually respecting this information I don't think it should be part of the schema.
Until the platform is capable of actually respecting this information I don't think it should be part of the schema.
wanted to get the idea out there, but I will wait to merge this until its actually supported in the CFN platform
This shouldn't necessarily be static, right? With the DynamoDB GSI feature request, the length of time is highly dependent on the data and usage. So instead you might want the resource to be able to report an expected completion time dynamically, which could then also be reported to the user.
I think a static (maybe not required?) "maximum" could be useful, as a dynamic value would need extra data to be returned by the resource (increasing complexity for handler that are fine with a static value, eg. code that never returns InProgress).
For dynamic values it might be interesting to have both a "expected completion" and a "pause time" value that can be returned.
The expected completion could be exposed to the end user (a "completion percentage" might be another option for this). The pause time could indicate that CloudFormation should not invoke the provider for a certain amount of time. While CloudFormation waits the cost for going over the 30s threshold could be lower (ideally 0)
right, this is intended to signify the maximum timeout for a resource. A separate protocol could be developed using progress events to improve stabilization and possibly add dynamic timeouts (though dynamic timeouts could only shorten the timeout, and would still be subject to this maximum timeout)
the maximum is especially useful for the failure case. e.g. if you're developing a resource, right now if you get delete wrong you have to wait 2h until it times out. for normal operation, it doesn't affect the provisioning duration.
you can already "sleep" the handler via the callback (edit: although this doesn't change or pause the timeout). true, it would be nice to have a dynamic system where the first invocation has a timeout of a few minutes, and then can request/guesstimate further timeouts as required via the callback. but this is tricky w.r.t. credentials. i think the approach Andrew's proposing is a decent compromise for now?
This shouldn't necessarily be static, right? With the DynamoDB GSI feature request, the length of time is highly dependent on the data and usage. So instead you might want the resource to be able to report an expected completion time dynamically, which could then also be reported to the user.
The completion time also depends on some factors that cannot be known ahead of time. Both of these factors can change drastically at any point during DynamoDB GSI creation:
- The provisioned write capacity of the index
- Write activity on the main table during index builds
Both of these factors can change drastically at any point during DynamoDB GSI creation
This shows that not only must the timeout be dynamic, but that it also must be modifiable during the operation, not just dynamically reported at the beginning.
This PR is only implementing a default maximum time which CloudFormation should respect when terminating non-responsive, or incomplete handlers. This timeout is treated as a FAILED
operation. Hence, there is already dynamism built into the platform for any handler - if a handler decides that it has run longer than "it should" (e.g; "well this GSI creation is taking longer than I expected...") it can return a FAILED
response at any time during operation with some sort of TimeoutExceeded
exception or message.
I don't see any value in trying to introduce dynamic timeouts to either the schema or the interface, over the capabilities we already have.
So this timeout isn't a timeout for the operation, but for a specific invocation of the handler? That is, I set the timeout to 30 seconds, it's fine for the operation to take five minutes as long as my handler keeps returning a progress report in less than 30 seconds when it gets invoked (until the operation is complete)?
So this timeout isn't a timeout for the operation, but for a specific invocation of the handler? That is, I set the timeout to 30 seconds, it's fine for the operation to take five minutes as long as my handler keeps returning a progress report in less than 30 seconds when it gets invoked (until the operation is complete)?
No, this timeout is the max possible time the entire operation would take. So let's say you set the timeout to 5 minutes: CFN will invoke the handler and then listen to progress events for 5 minutes. If the operation is still IN_PROGRESS after 5 mnutes, CFN will stop listening and will terminate the operation.
this timeout is the max possible time the entire operation would take.
So then I don't understand @rjlohan's comment that the purpose is "terminating non-responsive, or incomplete handlers"
Does a successfully-returning handler reporting IN_PROGRESS count as non-responsive? If I don't explicitly set the timeout value in the schema, presumably CloudFormation has some default setting; is there a way to know programmatically that the timeout for an operation is the default and what the default is?
Does a successfully-returning handler reporting IN_PROGRESS count as non-responsive? If I don't explicitly set the timeout value in the schema, presumably CloudFormation has some default setting; is there a way to know programmatically that the timeout for an operation is the default and what the default is?
No, that'd fall into the responsive, but incomplete handler bucket. e.g; a type may periodically check that it has stabilized, but perpetually fail to do so. Then the timeout gives CloudFormation a signal that it should stop waiting.
responsive, but incomplete handler
I get it now. Thanks! I still am wondering how I can know programmatically what the default timeout is.
Issue #, if available:
Description of changes: This adds a new parameter
timeoutInMinutes
to thehandlerDefinition
, in order for providers to specify a custom stabilization timeout, to be used by CloudFormation. This could be used by the handler itself at some point to stop doing work.I used
timeoutInMinutes
since it is much simpler for handlers rather than specifying some ISO format, and I don't see any need to have any more granularity than thatBy submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.