Open chkothe opened 3 years ago
Btw I cleaned up the issue description above, should now be much clearer
@chkothe Did you test this change on a stack that had been created with the current behaviour and then try applying an update after your change had been applied? I have hit this exact same issue when dealing with raw CloudFormation templates (not SAM) but I would have thought that creating an explicit stage with the same name would cause stack updates to fail if they go from the old behaviour to this behaviour (with the stage in it)
Hey, yes, we did (going from chalice 1.23 to this PR).
So if you apply this fix to an existing stack, what it does is that it replaces a literal string (in a base path mapping resource) by a reference to a stage resource, which in turn resolves to the same string as before.
And the stage resource that's referenced has already been an implicit part of the SAM template all along, so no physical resource gets added, removed, or updated by this. Actually, when you apply the change, the deployed base path mapping wouldn’t even be updated either, since nothing in the physical resource actually changed (same string as before) -- the only thing that changes is the dependency order that CFN is required to adhere to.
Therefore, if I was worried about anything, it’d be that CFN gives its trademark 'error' that “nothing actually changed” when doing a deploy with the fix applied. Now in the context of chalice, the version update would incur some other material changes in the stack (namely the lambda code package or the dependencies layer image), so that particular hiccup wouldn't happen.
@chkothe Right, so there is a Stage resource being created but it's just not being made an explicit dependency. Looks like a pretty straightforward improvement to me, then.
The issue I was referring to happens in the case where you create a deployment with a string name but don't have a Stage resource in the stack, but then later try to use a Stage resource with the same name as whatever was set before. It seems like the SAM resource is doing the right thing in that regard, though.
@chkothe thank you for the solution! <3
Any updates on the revision of this ticket? This was a serious problem for me and after using the code in PR everything worked as expected.
It would be awesome if we could include this fix in the new release - it's small and probably quick to review.
Update:
After applying this solution, I realized that there are two stages in my Gateway API instead of just one:
This wasn't supposed to happen, right @chkothe?
@jonathanabila sure np. That issue is most likely unrelated (the PR doesn't add or rename any resources), e.g., could be a config change while iterating on the deployment.
@jonathanabila sure np. That issue is most likely unrelated (the PR doesn't add or rename any resources), e.g., could be a config change while iterating on the deployment.
Ok - I'll try again in another environment - thanks again!
@jonathanabila sure np. That issue is most likely unrelated (the PR doesn't add or rename any resources), e.g., could be a config change while iterating on the deployment.
Hi @chkothe I did another deployment to create a new environment and got the same error:
Do you have any information I can give you to help with the debugging process?
Hi Johnathan, if you want to be 120% sure that it's not caused by this PR, you might also try a different way that comes down to the same end result (delaying the creation of the custom domain mapping until after the stage has been created): https://github.com/chkothe/chalice/commit/b464c7f9b3826494ae7ac63ce7463bced20f546e
... that's the commit for an alternative PR I had made with the same goal (namely fixing the resource creation order), which works by adding a DependsOn clause to the CF template instead of a cross-reference. Again, none of them should affect naming in any way, only the order in which resources are created, but surely there's a lot going on at deploy time and the latter definitely makes it more clear what's changing (although unfortunately it's not the prettiest depends-on due to the way SAM names its resources). So if you were to use that, and yet you still get mystery stages appearing every now and then in your CF stack then it'd prove pretty conclusively that it's caused by some other glitch (whatever that may be).
Is there a reason chkothe 's commit can't be merged? I hit this issue today. (deploying using cdk) I can work around by deploying the stack first without the custom mapping and then adding it later but its a pain
any update of this ?
any update on this? I am facing same issue can't deploy chalice using cdk, getting this error
Resource handler returned message: "Invalid stage identifier specified (Service: ApiGateway, Status Code: 400, Request ID: b3ecd546-1030-4f67-bd99-b32bd282c82c)" (RequestToken: dda5727b-6185-e7db-25c2-a9210135ff7a, HandlerErrorCode: InvalidRequest)
A work around when using CDK.
from typing import Any, cast
import aws_cdk as cdk
from aws_cdk import aws_apigateway as apigatewayv1
from aws_cdk import aws_apigatewayv2 as apigateway
from constructs import Construct
from chalice.cdk import Chalice
class ChaliceNestedStack(cdk.NestedStack):
def __init__(
self,
scope: Construct,
construct_id: str,
/,
*,
source_dir: str = 'chalice',
stage_config: dict[str, Any],
) -> None:
super().__init__(scope, construct_id)
self.app = Chalice(
self,
'App',
source_dir=source_dir,
stage_config=stage_config,
)
rest_api = cast(apigateway.CfnApi, self.app.get_resource('RestAPI'))
custom_domain = cast(apigateway.CfnDomainName, self.app.get_resource('ApiGatewayCustomDomain'))
base_path_mapping = cast(apigatewayv1.CfnBasePathMapping, self.app.get_resource('ApiGatewayCustomDomainMapping'))
custom_domain.add_dependency(rest_api)
base_path_mapping.add_dependency(custom_domain)
Almost 3 years and no fix?
The way the
api_gateway_custom_domain
feature is represented in the CFN template generated bychalice package
is currently prone to a race condition, but it’s relatively easy to fix.Issue Currently, the
ApiCustomDomainMapping
resource (I guess that’s of typeAWS::ApiGateway::BasePathMapping
) doesn’t wait until theRestAPI
’s Stage resource has been created -- however, if that happens, its creation fails (as in the screenshot below).That stage resource isn’t explicit in the template generated by Chalice because it’s only injected by the Serverless transform, and it’s then named
RestAPI<stagename>Stage
(in my screenshot it’s calledRestAPIv1Stage
because my stage is calledv1
); that's as per SAM docs.Fortunately the
RestAPI
(theAWS::Serverless::Api
) has an attribute that conveniently serves to help with exactly that issue, namelyStage
(i.e.RestAPI.Stage
). That expression gets replaced by the SAM transform by the actual logical name of the stage resource that it injects. That is, the!Ref RestAPI.Stage
becomes!Ref RestAPIv1Stage
if the stage were calledv1
(also mentioned on the same SAM doc page where it says Referenceable property). And finally the!Ref RestAPIv1Stage
resolves to the name of the stage (herev1
), which is documented here under Return Values. So the technique comes down to using!Ref RestAPI.Stage
instead of the literal stage name (v1
for me), which creates an explicit dependency of the base path mapping on the stage resource, and thus delays its creation until the stage has been created. That technique is also described in this git issue comment. With that fix, the stack creation succeeds as in the below screenshot.At least one person commented that it the old (broken) behavior be non-deterministic, although in our stack the old version seems to fail pretty much every time (the new version always succeeds).
I made a PR that implements that, it’s basically 1 line and another line in the unit test. (for the record, one could also use a
DependsOn
and I had tried that too and it works too, but it’s addressed more concisely by simply referencing the .Stage attribute).Edit: Concise rewrite of the issue description so it’s easier to follow.