aws-solutions / network-orchestration-for-aws-transit-gateway

The Network Orchestration for AWS Transit Gateway solution automates the process of setting up and managing transit networks in distributed AWS environments. It creates a web interface to help control, audit, and approve (transit) network changes.
https://aws.amazon.com/solutions/implementations/serverless-transit-network-orchestrator/
Apache License 2.0

Ability to update from within AWS cfct #97

Closed randyspainhower closed 3 months ago

randyspainhower commented 10 months ago

Is your feature request related to a problem? Please describe. We have STNO (Network Orchestration) v2.0.0 deployed within Customizations for AWS Control Tower (CfCT) as a custom template. It was deployed by AWS ProServe a few years back. The hub is deployed to one transit account in 2 regions (us-west-2, us-east-2), and the spoke is deployed to all accounts in the org in the same 2 regions. When we tried to upgrade to v3.3.1 through CfCT, it errored no matter what we did. We started with the default template and then adjusted it for each error. We got past a few errors, but it became an all-day effort and was not an easy lift.

Describe the feature you'd like. The ability to upgrade to v3.3.1+ through CfCT without lots of edits to the template, plus documentation about using CfCT as a deployment/update tool, since this seemed to be a common ProServe practice at the time.

Additional context. We ended up reverting to v2.0.0 because we never got past the point of having to fail forward. This is a production environment and we don't want to blow it up.

groverlalit commented 10 months ago

Thanks for bringing this to our attention. CfCT uses StackSets to deploy the stacks. It would help if we could review the errors you observed in the CloudFormation (CFN) events when deploying via StackSets. Assumption: these issues are decoupled from the Control Tower setup and CfCT. We may be able to set the CFN parameters to avoid specific issues.

If you prefer, you can open a support case to make it easy to share details from the CfCT manifest file and other details.

randyspainhower commented 10 months ago

So it would fail in the CfCT Step Functions state machine in the deploying master account. The first error we got was related to the ListOfCustomCidrBlocks pattern. It didn't matter whether I had spaces after the comma or not, it would error either way:

"RetryDeleteFlag": false, "us-west-2": "Parameter 'ListOfCustomCidrBlocks' must match pattern (^$|^(([0-9]{1,3}\.){3}[0-9]{1,3}\/\d{1,2})(, (([0-9]{1,3}\.){3}[0-9]{1,3}\/\d{1,2}))$)", "us-east-2": "Parameter 'ListOfCustomCidrBlocks' must match pattern (^$|^(([0-9]{1,3}\.){3}[0-9]{1,3}\/\d{1,2})(, (([0-9]{1,3}\.){3}[0-9]{1,3}\/\d{1,2}))$)", "OperationStatus": "FAILED" }
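One way to narrow this kind of failure down is to test candidate values against the pattern locally before running the CfCT pipeline. The sketch below uses Python's `re.fullmatch` with the pattern copied verbatim from the error message above. The helper name `matches_allowed_pattern` and the sample values are hypothetical. Two caveats: CloudFormation evaluates `AllowedPattern` with its own regex engine, so Python is only an approximation, and the pattern as pasted here may have lost characters in rendering, so compare it with the `AllowedPattern` in the template you actually deploy.

```python
import re

# Pattern copied verbatim from the error message quoted in this comment.
# Assumption: it may differ from the AllowedPattern in the deployed template.
ALLOWED_PATTERN = (
    r"(^$|^(([0-9]{1,3}\.){3}[0-9]{1,3}\/\d{1,2})"
    r"(, (([0-9]{1,3}\.){3}[0-9]{1,3}\/\d{1,2}))$)"
)


def matches_allowed_pattern(value: str, pattern: str = ALLOWED_PATTERN) -> bool:
    """Return True if the entire value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the reporter's manifest.
    candidates = [
        "",
        "10.0.0.0/8, 192.168.0.0/16",
        "10.0.0.0/8,192.168.0.0/16",
    ]
    for value in candidates:
        print(repr(value), matches_allowed_pattern(value))
```

A local check like this only says whether the raw string matches. It does not reproduce how CfCT passes the parameter value from the manifest to the stack set, so a value that passes here can still be worth inspecting in the CFN events.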

I was able to get past that by commenting out the pattern requirement in the template. Then it would error at the service catalog app registry (I removed the request id and token id): "us-west-2": "ResourceLogicalId:Application, ResourceType:AWS::ServiceCatalogAppRegistry::Application, ResourceStatusReason:Resource handler returned message: \"'%VERSION%' is not a valid value for TagValue - it contains illegal characters (Service: ServiceCatalogAppRegistry, Status Code: 400, Request ID: )\" (RequestToken: , HandlerErrorCode: InvalidRequest).", "us-east-2": "ResourceLogicalId:Application, ResourceType:AWS::ServiceCatalogAppRegistry::Application, ResourceStatusReason:Resource handler returned message: \"'%VERSION%' is not a valid value for TagValue - it contains illegal characters (Service: ServiceCatalogAppRegistry, Status Code: 400, Request ID: )\" (RequestToken: , HandlerErrorCode: InvalidRequest)." }

I commented that out and then it errored at the ResourceLogicalId:TgwPeeringLambdaFunction: "OperationStatus": "FAILED", "us-west-2": "ResourceLogicalId:TgwPeeringLambdaFunction, ResourceType:AWS::Lambda::Function, ResourceStatusReason:Properties validation failed for resource TgwPeeringLambdaFunction with message:\n#/Code/S3Bucket: failed validation constraint for keyword [pattern].", "us-east-2": "ResourceLogicalId:TgwPeeringLambdaFunction, ResourceType:AWS::Lambda::Function, ResourceStatusReason:Properties validation failed for resource TgwPeeringLambdaFunction with message:\n#/Code/S3Bucket: failed validation constraint for keyword [pattern]." }

I can open a support case if that's the best way. The old v2.0.0 template was pretty similar, apart from the new resources added in the v3.3.1 template.

groverlalit commented 10 months ago

Hi @randyspainhower, Thanks for providing details on your experience.

I don't know whether any customization was made to the STNO stack, but it looks like the template from the GitHub repo is being used for the upgrade. The GitHub template contains %VERSION%, an S3Key that is resolved through a Mapping, and references to other build variables. If the stack was customized, you need to run the build steps to replace those variables with the values you provide.
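A quick way to tell whether a template still contains build-time variables is to scan it for placeholder tokens before handing it to CfCT. This is a minimal sketch, assuming placeholders follow the same shape as `%VERSION%` (upper-case letters, digits, and underscores between percent signs). Only `%VERSION%` is confirmed in this thread; the regex, the helper name `find_placeholders`, and the sample template text are hypothetical.

```python
import re

# Assumption: build-time placeholders look like %VERSION%.
PLACEHOLDER = re.compile(r"%[A-Z][A-Z0-9_]*%")


def find_placeholders(template_text: str) -> list[tuple[int, str]]:
    """Return (line number, token) pairs for every placeholder found."""
    hits = []
    for line_no, line in enumerate(template_text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            hits.append((line_no, match.group(0)))
    return hits


if __name__ == "__main__":
    # Hypothetical template fragment for illustration only.
    sample = "Tags:\n  SolutionVersion: '%VERSION%'\n"
    for line_no, token in find_placeholders(sample):
        print(f"line {line_no}: unreplaced placeholder {token}")
```

If the scan reports any tokens, the template most likely came straight from the repository source rather than from a build, which would be consistent with the `%VERSION%` tag error reported above.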

You will not find these variables in the hub stack template that we host in our managed bucket. See the template page in the Implementation Guide.

Regarding the ListOfCustomCidrBlocks parameter: the implementation guide defines it as a required parameter. We don't set a default value of 0.0.0.0/1,128.0.0.0/1 because that would allow access to the APIs from the internet by default.
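To see why that pair of blocks amounts to open access, note that the two /1 networks together cover the entire IPv4 address space. A small check with Python's standard `ipaddress` module illustrates this (the variable names are only for illustration):

```python
import ipaddress

# The two halves of the IPv4 space mentioned in the comment above.
blocks = [
    ipaddress.ip_network("0.0.0.0/1"),
    ipaddress.ip_network("128.0.0.0/1"),
]

# collapse_addresses merges adjacent networks into the smallest covering set.
collapsed = list(ipaddress.collapse_addresses(blocks))
print(collapsed)  # [IPv4Network('0.0.0.0/0')]
```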

groverlalit commented 3 months ago

Resolving this issue. Please reopen if you have any questions. Thanks.