aws-solutions / network-orchestration-for-aws-transit-gateway

The Network Orchestration for AWS Transit Gateway solution automates the process of setting up and managing transit networks in distributed AWS environments. It creates a web interface to help control, audit, and approve (transit) network changes.
https://aws.amazon.com/solutions/implementations/serverless-transit-network-orchestrator/
Apache License 2.0

Update Spoke template from v3.2.1 to v3.3.1 failed #96

Closed: jwiechmann closed this issue 10 months ago.

jwiechmann commented 11 months ago

Describe the bug

Update Spoke template from v3.2.1 to v3.3.1 failed

To Reproduce

Update Spoke template from v3.2.1 to v3.3.1

Expected behavior

The ServiceLinkedRole is used by an existing attachment and cannot be deleted.

Please complete the following information about the solution:

To get the version of the solution, you can look at the description of the created CloudFormation stack. For example, "(SO0009) - The AWS CloudFormation template for deployment of the aws-centralized-logging. Version v1.0.0". You can also find the version on the releases page.

Screenshots


Additional context

CloudFormation Error: Resource of type 'AWS::IAM::ServiceLinkedRole' with identifier 'AWSServiceRoleForVPCTransitGateway' has a conflict. Reason: SLR [AWSServiceRoleForVPCTransitGateway] is in use by other resources: [[RoleUsageType(Region=eu-central-1, Resources=[tgw-attach-0c74b850d9f6e7945])]].

IAM Error: AWSServiceRoleForVPCTransitGateway Deletion failed.
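When rolling the update out to many accounts, it can help to confirm that each failed SLR deletion is this expected conflict and not something else. The sketch below parses the conflict reason shown above into a Region-to-resources map. The message format is inferred from this single error, the comma separator for multiple resources is an assumption, and `parse_slr_usage` and `is_benign_tgw_slr_conflict` are hypothetical helper names, not part of the solution.

```python
import re

# Assumed format, inferred from the one error above:
#   RoleUsageType(Region=eu-central-1, Resources=[tgw-attach-0c74b850d9f6e7945])
_USAGE_RE = re.compile(r"RoleUsageType\(Region=([\w-]+), Resources=\[([^\]]*)\]\)")


def parse_slr_usage(reason: str) -> dict[str, list[str]]:
    """Map each Region named in an SLR conflict message to the resources using the role."""
    usage: dict[str, list[str]] = {}
    for region, resources in _USAGE_RE.findall(reason):
        # Assumption: multiple resources are comma-separated inside the brackets.
        ids = [r.strip() for r in resources.split(",") if r.strip()]
        usage.setdefault(region, []).extend(ids)
    return usage


def is_benign_tgw_slr_conflict(reason: str) -> bool:
    """True if every resource holding the role is a Transit Gateway attachment."""
    ids = [rid for rids in parse_slr_usage(reason).values() for rid in rids]
    return bool(ids) and all(rid.startswith("tgw-attach-") for rid in ids)
```

A reason string that lists only `tgw-attach-` resources matches the situation described in this issue; anything else is probably worth a closer look.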

groverlalit commented 11 months ago

In v3.3.1, we removed the TGW Service-Linked Role (SLR) from the spoke stack to avoid the CloudFormation error shared above. The AWSServiceRoleForVPCTransitGateway role can't be deleted because an existing TGW attachment still uses it. This is by design, to avoid issues with the TGW attachment creation workflow.

The CloudFormation stack will attempt to delete the SLR resource three times and then give up; the stack update will still complete. You can ignore the "Deletion failed" error in the IAM console. In this scenario, you don't need to deploy network-orchestration-spoke-service-linked-roles.template, as the SLR already exists.
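Whether the separate SLR template is needed therefore comes down to whether the role already exists in the account. A minimal sketch of that check, assuming a boto3 IAM client (or anything with the same `get_role` method) is passed in; `slr_exists` and `needs_slr_template` are hypothetical names. It relies on botocore's `ClientError` exposing the AWS error code under `response["Error"]["Code"]`, where IAM reports a missing role as `NoSuchEntity`.

```python
TGW_SLR_NAME = "AWSServiceRoleForVPCTransitGateway"


def slr_exists(iam_client, role_name: str = TGW_SLR_NAME) -> bool:
    """Return True if the IAM role exists in the account the client points at."""
    try:
        iam_client.get_role(RoleName=role_name)
        return True
    except Exception as err:
        # botocore's ClientError carries the AWS error code in err.response.
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code == "NoSuchEntity":
            return False
        raise  # Access denied, throttling, etc. should not be mistaken for "missing".


def needs_slr_template(iam_client) -> bool:
    """Hypothetical helper: the SLR template is only needed when the role is missing."""
    return not slr_exists(iam_client)
```

In an account session this could be called as `needs_slr_template(boto3.client("iam"))`. Injecting the client keeps the check testable without AWS credentials.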

A screenshot of the results was attached (screenshot-update-spoke-stno-1).

jwiechmann commented 11 months ago

Hi Lalit,

Thank you for your quick response. The "Deletion failed" error in the IAM console is making it difficult for us to roll out the new spoke template to all of the existing accounts in our organization via StackSets. Can we use Spoke template v3.3.0 instead, since we don't use multi-region deployments? Or could you provide a boolean parameter for that? BTW, the old CreateServiceRoleForVPCTransitGateway option did exactly that: Skip or Create!

BR Jens

groverlalit commented 11 months ago

The StackSet update for the spoke stack should complete, because each stack will still reach "UPDATE_COMPLETE". The new spoke SLR stack can be deployed with a higher failure tolerance. Also note that the TGW SLR can be created automatically by the VPC service during attachment creation, but only in new accounts (with no existing TGW SLR).
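A StackSet's failure tolerance is set through its operation preferences. The sketch below only builds the keyword arguments for boto3's CloudFormation `create_stack_instances` call; the stack set name, OU ID, and default percentages are hypothetical. The high default tolerance assumes that, in accounts where the role already exists, creating it again fails, and that those failures should not halt the rollout to other accounts.

```python
def slr_stack_instances_request(
    stack_set_name: str,
    ou_ids: list[str],
    regions: list[str],
    failure_tolerance_pct: int = 100,
    max_concurrent_pct: int = 25,
) -> dict:
    """Build kwargs for cloudformation.create_stack_instances (service-managed StackSet)."""
    if not 0 <= failure_tolerance_pct <= 100:
        raise ValueError("failure_tolerance_pct must be between 0 and 100")
    if not 1 <= max_concurrent_pct <= 100:
        raise ValueError("max_concurrent_pct must be between 1 and 100")
    return {
        "StackSetName": stack_set_name,
        # Target whole organizational units rather than individual account IDs.
        "DeploymentTargets": {"OrganizationalUnitIds": list(ou_ids)},
        "Regions": list(regions),
        "OperationPreferences": {
            # Percentage of accounts per Region that may fail before the operation stops.
            "FailureTolerancePercentage": failure_tolerance_pct,
            # Percentage of accounts per Region deployed at the same time.
            "MaxConcurrentPercentage": max_concurrent_pct,
        },
    }
```

Usage would look like `boto3.client("cloudformation").create_stack_instances(**slr_stack_instances_request("my-tgw-slr-stackset", ["ou-xxxx-xxxxxxxx"], ["eu-central-1"]))`. Since the SLR is an IAM role and therefore account-wide, one Region per account is enough for this stack set.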

I would not recommend using v3.3.0 as it will impact your upgrade path for future releases.

The CreateServiceRoleForVPCTransitGateway CFN parameter worked for your use case, but it was not a viable option for multi-region deployments.
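For reference, a create-or-skip switch like the old CreateServiceRoleForVPCTransitGateway parameter is usually expressed as a CloudFormation condition on the `AWS::IAM::ServiceLinkedRole` resource. The sketch below generates such a template as JSON; only the parameter name comes from this thread. The `Yes`/`No` allowed values, the condition name, and the resource logical ID are assumptions, and this is not the solution's actual template.

```python
import json

PARAM = "CreateServiceRoleForVPCTransitGateway"  # parameter name from this thread


def spoke_slr_template() -> dict:
    """Minimal template that creates the TGW SLR only when the parameter says so."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            PARAM: {
                "Type": "String",
                "AllowedValues": ["Yes", "No"],  # assumed values
                "Default": "No",
                "Description": "Set to No if AWSServiceRoleForVPCTransitGateway already exists",
            }
        },
        "Conditions": {
            "CreateTgwSlr": {"Fn::Equals": [{"Ref": PARAM}, "Yes"]},
        },
        "Resources": {
            "TransitGatewayServiceLinkedRole": {  # hypothetical logical ID
                "Type": "AWS::IAM::ServiceLinkedRole",
                "Condition": "CreateTgwSlr",  # skipped entirely when the parameter is No
                "Properties": {"AWSServiceName": "transitgateway.amazonaws.com"},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(spoke_slr_template(), indent=2))
```

Because IAM roles are global within an account, only one regional copy of such a stack per account could set the parameter to `Yes`. That is presumably why the switch does not scale to multi-region deployments.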

groverlalit commented 10 months ago

Closing this issue due to no activity for over a month. Please reopen it if needed. Thanks!