awslabs / landing-zone-accelerator-on-aws

Deploy a multi-account cloud foundation to support highly-regulated workloads and complex compliance requirements.
https://aws.amazon.com/solutions/implementations/landing-zone-accelerator-on-aws/
Apache License 2.0

Failure to delete ServiceLinkedRole AWSServiceRoleForCodeStarNotifications when updating to v1.4.3 #221

Closed KashifSaadat closed 1 year ago

KashifSaadat commented 1 year ago

Describe the bug

I updated my solution from LZA v1.4.1 to v1.4.3. When the pipeline AWSAccelerator-Installer triggered and the build project AWSAccelerator-InstallerProject ran, it succeeded but with errors (the deletion was attempted 3 times and then skipped with an overall success on rollout):

AWSAccelerator-PipelineStack-<MANAGEMENT-ACCOUNT-ID>-eu-west-2 | 10:09:11 AM | DELETE_FAILED        | AWS::IAM::ServiceLinkedRole                  | PipelineAWSServiceRoleForCodeStarNotificationsDA052A10 Resource of type 'AWS::IAM::ServiceLinkedRole' with identifier 'AWSServiceRoleForCodeStarNotifications' has a conflict. Reason: SLR [AWSServiceRoleForCodeStarNotifications] is in use by other resources: [[RoleUsageType(Region=eu-north-1, Resources=[arn:aws:codestar-notifications:eu-west-2:<MANAGEMENT-ACCOUNT-ID>:notificationrule/<NOTIFICATION-RULE-ID-1>, arn:aws:codestar-notifications:eu-west-2:<MANAGEMENT-ACCOUNT-ID>:notificationrule/<NOTIFICATION-RULE-ID-2>])]].

To Reproduce

  1. Have your LZA solution at v1.4.1 and ensure the pipeline and related resources are all up to date.
  2. Follow the instructions at https://docs.aws.amazon.com/solutions/latest/landing-zone-accelerator-on-aws/update-the-solution.html, keeping parameters the same but changing the release branch to the latest (i.e. release/v1.4.3)
  3. Follow the logs for the pipeline and build project

Expected behavior

It should run cleanly and succeed with no errors or warnings.

Solution Details

KashifSaadat commented 1 year ago

Just to add, a similar failure has occurred as the install progressed, with the pipeline AWSAccelerator-Pipeline and build project AWSAccelerator-ToolkitProject (at the Accounts step):

AWSAccelerator-AccountsStack-<ACCOUNT-ID>-us-east-1 | 47/52 | 10:34:52 AM | DELETE_FAILED        | AWS::IAM::ServiceLinkedRole     | GuardDutyServiceLinkedRole Resource of type 'AWS::IAM::ServiceLinkedRole' with identifier 'AWSServiceRoleForAmazonGuardDuty' has a conflict. Reason: SLR deletion failed. status: FAILED, reason: DeletionTaskFailureReasonType(Reason=Amazon GuardDuty is enabled in one or more regions. Disable GuardDuty in all regions before attempting to delete this role., RoleUsageList=[]).
AWSAccelerator-AccountsStack-<ACCOUNT-ID>-us-east-1 | 47/52 | 10:35:11 AM | DELETE_FAILED        | AWS::IAM::ServiceLinkedRole     | SecurityHubServiceLinkedRole Resource of type 'AWS::IAM::ServiceLinkedRole' with identifier 'AWSServiceRoleForSecurityHub' has a conflict. Reason: SLR deletion failed. status: FAILED, reason: DeletionTaskFailureReasonType(Reason=AWS Security Hub is enabled in one or more regions. Disable Security Hub in all regions before attempting to delete this role., RoleUsageList=[]).
AWSAccelerator-AccountsStack-<ACCOUNT-ID>-us-east-1 | 47/52 | 10:35:12 AM | DELETE_FAILED        | AWS::IAM::ServiceLinkedRole     | AccessAnalyzerServiceLinkedRole Resource of type 'AWS::IAM::ServiceLinkedRole' with identifier 'AWSServiceRoleForAccessAnalyzer' has a conflict. Reason: SLR deletion failed. status: FAILED, reason: DeletionTaskFailureReasonType(Reason=IAM Access Analyzer is enabled in one or more regions in your AWS organization. Ask your administrator to delete all analyzers in all regions for your organization before attempting to delete this role., RoleUsageList=[]).

nagmesh commented 1 year ago

Hello @KashifSaadat,

Thank you for reaching out to us.

There has been a recent change in the behavior of the CloudFormation resource AWS::IAM::ServiceLinkedRole. Previously, if the resource was declared in the template and the role already existed, the stack would proceed. Now, if the resource is declared and the role already exists, the stack fails with a ResourceAlreadyExist exception. To make service-linked role creation idempotent, we changed the creation process and now manage the role through a custom resource: if the service-linked role exists, nothing is done; if it does not, it is created.

However, during cleanup CloudFormation still tries to delete the old AWS::IAM::ServiceLinkedRole resource, and it throws the error you showed above 3 times before the stack proceeds. Please ignore this error for now. It only means that the service-linked role is in use and cannot be deleted. The stack should still complete in a green state (CREATE_COMPLETE, UPDATE_COMPLETE, etc.).

The problem you need to look out for is a silent delete, as mentioned here. In that case, the custom resource does not create the service-linked role because it already exists, but the role is then deleted during cleanup. You will have to rerun the pipeline so that the role gets recreated.
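For anyone who wants to see the check-then-create idea concretely, here is a minimal sketch in Python. It is not the accelerator's actual custom resource code. It assumes a boto3-style IAM client is passed in as `iam`, and the function names `ensure_service_linked_role` and `find_missing_roles` are made up for illustration. The second helper is one possible way to spot the silent delete case after a pipeline run.

```python
def ensure_service_linked_role(iam, role_name, aws_service_name):
    """Create the service-linked role only when it does not exist yet.

    `iam` is assumed to behave like boto3.client("iam").
    Returns "exists" or "created" so callers can log what happened.
    """
    try:
        # get_role raises NoSuchEntityException when the role is absent.
        iam.get_role(RoleName=role_name)
        return "exists"
    except iam.exceptions.NoSuchEntityException:
        # The role is missing, so ask IAM to create it for the service.
        iam.create_service_linked_role(AWSServiceName=aws_service_name)
        return "created"


def find_missing_roles(iam, role_names):
    """Return the role names from `role_names` that IAM cannot find.

    Hypothetical post-run check for the silent delete case: any name
    returned here would need a pipeline rerun to be recreated.
    """
    missing = []
    for name in role_names:
        try:
            iam.get_role(RoleName=name)
        except iam.exceptions.NoSuchEntityException:
            missing.append(name)
    return missing
```

With boto3 installed and credentials configured, a call could look like `ensure_service_linked_role(boto3.client("iam"), "AWSServiceRoleForSecurityHub", "securityhub.amazonaws.com")`. Running it twice is safe because the second call finds the role and does nothing.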

I hope this was helpful! I will be closing this issue, but please feel free to follow-up or open new issues as you have additional questions. Thank you for your interest in Landing Zone Accelerator!

KashifSaadat commented 1 year ago

Hi @nagmesh, thank you for the detailed response, that's good to know!