crossplane-contrib / provider-upjet-aws

AWS Provider for Crossplane.
https://marketplace.upbound.io/providers/upbound/provider-family-aws/
Apache License 2.0

[Bug]: CloudFormation Stack fails to sync or get ready when it takes too long to deploy #1505

Open karloscarrijo opened 1 month ago

karloscarrijo commented 1 month ago

Is there an existing issue for this?

Affected Resource(s)

Resource MRs required to reproduce the bug

apiVersion: cloudformation.aws.upbound.io/v1beta1
kind: Stack
metadata:
  name: ct-stack
spec:
  forProvider:
    name: controltower-stack
    parameters:
      AuditAccountId: "xxxxxxxxxxxxxx"
      LogArchiveAccountId: "xxxxxxxxxxxxxx"
    region: us-east-1
    templateBody: |
      {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "AWS Control Tower Setup",
        "Parameters": {
          "AuditAccountId": {
            "Type": "String",
            "Description": "The ID of the Audit Account"
          },
          "LogArchiveAccountId": {
            "Type": "String",
            "Description": "The ID of the Log Archive Account"
          }
        },
        "Resources": {
          "ControlTowerLandingZone": {
            "Type": "AWS::ControlTower::LandingZone",
            "Properties": {
              "Manifest": {
                "governedRegions": ["us-east-1"],
                "organizationStructure": {
                  "security": { "name": "security" }
                },
                "centralizedLogging": {
                  "accountId": { "Ref": "LogArchiveAccountId" },
                  "configurations": {
                    "loggingBucket": { "retentionDays": 60 },
                    "accessLoggingBucket": { "retentionDays": 60 }
                  },
                  "enabled": true
                },
                "securityRoles": {
                  "accountId": { "Ref": "AuditAccountId" }
                },
                "accessManagement": { "enabled": true }
              },
              "Tags": [
                { "Key": "Name", "Value": "ControlTowerLandingZone" }
              ],
              "Version": "3.3"
            }
          }
        },
        "Outputs": {
          "LandingZoneId": {
            "Description": "The ID of the Control Tower Landing Zone",
            "Value": { "Ref": "ControlTowerLandingZone" }
          }
        }
      }

Steps to Reproduce

  1. Create a CloudFormation Stack resource with the manifest above.
  2. Wait for it to be provisioned on AWS (around 30 minutes).
  3. Check the resource in Crossplane: it is neither Synced nor Ready, and the provider keeps trying to recreate the stack.

What happened?

With a simple CloudFormation template (for instance, one that creates a Parameter Store parameter), everything works and the resource becomes Ready and Synced. But when the template is complex and takes a long time to complete (like enabling Control Tower on a master account), the resource never becomes Synced or Ready, and the provider keeps trying to recreate the stack, even though it was created successfully on AWS.

Relevant Error Output Snippet

Warning CannotCreateExternalResource 2m57s (x215 over 3h33m) managed/cloudformation.aws.upbound.io/v1beta1, kind=stack (combined from similar events): async create failed: failed to create the resource: [{0 creating CloudFormation Stack (FoundationControlTowerStack): operation error CloudFormation: CreateStack, https response error StatusCode: 400, RequestID: ec977b01-3963-49da-8069-1f6f134b055a, AlreadyExistsException: Stack [FoundationControlTowerStack] already exists []}]

Crossplane Version

1.17.0

Provider Version

1.14.0

Kubernetes Version

No response

Kubernetes Distribution

EKS

Additional Info

I tried creating an Observe-only resource to import the CloudFormation stack that had been created, and it works, but only if I set the external-name annotation to the stack ID (the last part of the stack ARN), not to the full ARN or the stack name. I'm not sure whether this is related to the bug.
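The reporter found that the Observe-only import matched only when the `crossplane.io/external-name` annotation held the trailing segment of the stack ARN. A minimal Go sketch for extracting that segment, assuming the `arn:aws:cloudformation:<region>:<account>:stack/<name>/<id>` layout seen in the provider log in the follow-up comment; `stackIDFromARN` is a hypothetical helper, not part of the provider.

```go
package main

import (
	"fmt"
	"strings"
)

// stackIDFromARN is a hypothetical helper that returns the trailing <id> segment of a
// CloudFormation stack ARN, assuming the layout
// arn:aws:cloudformation:<region>:<account>:stack/<name>/<id>.
func stackIDFromARN(arn string) (string, error) {
	// An ARN has six colon-separated fields; the last (resource) field may itself
	// contain colons, so split into at most six parts.
	parts := strings.SplitN(arn, ":", 6)
	if len(parts) != 6 || parts[0] != "arn" || parts[2] != "cloudformation" {
		return "", fmt.Errorf("not a CloudFormation ARN: %q", arn)
	}
	// The resource field is expected to look like stack/<name>/<id>.
	res := strings.Split(parts[5], "/")
	if len(res) != 3 || res[0] != "stack" || res[2] == "" {
		return "", fmt.Errorf("unexpected stack resource %q", parts[5])
	}
	return res[2], nil
}

func main() {
	arn := "arn:aws:cloudformation:sa-east-1:xxxxxxxxxxxx:stack/controltower-stack/23403010-80e5-11ef-b07e-0a6a2c5bb2b9"
	id, err := stackIDFromARN(arn)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id) // 23403010-80e5-11ef-b07e-0a6a2c5bb2b9
}
```

Per the reporter's observation, the printed value is what worked as the external name of the Observe-only `Stack`; the full ARN and the stack name did not.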

karloscarrijozup commented 1 month ago

Just to add to this: I noticed that after exactly 15 minutes I get an "ExpiredToken" error in the provider logs, and after that the provider tries to recreate the stack, producing the AlreadyExistsException over and over again.

2024-10-02T18:10:11Z    DEBUG   provider-aws    Cannot create external resource {"controller": "managed/cloudformation.aws.upbound.io/v1beta1, kind=stack", "request": {"name":"ct-stack"}, "uid": "dd85052a-1483-438b-8d93-6e33aa384315", "version": "19203025", "external-name": "", "error": "async create failed: failed to create the resource: [{0 waiting for CloudFormation Stack (arn:aws:cloudformation:sa-east-1:xxxxxxxxxxxx:stack/controltower-stack/23403010-80e5-11ef-b07e-0a6a2c5bb2b9) create: operation error CloudFormation: DescribeStacks, https response error StatusCode: 403, RequestID: aa00246f-187c-4c34-b175-695ea3a91163, api error ExpiredToken: The security token included in the request is expired  []}]"}
2024-10-02T18:10:11Z    DEBUG   provider-aws    Async create starting...        {"trackerUID": "dd85052a-1483-438b-8d93-6e33aa384315", "resourceName": "ct-stack", "gvk": "cloudformation.aws.upbound.io/v1beta1, Kind=Stack", "tfID": ""}
2024-10-02T18:10:11Z    DEBUG   provider-aws    Creating the external resource  {"uid": "dd85052a-1483-438b-8d93-6e33aa384315", "name": "ct-stack", "gvk": "cloudformation.aws.upbound.io/v1beta1, Kind=Stack"}
2024-10-02T18:10:11Z    DEBUG   provider-aws    Async create ended.     {"trackerUID": "dd85052a-1483-438b-8d93-6e33aa384315", "resourceName": "ct-stack", "gvk": "cloudformation.aws.upbound.io/v1beta1, Kind=Stack", "error": "async create failed: failed to create the resource: [{0 creating CloudFormation Stack (controltower-stack): operation error CloudFormation: CreateStack, https response error StatusCode: 400, RequestID: 90d2e4b5-94f9-4ffe-85e8-6e764c42604e, AlreadyExistsException: Stack [controltower-stack] already exists  []}]", "tfID": ""}
karloscarrijozup commented 1 month ago

Seems related to #1346, #1482 and https://github.com/crossplane/crossplane/issues/5918