aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.65k stars 3.91k forks

docdb: "Cannot change VPC security group while doing a major version upgrade" when changing engine version from 4.0.0 to 5.0.0 #29429

Closed actuallyrc closed 1 month ago

actuallyrc commented 7 months ago

Describe the bug

I tried changing my DocumentDB engine version from 4.0.0 to 5.0.0, and the deployment fails with:

documentdb/Cluster (documentdbCluster1518938B) Cannot change VPC security group while doing a major version upgrade. (Service: AmazonRDS; Status Code: 400; Error Code: InvalidParameterCombination; Request ID: db0e1863-3a99-4499-90a3-d9b8565e3d0d; Proxy: null)

❌ stack-db failed: Error: The stack named stack-doc-db failed to deploy: UPDATE_ROLLBACK_COMPLETE: Cannot change VPC security group while doing a major version upgrade. (Service: AmazonRDS; Status Code: 400; Error Code: InvalidParameterCombination; Request ID: db0e1863-3a99-4499-90a3-d9b8565e3d0d; Proxy: null)
    at FullCloudFormationDeployment.monitorDeployment (C:\ProgramData\nvs\node_modules\aws-cdk\lib\index.js:421:10708)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Object.deployStack2 [as deployStack] (C:\ProgramData\nvs\node_modules\aws-cdk\lib\index.js:424:180229)
    at async C:\ProgramData\nvs\node_modules\aws-cdk\lib\index.js:424:163477

❌ Deployment failed: Error: The stack named stack-doc-db failed to deploy: UPDATE_ROLLBACK_COMPLETE: Cannot change VPC security group while doing a major version upgrade. (Service: AmazonRDS; Status Code: 400; Error Code: InvalidParameterCombination; Request ID: db0e1863-3a99-4499-90a3-d9b8565e3d0d; Proxy: null)
    at FullCloudFormationDeployment.monitorDeployment (C:\ProgramData\nvs\node_modules\aws-cdk\lib\index.js:421:10708)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Object.deployStack2 [as deployStack] (C:\ProgramData\nvs\node_modules\aws-cdk\lib\index.js:424:180229)
    at async C:\ProgramData\nvs\node_modules\aws-cdk\lib\index.js:424:163477

The stack named stack-doc-db failed to deploy: UPDATE_ROLLBACK_COMPLETE: Cannot change VPC security group while doing a major version upgrade. (Service: AmazonRDS; Status Code: 400; Error Code: InvalidParameterCombination; Request ID: db0e1863-3a99-4499-90a3-d9b8565e3d0d; Proxy: null)

Expected Behavior

I have not made any changes to the VPC security group, but CDK adds changes itself during cdk synth:

secgdocdbDB27CD00:
  Type: AWS::EC2::SecurityGroup
  Properties:
    GroupDescription: Allow tcp access using specified port to DocumentDB instance
    GroupName: secg-docdb
    SecurityGroupEgress:

Current Behavior

It should allow upgrading the DocumentDB engine version using CDK.

Reproduction Steps

const parameterGroup = new docdb.ClusterParameterGroup(this, 'parameterGroup', {
      family: 'docdb4.0',
      parameters: {
        audit_logs: 'disabled',
        tls: 'disabled',
        ttl_monitor: 'disabled',
      },
      dbClusterParameterGroupName: `docdb-parameter-group`,
      description: `This is DocDB parameter group docdb-parameter-group`,
    });

    const existingVpc = ec2.Vpc.fromLookup(this, "VPC", {
      vpcId: props.vpcId,
    });
    const sgDocDB = new ec2.SecurityGroup(this, `secg-docdb`, {
      vpc: existingVpc,
      securityGroupName: `secg-docdb`,
      description: "Allow tcp access using specified port to DocumentDB instance",
      allowAllOutbound: true
    });

    const clusterIdentifier = `docdb-cluster`;

    const cluster = new docdb.DatabaseCluster(this, 'Cluster', {
      dbClusterName: clusterIdentifier,
      // instanceIdentifierBase: stage,
      masterUser: {
        username: props.masterUser,
        excludeCharacters: '"\'@/:\\',
        secretName: `secret-documentdb`,
      },
      port: props.port,
      storageEncrypted: true,
      kmsKey: props.kmsKey,
      instanceType: props.instanceType,
      vpcSubnets: {
        subnetType: ec2.SubnetType.PRIVATE_ISOLATED
      },
      vpc: props.vpc,
      instances: props.instances,
      securityGroup: sgDocDB,
      deletionProtection: true,
      removalPolicy: cdk.RemovalPolicy.RETAIN,
      parameterGroup: parameterGroup,
      engineVersion: '4.0.0',

      backup: {
        retention: cdk.Duration.days(30),
        preferredWindow: '01:00-02:00',
      }
    });

Then change the parameterGroup family to 'docdb5.0' and the DatabaseCluster engineVersion to '5.0.0'.

The deployment then fails with the error shown above.

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.114.1 (build 02bbb1d)

Framework Version

No response

Node.js Version

v18.10.0

OS

Windows 10 Enterprise

Language

TypeScript

Language Version

TypeScript 5.3.3

Other information

No response

pahud commented 7 months ago

When you change parameterGroup family to 'docdb5.0' and DatabaseCluster engineVersion to '5.0.0'.

It should not change the subnets.

Can you run cdk diff and see what it was trying to update?

actuallyrc commented 7 months ago

@pahud Kindly find the cdk diff output below:

Hold on while we create a read-only change set to get a diff with accurate replacement information (use --no-change-set to use a less accurate but faster template-only diff)
Resources
AWS::DocDB::DBCluster documentdb/Cluster documentdbCluster1518938B may be replaced
 ├─ DBClusterParameterGroupName (may cause replacement)
 │   └─ .Ref:
 │       ├─ [-] documentdbparameterGroup69FBE572
 │       └─ [+] documentdbparametergroupD66817FB
 └─ EngineVersion (may cause replacement)
     ├─ [-] 4.0.0
     └─ [+] 5.0.0
AWS::DocDB::DBInstance documentdb/Cluster/Instance1 documentdbClusterInstance1DABD65A5 may be replaced
 └─ DBClusterIdentifier (may cause replacement)
     └─ .Ref:
         ├─ [-] documentdbCluster1518938B
         └─ [+] documentdbCluster1518938B (replaced)

pahud commented 7 months ago

This is weird. I don't see any security group change in your cdk diff. Looking into it.
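
One way to double-check that observation outside of cdk diff is to compare the cluster's properties in the two synthesized templates directly. The following is a minimal sketch, not part of the CDK: `changedProperties` is a hypothetical helper, and the property values are abbreviated from the templates in this thread (the shape of `VpcSecurityGroupIds` is assumed to match the synthesized template posted later).

```typescript
// Hypothetical helper: list the property names whose values differ between
// two versions of the same CloudFormation resource.
type ResourceProperties = Record<string, unknown>;

function changedProperties(before: ResourceProperties, after: ResourceProperties): string[] {
  const names = Object.keys(before).concat(Object.keys(after));
  return names
    // Keep each property name once.
    .filter((name, index) => names.indexOf(name) === index)
    // Compare serialized values; this is sensitive to key order, which is
    // acceptable for a quick check on templates produced by the same tool.
    .filter((name) => JSON.stringify(before[name]) !== JSON.stringify(after[name]))
    .sort();
}

// Cluster properties, abbreviated for illustration.
const clusterV4: ResourceProperties = {
  DBClusterParameterGroupName: { Ref: "documentdbparameterGroup69FBE572" },
  EngineVersion: "4.0.0",
  VpcSecurityGroupIds: [{ "Fn::GetAtt": ["secgdocdbDB27CD00", "GroupId"] }],
};
const clusterV5: ResourceProperties = {
  DBClusterParameterGroupName: { Ref: "documentdbparametergroupD66817FB" },
  EngineVersion: "5.0.0",
  VpcSecurityGroupIds: [{ "Fn::GetAtt": ["secgdocdbDB27CD00", "GroupId"] }],
};

// Prints the two changed property names:
// DBClusterParameterGroupName and EngineVersion.
console.log(changedProperties(clusterV4, clusterV5));
```

With these inputs the security group property is identical on both sides, which matches the diff: only the parameter group and the engine version change.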

pahud commented 7 months ago

OK I can reproduce this.

My code:

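import { Duration, RemovalPolicy, Stack, StackProps } from 'aws-cdk-lib';
import * as docdb from 'aws-cdk-lib/aws-docdb';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import { Construct } from 'constructs';

export class DummyStack extends Stack {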
  constructor(scope: Construct, id: string, props: StackProps) {
    super(scope, id, props);

    const parameterGroup4 = new docdb.ClusterParameterGroup(this, 'parameterGroup4', {
      family: 'docdb4.0',
      parameters: {
        audit_logs: 'disabled',
        tls: 'disabled',
        ttl_monitor: 'disabled',
      },
      dbClusterParameterGroupName: `docdb-parameter-group4`,
      description: `This is DocDB parameter group docdb-parameter-group`,
    });

    const parameterGroup5 = new docdb.ClusterParameterGroup(this, 'parameterGroup5', {
      family: 'docdb5.0',
      parameters: {
        audit_logs: 'disabled',
        tls: 'disabled',
        ttl_monitor: 'disabled',
      },
      dbClusterParameterGroupName: `docdb-parameter-group5`,
      description: `This is DocDB parameter group docdb-parameter-group`,
    });

    // getDefaultVpc is a helper that is not shown in this comment.
    const existingVpc = getDefaultVpc(this);

    const sgDocDB = new ec2.SecurityGroup(this, `secg-docdb`, {
      vpc: existingVpc,
      securityGroupName: `secg-docdb`,
      description: "Allow tcp access using specified port to DocumentDB instance",
      allowAllOutbound: true
    });

    // const clusterIdentifier = `docdb-cluster`;

    const cluster = new docdb.DatabaseCluster(this, 'Cluster', {
      // dbClusterName: clusterIdentifier,
      // instanceIdentifierBase: stage,
      masterUser: {
        username: 'foo',
        excludeCharacters: '"\'@/:\\',
        secretName: `secret-documentdb`,
      },
      // port: props.port,
      storageEncrypted: true,
      // kmsKey: props.kmsKey,
      instanceType: ec2.InstanceType.of(ec2.InstanceClass.T4G, ec2.InstanceSize.MEDIUM),
      vpcSubnets: {
        subnetType: ec2.SubnetType.PRIVATE_ISOLATED
      },
      vpc: existingVpc,
      // instances: props.instances,
      securityGroup: sgDocDB,
      deletionProtection: true,
      removalPolicy: RemovalPolicy.RETAIN,
      parameterGroup: parameterGroup4,
      engineVersion: '4.0.0',

      backup: {
        retention: Duration.days(30),
        preferredWindow: '01:00-02:00',
      }
    });
  }
}

And I just updated here from

      parameterGroup: parameterGroup4,
      engineVersion: '4.0.0',

to

      parameterGroup: parameterGroup5,
      engineVersion: '5.0.0',

My cdk diff

[image: screenshot of the cdk diff output]

Error message

1:33:36 PM | UPDATE_FAILED | AWS::DocDB::DBCluster | ClusterEB0386A7
Cannot change VPC security group while doing a major version upgrade. (Service: AmazonRDS; Status Code: 400; Error Code: InvalidParameterCombination; Request ID: 8418567a-bd14-40a1-b7ef-3455711c9758; Proxy: null)

pahud commented 7 months ago

cdk synth

Resources:
  parameterGroup40D79E4AA:
    Type: AWS::DocDB::DBClusterParameterGroup
    Properties:
      Description: This is DocDB parameter group docdb-parameter-group
      Family: docdb4.0
      Name: docdb-parameter-group4
      Parameters:
        audit_logs: disabled
        tls: disabled
        ttl_monitor: disabled
    Metadata:
      aws:cdk:path: dummy-stack3/parameterGroup4/Resource
  parameterGroup5BDFC0C0C:
    Type: AWS::DocDB::DBClusterParameterGroup
    Properties:
      Description: This is DocDB parameter group docdb-parameter-group
      Family: docdb5.0
      Name: docdb-parameter-group5
      Parameters:
        audit_logs: disabled
        tls: disabled
        ttl_monitor: disabled
    Metadata:
      aws:cdk:path: dummy-stack3/parameterGroup5/Resource
  secgdocdb6E5B07B7:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow tcp access using specified port to DocumentDB instance
      GroupName: secg-docdb
      SecurityGroupEgress:
        - CidrIp: 0.0.0.0/0
          Description: Allow all outbound traffic by default
          IpProtocol: "-1"
      VpcId: vpc-1f5b7e78
    Metadata:
      aws:cdk:path: dummy-stack3/secg-docdb/Resource
  ClusterSubnetsDCFA5CB7:
    Type: AWS::DocDB::DBSubnetGroup
    Properties:
      DBSubnetGroupDescription: Subnets for Cluster database
      SubnetIds:
        - subnet-0e28c622da3161550
        - subnet-033da6daff078d9f5
        - subnet-001e9157a300900e5
    Metadata:
      aws:cdk:path: dummy-stack3/Cluster/Subnets
  ClusterSecret6368BD0F:
    Type: AWS::SecretsManager::Secret
    Properties:
      Description:
        Fn::Join:
          - ""
          - - "Generated by the CDK for stack: "
            - Ref: AWS::StackName
      GenerateSecretString:
        ExcludeCharacters: "\"'@/:\\"
        GenerateStringKey: password
        PasswordLength: 41
        SecretStringTemplate: '{"username":"foo"}'
      Name: secret-documentdb
    UpdateReplacePolicy: Delete
    DeletionPolicy: Delete
    Metadata:
      aws:cdk:path: dummy-stack3/Cluster/Secret/Resource
  ClusterSecretAttachment769E6258:
    Type: AWS::SecretsManager::SecretTargetAttachment
    Properties:
      SecretId:
        Ref: ClusterSecret6368BD0F
      TargetId:
        Ref: ClusterEB0386A7
      TargetType: AWS::DocDB::DBCluster
    Metadata:
      aws:cdk:path: dummy-stack3/Cluster/Secret/Attachment/Resource
  ClusterEB0386A7:
    Type: AWS::DocDB::DBCluster
    Properties:
      BackupRetentionPeriod: 30
      DBClusterParameterGroupName:
        Ref: parameterGroup5BDFC0C0C
      DBSubnetGroupName:
        Ref: ClusterSubnetsDCFA5CB7
      DeletionProtection: true
      EngineVersion: 5.0.0
      MasterUserPassword:
        Fn::Join:
          - ""
          - - "{{resolve:secretsmanager:"
            - Ref: ClusterSecret6368BD0F
            - :SecretString:password::}}
      MasterUsername:
        Fn::Join:
          - ""
          - - "{{resolve:secretsmanager:"
            - Ref: ClusterSecret6368BD0F
            - :SecretString:username::}}
      PreferredBackupWindow: 01:00-02:00
      StorageEncrypted: true
      VpcSecurityGroupIds:
        - Fn::GetAtt:
            - secgdocdb6E5B07B7
            - GroupId
    UpdateReplacePolicy: Retain
    DeletionPolicy: Retain
    Metadata:
      aws:cdk:path: dummy-stack3/Cluster/Resource
  ClusterInstance1448F06E4:
    Type: AWS::DocDB::DBInstance
    Properties:
      DBClusterIdentifier:
        Ref: ClusterEB0386A7
      DBInstanceClass: db.t4g.medium
    UpdateReplacePolicy: Retain
    DeletionPolicy: Retain
    Metadata:
      aws:cdk:path: dummy-stack3/Cluster/Instance1
  CDKMetadata:
    Type: AWS::CDK::Metadata
    Properties:
      Analytics: v2:deflate64:H4sIAAAAAAAA/22QTwvCMAzFP4v3rroJ4tU/IN7EeZesi9rpOklSRaTf3epUmHhK8t6P1zSZToeZHvTgyokpj8nJFvqeC5ijitL2XjamjMrs5FmQVkBQY2wW1Pizmu3cfPrfmoNAAYxvt0VzXziULpCjIZROVDssHQs4g0GhyeJOaDxZuX1f7ghB8SuHa3CwR4obv4hndFs2QHuUicSvHWp0or7ArxOCWiM3ngw+oU8flGtK1BX3L+lYp6N4tIqtTcg7sTXqdVsfjsJ9aFEBAAA=
    Metadata:
      aws:cdk:path: dummy-stack3/CDKMetadata/Default
Parameters:
  BootstrapVersion:
    Type: AWS::SSM::Parameter::Value<String>
    Default: /cdk-bootstrap/hnb659fds/version
    Description: Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. [cdk:skip]
Rules:
  CheckBootstrapVersion:
    Assertions:
      - Assert:
          Fn::Not:
            - Fn::Contains:
                - - "1"
                  - "2"
                  - "3"
                  - "4"
                  - "5"
                - Ref: BootstrapVersion
        AssertDescription: CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI.

pahud commented 7 months ago

This seems to be a CloudFormation issue rather than a CDK one. I will report it to the relevant team internally.

pahud commented 7 months ago

internal tracking: V1294325844

aprat84 commented 7 months ago

Is there any way to track progress? Or any workaround?

pahud commented 5 months ago

Still pending an update from the service team. I've just escalated again.

pahud commented 4 months ago

internal tracking: V1195768568

pahud commented 1 month ago

This seems to be a bug when upgrading from 4.0.0 to 5.0.0 while the VpcSecurityGroupIds field references a SecurityGroup resource in the corresponding CloudFormation template. A temporary fix is to remove VpcSecurityGroupIds from the template for the in-place major version upgrade.

In CDK you can use addPropertyDeletionOverride():

// find the L1 resource of the cluster
const cfnCluster = cluster.node.findChild('Resource') as docdb.CfnDBCluster;
// remove VpcSecurityGroupIds as a workaround
cfnCluster.addPropertyDeletionOverride('VpcSecurityGroupIds');

Please test this in a testing environment and check if this works for you.
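
To make the effect of that override concrete, the sketch below applies the same change to a plain template object rather than to a CDK app. It is an illustration under stated assumptions, not how the CDK implements overrides: `dropClusterProperty` and the `Template` types are hypothetical, and the sample template is abbreviated from the cdk synth output above.

```typescript
// Minimal shape of a synthesized template; only what this sketch needs.
interface TemplateResource {
  Type: string;
  Properties?: Record<string, unknown>;
}
interface Template {
  Resources: Record<string, TemplateResource>;
}

// Hypothetical helper: return a copy of the template in which every
// AWS::DocDB::DBCluster resource has the named property removed.
// The input template is left unmodified.
function dropClusterProperty(template: Template, property: string): Template {
  const resources: Record<string, TemplateResource> = {};
  for (const id of Object.keys(template.Resources)) {
    const resource = template.Resources[id];
    if (resource.Type !== "AWS::DocDB::DBCluster" || !resource.Properties) {
      resources[id] = resource; // other resources pass through untouched
      continue;
    }
    const properties: Record<string, unknown> = {};
    for (const key of Object.keys(resource.Properties)) {
      if (key !== property) properties[key] = resource.Properties[key];
    }
    resources[id] = { ...resource, Properties: properties };
  }
  return { ...template, Resources: resources };
}

// Abbreviated from the synthesized template in this thread.
const synthesized: Template = {
  Resources: {
    secgdocdb6E5B07B7: {
      Type: "AWS::EC2::SecurityGroup",
      Properties: { GroupName: "secg-docdb" },
    },
    ClusterEB0386A7: {
      Type: "AWS::DocDB::DBCluster",
      Properties: {
        EngineVersion: "5.0.0",
        VpcSecurityGroupIds: [{ "Fn::GetAtt": ["secgdocdb6E5B07B7", "GroupId"] }],
      },
    },
  },
};

const forUpgrade = dropClusterProperty(synthesized, "VpcSecurityGroupIds");
// Prints {"EngineVersion":"5.0.0"}
console.log(JSON.stringify(forUpgrade.Resources["ClusterEB0386A7"].Properties));
```

Note that the security group resource itself stays in the template; only the cluster's reference to it is omitted for the upgrade deployment.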

Thanks,

Pahud

Internal: V1208058460

github-actions[bot] commented 1 month ago

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

aprat84 commented 1 month ago

@pahud it works! But there's an error when cleaning up: it tries to delete the new parameter group for v5 instead of the old one, and the old one is left behind as junk...

11:10:19 AM | DELETE_FAILED        | AWS::DocDB::DBClusterParameterGroup         | DatabaseParameterGroupXYZ
Got InvalidDBParameterGroupStateException with error: One or more database instances are still members of this parameter group databaseparametergroupc1234-xyz, so the group cannot be deleted

Anyway, is the CloudFormation team going to fix it?

actuallyrc commented 1 month ago

I solved this by temporarily adding the line below:

(cluster.node.defaultChild as docdb.CfnDBCluster).addDeletionOverride('Properties.VpcSecurityGroupIds');
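
Since both commenters describe the override as temporary, one option is to apply it only for the deployment that crosses a major version. The sketch below shows a possible guard. `isMajorVersionUpgrade` is a hypothetical helper, and how the currently deployed version reaches the stack (a prop, a context value) is an assumption left to the reader. Whether restoring the property in a later deployment goes through cleanly has not been verified in this thread.

```typescript
// Hypothetical helper: true when the major component (the leading integer)
// of the target engine version is greater than that of the current one.
function isMajorVersionUpgrade(current: string, target: string): boolean {
  const major = (version: string): number => {
    // parseInt stops at the first non-digit, so "5.0.0" yields 5.
    const value = parseInt(version, 10);
    if (isNaN(value)) {
      throw new Error(`Unrecognized engine version: ${version}`);
    }
    return value;
  };
  return major(target) > major(current);
}

// Intended use inside the stack (cfnCluster as in the workaround above):
//
//   if (isMajorVersionUpgrade(deployedVersion, '5.0.0')) {
//     cfnCluster.addPropertyDeletionOverride('VpcSecurityGroupIds');
//   }
//
// where deployedVersion is supplied by the developer.

console.log(isMajorVersionUpgrade("4.0.0", "5.0.0")); // true
console.log(isMajorVersionUpgrade("5.0.0", "5.0.0")); // false
```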