aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.55k stars 3.87k forks source link

DocDB : cdk upgrade shows VpvSecurityIds change and requires cluster to be replaced #31483

Open qqsandas opened 1 week ago

qqsandas commented 1 week ago

Describe the bug

While from upgrading from cdk 1.156 to cdk 2.158.0

in a stack with CfnDBCluster, cdk diff shows

[~] AWS::DocDB::DBCluster <cluster> may be replaced
  |_ [~] VpcSecurityGroupIds (may cause replacement)

We performed a similar upgrade a while ago upgrading from cdk 1.156 to 2.51.1, that upgrade also showed a difference in VpcSecurityGroupIds, but did not require cluster replacement

Regression Issue

Last Known Working CDK Version

2.51.1

Expected Behavior

Change in security group should not require entire DocDB cluster to be replaced

Current Behavior

While from upgrading from cdk 1.156 to cdk 2.158.0

in a stack with CfnDBCluster, cdk diff shows

[~] AWS::DocDB::DBCluster <cluster> may be replaced
  |_ [~] VpcSecurityGroupIds (may cause replacement)

Reproduction Steps

Upgrade cdk from 1.156 to 2.158 with a stack containing CfnDBCluster

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.158

Framework Version

No response

Node.js Version

Python cdk

OS

Linux

Language

Python

Language Version

No response

Other information

No response

pahud commented 1 week ago

checking

pahud commented 1 week ago

Please allow me to confirm:

  1. You are upgrading from cdk from 1.156 to 2.158 with a stack containing CfnDBCluster and you didn't change any cdk code.
  2. on cdk diff you see the VpcSecurityGroupIds was changed, triggering potential DBCluster replacement.

And please share with us:

  1. Your minimal CDK code that provisions the DBCluster.
  2. please run cdk synth in different versions and share the AWS::DocDB::DBCluster resource definition of the two versions with us.
  3. did you make any major upgrade like 4.x to 5.x?

Related to internal tracking: V1195768568

qqsandas commented 1 week ago

Answer to question 1: We did not upgrade the engine or anything with DocumentDB

But the cdk upgrade required us to make one modification

in 1.156, when setting CfnDBCluster.vpc_security_group_ids which accepts a list of str

In 1.156 we passed the values from SecurityGroup.secrity_group_name, with the cdk 2 upgrade, it required the value from SecurityGroup.security_group_id instead

Answer to question 2: yes

pahud commented 1 week ago

It's still unclear to me what exactly is the code you used to create the cluster in CDK v1 and CDK v2.

Are you able to share the code you create the cluster both in v1 and v2 so we could could better support you?

Looks like what you have change actually is the vpc_security_group_ids property and that explains

[~] AWS::DocDB::DBCluster <cluster> may be replaced
  |_ [~] VpcSecurityGroupIds (may cause replacement)

I think this would not be a regression as you actually have modified this prop. If you run cdk synth and check the synthesized template specifically the VpcSecurityGroupIds prop, the value should have changed.

qqsandas commented 1 week ago

adding the remaining details requested...

yes there was a difference in the VpcSecurityGroupIds section


from the cdk.out folder for cdk 1.156 vs. cdk 2.158 for AWS::DocDB::DBCluster(some details masked):

( I compared the actual contents of the DocumentdbProddocdbsg9A578F51 resource being referred to, and they were identical in the 2 cdk versions )

cdk 1.156

  {
    "Type": "AWS::DocDB::DBCluster",
    "Properties": {
      "BackupRetentionPeriod": 7,
      "DBClusterIdentifier": "<DBClusterIdentifier>",
      "DBClusterParameterGroupName": "<DBClusterParameterGroupName>",
      "DBSubnetGroupName": "<DBSubnetGroupName>",
      "DeletionProtection": true,
      "EnableCloudwatchLogsExports": [
        "audit"
      ],
      "EngineVersion": "4.0",
      "MasterUsername": "<MasterUsername>",
      "MasterUserPassword": "<MasterUserPassword>",
      "Port": 27017,
      "PreferredBackupWindow": "10:00-10:30",
      "PreferredMaintenanceWindow": "sat:09:30-sat:10:00",
      "StorageEncrypted": true,
      "VpcSecurityGroupIds": [
        {
          "Ref": "DocumentdbProddocdbsg9A578F51"
        }
      ]
    }
  }

cdk 2.158

  {
    "Type": "AWS::DocDB::DBCluster",
    "Properties": {
      "BackupRetentionPeriod": 7,
      "DBClusterIdentifier": "<DBClusterIdentifier>",
      "DBClusterParameterGroupName": "<DBClusterParameterGroupName>",
      "DBSubnetGroupName": "<DBSubnetGroupName>",
      "DeletionProtection": true,
      "EnableCloudwatchLogsExports": [
        "audit"
      ],
      "EngineVersion": "4.0",
      "MasterUserPassword": "<MasterUsername>",
      "MasterUsername": "<MasterUserPassword>",
      "Port": 27017,
      "PreferredBackupWindow": "10:00-10:30",
      "PreferredMaintenanceWindow": "sat:09:30-sat:10:00",
      "StorageEncrypted": true,
      "VpcSecurityGroupIds": [
        {
          "Fn::GetAtt": [
            "DocumentdbProddocdbsg9A578F51",
            "GroupId"
          ]
        }
      ]
    }

the python cdk code used to create the cluster is: (showing only the security group relevant section)

cdk 1.156


sg = ec2.SecurityGroup( self, '<stack_id>', vpc=... )
sg.add_ingress_rule( . . .)

CfnDBCluster(
  . . .
  vpc_security_group_ids=[sg.security_group_name],
  . . .
)

cdk 2.158.0


sg = ec2.SecurityGroup( self, '<stack_id>', vpc=... )
sg.add_ingress_rule( . . .)

CfnDBCluster(
  . . .
  vpc_security_group_ids=[sg.security_group_id],
  . . .
)

also noting that we made the same change to set the CfnDBCluster.vpc_security_group_ids with SecurityGroup.security_group_id instead of SecurityGroup.security_group_name in a separate cdk upgrade for a different DocumentDB cluster to upgrade from cdk 1.156 => cdk 2.51.1, the cdk diff for that upgrade also showed a diff onVpcSecurityGroupIds, but did not require cluster replacement, the output from that cdk diff was:

[+] Parameter BootstrapVersion BootstrapVersion: {"Type":"AWS::SSM::Parameter::Value<String>","Default":"/cdk-bootstrap/hnb659fds/version","Description":"Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. [cdk:skip]"}
Resources
[~] AWS::DocDB::DBCluster <cluster>
 └─ [~] VpcSecurityGroupIds
     └─ @@ -1,5 +1,8 @@
        [ ] [
        [ ]   {
        [-]     "Ref": "DocumentdbProddocdbsg9A578F51"
        [+]     "Fn::GetAtt": [
        [+]       "DocumentdbProddocdbsg9A578F51",
        [+]       "GroupId"
        [+]     ]
        [ ]   }
        [ ] ]

whereas the output from the cdk diff for 1.156 => 2.158.0 is:

[+] Parameter BootstrapVersion BootstrapVersion: {"Type":"AWS::SSM::Parameter::Value<String>","Default":"/cdk-bootstrap/hnb659fds/version","Description":"Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. [cdk:skip]"}
Resources
[~] AWS::DocDB::DBCluster <cluster> may be replaced
 └─ [~] VpcSecurityGroupIds (may cause replacement)
     └─ @@ -1,5 +1,8 @@
        [ ] [
        [ ]   {
        [-]     "Ref": "DocumentdbProddocdbsg9A578F51"
        [+]     "Fn::GetAtt": [
        [+]       "DocumentdbProddocdbsg9A578F51",
        [+]       "GroupId"
        [+]     ]
        [ ]   }
        [ ] ]
pahud commented 1 week ago

@qqsandas Thank you for your detailed report. This is very helpful.

Looks like the diff is literally the same but cloudformation considers it might cause replacement, which is not determined by CDK.

I am guessing this might be related to https://github.com/aws/aws-cdk/issues/29429#issuecomment-2359491234 that would happen when upgrading the major version of docdb but obviously you are not.

Are you able to try the workaround as mentioned in https://github.com/aws/aws-cdk/issues/29429#issuecomment-2359491234 to remove the VpcSecurityGroupIds prop and see if you are still seeing the replacement warning.

Also, please make sure to test that in a testing environment and make all the necessary backups.

qqsandas commented 1 week ago

We don't have another test cluster to try this right now.

We are thinking now a 2 step upgrade from 1.156 => 2.51.1 first, since that does not show cluster replacement warning, and then from 2.51.1 to a more current version of cdk later

Thank you for looking into this, we will check back on this thread for updates