aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

(rds): isFromLegacyInstanceProps migration flag not working #27177

Open TimQuist opened 1 year ago

TimQuist commented 1 year ago

Describe the bug

When using isFromLegacyInstanceProps to migrate my old RDS instance properties from instanceProps to the new reader and writer params, cdk diff still flags my instances for recreation.

This would cause my instances to be recreated, and I am concerned this could bring downtime and possibly even discard my automated backups.

Expected Behavior

I would expect the output of the cdk diff step against my 'test' AWS account to be:

Stack test-rds-stack (rds-stack)
There were no differences

Current Behavior

The current output of cdk diff against my 'test' AWS account using the migration property is:

Stack test-rds-stack (rds-stack)
Resources
[~] AWS::RDS::DBInstance rds-cluster/Instance1 rdsclusterInstance19743D359 replace
 └─ [-] DBInstanceIdentifier (requires replacement)
     └─ rds-clusterinstance1
[~] AWS::RDS::DBInstance rds-cluster/Instance2 rdsclusterInstance2CF41B1E4 replace
 └─ [-] DBInstanceIdentifier (requires replacement)
     └─ rds-clusterinstance2

Reproduction Steps

Old CDK code used to deploy RDS currently on AWS test environment:

private buildRdsCluster(): DatabaseCluster {
  const rdsCluster = new DatabaseCluster(this, RdsUtils.RDS_CLUSTER, {
    clusterIdentifier: RdsUtils.RDS_CLUSTER,
    engine: DatabaseClusterEngine.auroraPostgres({
      version: AuroraPostgresEngineVersion.VER_14_6,
    }),
    port: RdsUtils.RDS_PORT,
    credentials: Credentials.fromSecret(this.rdsPostgresSecret),
    storageEncrypted: true,
    instanceProps: {
      instanceType: InstanceType.of(InstanceClass.T3, InstanceSize.MEDIUM),
      vpcSubnets: {
        subnetType: SubnetType.PRIVATE_ISOLATED,
      },
      vpc: this.vpc,
      securityGroups: [this.rdsSecurityGroup],
    },
    backup: {
      preferredWindow: '00:00-01:00',
      retention: Duration.days(this.props.isProduction ? 30 : 7),
    },
    preferredMaintenanceWindow: 'Wed:01:00-Wed:02:00',
    removalPolicy: RemovalPolicy.RETAIN,
  });
  new CfnOutput(this, `${RdsUtils.RDS_CLUSTER_ENDPOINT}-output`, {
    exportName: RdsUtils.getRdsClusterEndpointExportName(),
    description: 'The RDS cluster endpoint',
    value: rdsCluster.clusterEndpoint.socketAddress,
  });
  return rdsCluster;
}

New CDK code used to do migration with no functional changes and the migration property:

private buildRdsCluster(): DatabaseCluster {
    const instanceProps: ProvisionedClusterInstanceProps = {
      instanceType: InstanceType.of(InstanceClass.T3, InstanceSize.MEDIUM),
      isFromLegacyInstanceProps: true,
    };

    const rdsCluster = new DatabaseCluster(this, RdsUtils.RDS_CLUSTER, {
      vpc: this.vpc,
      securityGroups: [this.rdsSecurityGroup],
      vpcSubnets: { subnetType: SubnetType.PRIVATE_ISOLATED },
      clusterIdentifier: RdsUtils.RDS_CLUSTER,
      engine: DatabaseClusterEngine.auroraPostgres({
        version: AuroraPostgresEngineVersion.VER_14_6,
      }),
      port: RdsUtils.RDS_PORT,
      credentials: Credentials.fromSecret(this.rdsPostgresSecret),
      storageEncrypted: true,
      writer: ClusterInstance.provisioned("Instance1", instanceProps),
      readers: [ClusterInstance.provisioned("Instance2", instanceProps)],
      backup: {
        preferredWindow: "00:00-01:00",
        retention: Duration.days(this.props.isProduction ? 30 : 7),
      },
      preferredMaintenanceWindow: "Wed:01:00-Wed:02:00",
      removalPolicy: RemovalPolicy.RETAIN,
    });
    new CfnOutput(this, `${RdsUtils.RDS_CLUSTER_ENDPOINT}-output`, {
      exportName: RdsUtils.getRdsClusterEndpointExportName(),
      description: "The RDS cluster endpoint",
      value: `${rdsCluster.clusterEndpoint.socketAddress}`,
    });
    return rdsCluster;
}

Possible Solution

This problem can be worked around using the method described in https://github.com/aws/aws-cdk/issues/25942, although in a non-ideal way.

You can directly specify an instanceIdentifier on both the reader and the writer. However, this creates a problem when deploying to multiple environments: because a failover may have occurred, the identifiers of the reader and the writer can differ between environments.

Doing this, however, makes cdk diff report what we want:

Stack test-rds-stack (rds-stack)
There were no differences

Example:

const rdsCluster = new DatabaseCluster(this, RdsUtils.RDS_CLUSTER, {
      vpc: this.vpc,
      securityGroups: [this.rdsSecurityGroup],
      vpcSubnets: { subnetType: SubnetType.PRIVATE_ISOLATED },
      clusterIdentifier: RdsUtils.RDS_CLUSTER,
      engine: DatabaseClusterEngine.auroraPostgres({
        version: AuroraPostgresEngineVersion.VER_14_6,
      }),
      port: RdsUtils.RDS_PORT,
      credentials: Credentials.fromSecret(this.rdsPostgresSecret),
      storageEncrypted: true,
      writer: ClusterInstance.provisioned("Instance1", {
        ...instanceProps,
        instanceIdentifier: "rds-clusterinstance1",
      }),
      readers: [
        ClusterInstance.provisioned("Instance2", {
          ...instanceProps,
          instanceIdentifier: "rds-clusterinstance2",
        }),
      ],
      backup: {
        preferredWindow: "00:00-01:00",
        retention: Duration.days(this.props.isProduction ? 30 : 7),
      },
      preferredMaintenanceWindow: "Wed:01:00-Wed:02:00",
      removalPolicy: RemovalPolicy.RETAIN,
});
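One way to avoid hardcoding the identifiers per environment would be to derive them from the cluster identifier. The diff output above suggests the legacy code named instances `<clusterIdentifier>instance<n>` (e.g. `rds-clusterinstance1`). A sketch of a hypothetical helper based on that assumed pattern — this is not part of the aws-cdk API, and the naming convention is inferred only from the diff shown above:

```typescript
// Hypothetical helper (NOT part of aws-cdk): reconstructs the instance
// identifier that the legacy instanceProps-based code appears to generate,
// assuming the "<clusterIdentifier>instance<n>" pattern seen in the
// cdk diff output above (e.g. "rds-clusterinstance1").
function legacyInstanceIdentifier(clusterIdentifier: string, n: number): string {
  return `${clusterIdentifier.toLowerCase()}instance${n}`;
}

// Usage sketch: keeps the identifiers deterministic without hardcoding
// a literal string per environment.
//
// writer: ClusterInstance.provisioned('Instance1', {
//   ...instanceProps,
//   instanceIdentifier: legacyInstanceIdentifier('rds-cluster', 1),
// }),
// readers: [
//   ClusterInstance.provisioned('Instance2', {
//     ...instanceProps,
//     instanceIdentifier: legacyInstanceIdentifier('rds-cluster', 2),
//   }),
// ],
```

Note this makes the identifiers deterministic but does not by itself address the case where a failover has swapped which instance is the writer.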

Additional Information/Context

Also responded on https://github.com/aws/aws-cdk/discussions/25900

CDK CLI Version

2.93.0 (build 724bd01)

Framework Version

No response

Node.js Version

v18.16.0

OS

MacOs

Language

Typescript

Language Version

5.2.2

Other information

No response

rix0rrr commented 1 year ago

Specifying the instance identifier will be the solution.

> However this creates a problem when deploying on multiple environments, the identifier of the reader and writer might differ on different environments due to a failover happening.

The only thing you need to be concerned with is CloudFormation's view of the world. CloudFormation will not know about any failover, so it will only apply any new changes with respect to CloudFormation's own previous configuration (emphatically, not the actual state of the world). In this case, since its previous template and its current template will be the same, it will not change anything.

What will happen when you do intend to make a change via CloudFormation after a failover has happened ... I actually do not know. If it turns out that managing RDS via CloudFormation is not reliable, that's something you should take up with the RDS and/or CloudFormation teams.

TimQuist commented 1 year ago

> Specifying the instance identifier will be the solution.
>
> > However this creates a problem when deploying on multiple environments, the identifier of the reader and writer might differ on different environments due to a failover happening.
>
> The only thing you need to be concerned with is CloudFormation's view of the world. CloudFormation will not know about any failover, so it will only apply any new changes with respect to CloudFormation's own previous configuration (emphatically, not the actual state of the world). In this case, since its previous template and its current template will be the same, it will not change anything.
>
> What will happen when you do intend to make a change via CloudFormation after a failover has happened ... I actually do not know. If it turns out that managing RDS via CloudFormation is not reliable, that's something you should take up with the RDS and/or CloudFormation teams.

Thanks, I can verify that this works. We have a PROD environment with this exact situation. cdk diff indeed no longer reports a replacement as long as you keep the instanceIdentifier the same as the one in the CloudFormation template.

Should this be called out more clearly in the docs so others don't run into the same issue?

rittneje commented 10 months ago

@rix0rrr We just hit this bug as well. Essentially the isFromLegacyInstanceProps flag is useless as-is. We shouldn't have to manually specify the instance identifiers - that flag should force it to generate them the same way it used to.