aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

RDS Proxy created by DBCluster.addProxy() should have a depends on DBCluster in generated CloudFormation template #14258

Open JerryBLi opened 3 years ago

JerryBLi commented 3 years ago

Deploying a CloudFormation stack that has an RDS Proxy added with the DatabaseCluster's addProxy method can fail because CloudFormation may attempt to create the Proxy before the DatabaseCluster. I had a stack with a Proxy that deployed successfully to one AWS account using the template, but it consistently fails when deployed to another AWS account.

Reproduction Steps

CDK Stack

This is a pared-down version of the stack I'm trying to deploy:

```
import { Construct, Stack, StackProps } from 'monocdk';
import {
  AuroraPostgresEngineVersion,
  Credentials,
  DatabaseCluster,
  DatabaseClusterEngine,
  DatabaseSecret,
} from 'monocdk/aws-rds';
import {
  SubnetType,
  Vpc,
  SecurityGroup,
  Port,
} from 'monocdk/aws-ec2';

export class MyTestStack extends Stack {
  constructor(scope: Construct, id: string, props: StackProps) {
    super(scope, id, props);

    /*
     * Resources for VPC setup including security groups and bastion hosts
     */
    const rdsVpc = new Vpc(this, 'myVpc');

    // Security group assumed by AWS resources to allow access to RDS
    const canReadRdsSecGroup = new SecurityGroup(
      this,
      'CanReadRdsSecGroup',
      {
        vpc: rdsVpc,
      },
    );

    // Security group assumed by RDS, allow connections on default port for other sec groups
    const rdsSecurityGroup = new SecurityGroup(this, 'RdsSecGroup', {
      vpc: rdsVpc,
    });
    rdsSecurityGroup.connections.allowFrom(
      canReadRdsSecGroup,
      Port.tcp(5432),
      'Security Group allowing RDS access from AWS resources',
    );

    /*
     * Resources related to setting up the RDS instance
     */
    const databaseName = 'db_name';
    const username = 'postgres';
    const rdsAdminSecret = new DatabaseSecret(
      this,
      'AdminSecret',
      {
        username,
      },
    );

    // Create the Database cluster
    const rdsCluster = new DatabaseCluster(this, 'myDatabase', {
      engine: DatabaseClusterEngine.auroraPostgres({
        version: AuroraPostgresEngineVersion.VER_11_8,
      }),
      credentials: Credentials.fromSecret(rdsAdminSecret),
      instanceProps: {
        vpc: rdsVpc,
        vpcSubnets: {
          subnetType: SubnetType.PRIVATE,
        },
        securityGroups: [rdsSecurityGroup],
      },
      storageEncrypted: true,
      clusterIdentifier: 'myDatabase',
      defaultDatabaseName: databaseName,
    });

    // Create the database secret for non-admin account
    const user2Secret = new DatabaseSecret(this, 'User2Secret', {
      username: 'user2',
      masterSecret: rdsAdminSecret,
    });
    user2Secret.attach(rdsCluster);

    // Add rotations for the secrets
    rdsCluster.addRotationSingleUser();

    // Create the Database Proxy
    const rdsProxy = rdsCluster.addProxy('RdsProxy', {
      secrets: [rdsAdminSecret, user2Secret],
      vpc: rdsVpc,
      securityGroups: [rdsSecurityGroup],
      iamAuth: true,
    });
    rdsCluster.connections.allowDefaultPortFrom(
      rdsProxy,
      'Allow connections to the database cluster from the Proxy',
    );
  }
}
```

What did you expect to happen?

Partial CFN template

```
"myDatabaseRdsProxy3FC52F28": {
  "Type": "AWS::RDS::DBProxy",
  "Properties": {
    "Auth": [
      {
        "AuthScheme": "SECRETS",
        "IAMAuth": "REQUIRED",
        "SecretArn": {
          "Ref": "AdminSecretB9452750"
        }
      },
      {
        "AuthScheme": "SECRETS",
        "IAMAuth": "REQUIRED",
        "SecretArn": {
          "Ref": "DataUploadDbSecret57F9A554"
        }
      }
    ],
    "DBProxyName": "RdsProxy",
    "EngineFamily": "POSTGRESQL",
    "RoleArn": {
      "Fn::GetAtt": [
        "myDatabaseRdsProxyIAMRole8ADFED42",
        "Arn"
      ]
    },
    "VpcSubnetIds": [
      {
        "Ref": "myVpcPrivateSubnet1SubnetDE1978C0"
      },
      {
        "Ref": "myVpcPrivateSubnet2SubnetB7D01881"
      }
    ],
    "RequireTLS": true,
    "VpcSecurityGroupIds": [
      {
        "Fn::GetAtt": [
          "RdsSecGroup72BC67FD",
          "GroupId"
        ]
      }
    ]
  },
  "Metadata": {
    "aws:cdk:path": "Infra-test/myDatabase/RdsProxy/Resource"
  },
  "DependsOn": [
    "myDatabase6024D442"
  ]
},
```

What actually happened?

Partial CFN Template & Error

```
"myDatabaseRdsProxy3FC52F28": {
  "Type": "AWS::RDS::DBProxy",
  "Properties": {
    "Auth": [
      {
        "AuthScheme": "SECRETS",
        "IAMAuth": "REQUIRED",
        "SecretArn": {
          "Ref": "AdminSecretB9452750"
        }
      },
      {
        "AuthScheme": "SECRETS",
        "IAMAuth": "REQUIRED",
        "SecretArn": {
          "Ref": "DataUploadDbSecret57F9A554"
        }
      }
    ],
    "DBProxyName": "RdsProxy",
    "EngineFamily": "POSTGRESQL",
    "RoleArn": {
      "Fn::GetAtt": [
        "myDatabaseRdsProxyIAMRole8ADFED42",
        "Arn"
      ]
    },
    "VpcSubnetIds": [
      {
        "Ref": "myVpcPrivateSubnet1SubnetDE1978C0"
      },
      {
        "Ref": "myVpcPrivateSubnet2SubnetB7D01881"
      }
    ],
    "RequireTLS": true,
    "VpcSecurityGroupIds": [
      {
        "Fn::GetAtt": [
          "RdsSecGroup72BC67FD",
          "GroupId"
        ]
      }
    ]
  },
  "Metadata": {
    "aws:cdk:path": "Infra-test/myDatabase/RdsProxy/Resource"
  },
```

**This CFN lacks the `DependsOn` values**. When trying to deploy this to my 2nd AWS account in US-East-1, CloudFormation fails to deploy the stack with an error:

```
RDS is not authorized to assume service-linked role arn:aws:iam::[AWS ACCOUNT ID]:role/aws-service-role/rds.amazonaws.com/AWSServiceRoleForRDS (Service: AWSSecurityTokenService; Status Code: 403; Error Code: AccessDenied; Request ID: [REQUEST ID]; Proxy: null). Check your RDS service-linked role and try again.
```
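To check a synthesized template for the difference between the expected and actual output above, something like the following could be used. It is only a minimal sketch: the function name `proxiesMissingClusterDependency` and the `CfnTemplate` / `CfnResourceEntry` types are hypothetical helpers, not part of the CDK, and the check is deliberately loose (a proxy passes if it depends on any `AWS::RDS::DBCluster` in the template).

```typescript
// Minimal shapes for the parts of a CloudFormation template this check reads.
// These types are illustrative only; they are not CDK or CloudFormation APIs.
interface CfnResourceEntry {
  Type: string;
  DependsOn?: string | string[];
}

interface CfnTemplate {
  Resources: { [logicalId: string]: CfnResourceEntry };
}

// Returns the logical IDs of AWS::RDS::DBProxy resources that have no
// DependsOn entry naming any AWS::RDS::DBCluster in the same template.
function proxiesMissingClusterDependency(template: CfnTemplate): string[] {
  const ids = Object.keys(template.Resources);
  const clusterIds = ids.filter(
    (id) => template.Resources[id].Type === 'AWS::RDS::DBCluster',
  );
  return ids
    .filter((id) => template.Resources[id].Type === 'AWS::RDS::DBProxy')
    .filter((id) => {
      const raw = template.Resources[id].DependsOn;
      // DependsOn may be absent, a single string, or a list of strings.
      const deps = raw === undefined ? [] : ([] as string[]).concat(raw);
      return !deps.some((dep) => clusterIds.indexOf(dep) >= 0);
    });
}

// Usage: a cut-down version of the "actual" template. The Type given for
// myDatabase6024D442 is an assumption; it is not shown in the fragment.
const actualTemplate: CfnTemplate = {
  Resources: {
    myDatabase6024D442: { Type: 'AWS::RDS::DBCluster' },
    myDatabaseRdsProxy3FC52F28: { Type: 'AWS::RDS::DBProxy' },
  },
};
const missing = proxiesMissingClusterDependency(actualTemplate);
```

Here `missing` contains `myDatabaseRdsProxy3FC52F28`; adding `DependsOn: ['myDatabase6024D442']` to the proxy entry makes the list empty.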

Environment

Other


This is :bug: Bug Report

skinny85 commented 3 years ago

Hey @JerryBLi ,

thanks for opening the issue.

Are you 100% certain that error is related to the lack of the DependsOn clause? It seems like a pretty generic error to me...

Thanks, Adam

JerryBLi commented 3 years ago

I'm pretty confident that it was the cause of the bug. I agree that the error is generic, but it's relevant because it occurred while deploying the RDS Proxy. I opened a ticket with AWS support, and the engineer was able to diagnose the issue and replicate it using CFN. They found that the RDS Proxy was being created before the DB Cluster and were able to fix it with a DependsOn attribute. What confuses me is why the deployment succeeded for one of my AWS accounts, deployed to US-West-2, but failed for my other account deploying to US-East-1. This latest deployment (the one that failed) was on a new AWS account, so maybe that was part of the issue.

I was able to get around this issue on my end by creating a new DatabaseProxy instance and adding a dependency to the DatabaseCluster. In that case, the DependsOn resources in the generated CFN did not have a circular dependency error.
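The workaround above could look roughly like the sketch below. It is not the exact code from the affected stack: it assumes `aws-cdk-lib` v2 rather than `monocdk`, the stack and construct names are hypothetical, and the security group and rotation setup from the reproduction is left out. Later comments in this thread report that the same approach can produce circular dependency errors in larger stacks, so treat it as a starting point only.

```typescript
import { App, Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as rds from 'aws-cdk-lib/aws-rds';

// Hypothetical stack: creates the proxy as its own construct instead of
// calling cluster.addProxy(), then makes it depend on the cluster.
class ProxyAfterClusterStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const vpc = new ec2.Vpc(this, 'Vpc');
    const adminSecret = new rds.DatabaseSecret(this, 'AdminSecret', {
      username: 'postgres',
    });

    const cluster = new rds.DatabaseCluster(this, 'Database', {
      engine: rds.DatabaseClusterEngine.auroraPostgres({
        version: rds.AuroraPostgresEngineVersion.VER_11_8,
      }),
      credentials: rds.Credentials.fromSecret(adminSecret),
      instanceProps: { vpc },
    });

    // Standalone proxy targeting the cluster.
    const proxy = new rds.DatabaseProxy(this, 'Proxy', {
      proxyTarget: rds.ProxyTarget.fromCluster(cluster),
      secrets: [adminSecret],
      vpc,
    });

    // Construct-level dependency: resources under the proxy get a DependsOn
    // on the resources under the cluster in the synthesized template.
    proxy.node.addDependency(cluster);
  }
}

const app = new App();
new ProxyAfterClusterStack(app, 'ProxyAfterClusterStack');
```

Running `cdk synth` on an app like this and inspecting the `AWS::RDS::DBProxy` resource is one way to confirm that the `DependsOn` entry is present before deploying.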

skinny85 commented 3 years ago

Interesting. I literally just deployed a Proxy for a DatabaseCluster to diagnose a different issue in the current CDK version (so no DependsOn). Everything worked fine.

I'm not saying this is not a problem (it might have worked by accident without the DependsOn), but I'm wondering whether it's the DependsOn that's at fault here...

JerryBLi commented 3 years ago

I agree @skinny85. I'm not 100% sure that the lack of DependsOn is the cause, but I do think adding it helped fix the problem. The stack deployed fine for me (without DependsOn) the first time in my beta AWS account, but I ran into the issue in my gamma AWS account.

itsinprog commented 2 years ago

It seems that this issue only pops up when the RDS cluster AND the RDS proxy are both new resources.

A solution I have found, while not ideal, is running the deploy twice: first without the proxy, then, after the cluster has been created, a second time with the proxy added.

github-actions[bot] commented 1 year ago

This issue has not received any attention in 1 year. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

kerling commented 1 year ago

This issue is affecting my team, in both our shared beta & prod AWS accounts and whenever we onboard a new member to the service in question. I found some internal Amazon posts that recommended the same as above (explicitly create a proxy and call addDependency) but, as also mentioned above in this thread, that creates other extraneous issues around circular dependencies that are hard to resolve. Can you please re-open this?

peterwoodworth commented 1 year ago

Thanks for letting us know @kerling,

We could add a line in the constructor to add a dependency on the CfnDbProxy as well instead of just the target group https://github.com/aws/aws-cdk/blob/06a0b1995fd024bcc48c80a68d6a0f371b00d64c/packages/aws-cdk-lib/aws-rds/lib/proxy.ts#L493

I'm marking this as good first issue, hopefully someone will be able to contribute this fix, we might not be able to get to it for a while.

kerling commented 1 year ago

Thanks, @peterwoodworth! If/when I get a chance I'll take a stab at the fix you recommended :)

kerling commented 1 year ago

I noticed that the IAM role referenced in the error was a service-linked role. I was able to work around this issue by manually calling:

aws iam create-service-linked-role --aws-service-name rds.amazonaws.com

before deployment, and that was simpler than adding an addDependency call or deploying twice.

peterwoodworth commented 1 year ago

Great to hear @kerling, glad you found a solution that works for you. Given this, I think this could be a CloudFormation bug in how the service-linked role gets created?

kerling commented 1 year ago

It's possible, but my investigation didn't lead me as far as an answer to that question. I'm not sure how or when CloudFormation or CDK would typically create those service-linked roles as part of resource creation.

skinny85 commented 1 year ago

I might be wrong, but I think it's actually the service itself that is supposed to create that linked role (the first time a resource that requires it is created in the given region).

endertunc commented 1 year ago

I recently had the exact same issue on a newly created AWS account while trying to create an RDS cluster and RDS Proxy using CDK. The issue was related to the missing service-linked role, and @kerling's suggestion above resolved it.

According to this page https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.IAM.ServiceLinkedRoles.html @skinny85 might be right.

Creating a service-linked role for Amazon Aurora

You don't need to manually create a service-linked role. When you create a DB cluster, Amazon Aurora creates the service-linked role for you. 

However, the docs keep referring to "Aurora", but I would assume this is the same for regular RDS instances.

BenCGI commented 11 months ago

Same here... I fixed it by deploying twice (once without the proxy, afterwards with it).

At least when I checked, the service-linked role was there (but I could not test the command written above, as I did not have the needed permission to run it from my local machine).

ebellavance commented 4 months ago

I had this problem with a new account. No RDS database had been created before, so the service-linked role did not exist, since it is RDS that creates that role. Deploying twice, or creating the service-linked role on the command line beforehand, solves the problem for now.