data-dot-all / dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
https://data-dot-all.github.io/dataall/
Apache License 2.0

CDK deploy fails with parameter enable_update_dataall_stacks_in_cicd_pipeline set to true #662

Closed MaxRichter closed 1 year ago

MaxRichter commented 1 year ago

Describe the bug

When the enable_update_dataall_stacks_in_cicd_pipeline parameter is set to true for the deployment environment(s), running cdk deploy dataall-main-cicd-stack ends in a CREATE_FAILED status and CloudFormation rolls the stack back.

How to Reproduce


cdk.json used

    {
      "app": "python ./deploy/app.py",
      "context": {
        "@aws-cdk/aws-apigateway:usagePlanKeyOrderInsensitiveId": false,
        "@aws-cdk/aws-cloudfront:defaultSecurityPolicyTLSv1.2_2021": false,
        "@aws-cdk/aws-rds:lowercaseDbIdentifier": false,
        "@aws-cdk/core:stackRelativeExports": false,
        "tooling_region": "eu-central-1",
        "resource_prefix": "da",
        "DeploymentEnvironments": [
          {
            "envname": "dev",
            "account": "",
            "region": "eu-central-1",
            "enable_cw_rum": true,
            "enable_cw_canaries": true,
            "enable_pivot_role_auto_create": true,
            "enable_update_dataall_stacks_in_cicd_pipeline": true,
            "enable_opensearch_serverless": true
          },
          {
            "envname": "prod",
            "account": "",
            "region": "eu-central-1",
            "enable_cw_rum": true,
            "enable_cw_canaries": true,
            "enable_pivot_role_auto_create": true,
            "enable_update_dataall_stacks_in_cicd_pipeline": true,
            "enable_opensearch_serverless": true
          }
        ]
      }
    }

Run: cdk deploy dataall-main-cicd-stack

Stack creation fails with the following error message:

1:31:07 PM | CREATE_FAILED | AWS::IAM::Policy | CodeBuildExpandedRolemain/DefaultPolicy Resource handler returned message: "Maximum policy size of 10240 bytes exceeded for role da-main-expanded-codebuild-role (Service: Iam, Status Code: 409, Request ID: 78e4a682-95cb-4229-9f08-125be5564ac8)" (RequestToken: 3c404177-6d2a-6aca-c399-58d133eaef6c, HandlerErrorCode: ServiceLimitExceeded)

Expected behavior

No response

Your project

No response

Screenshots

No response

OS

Mac

Python version

3.11.4

AWS data.all version

1.6.0

Additional context

No response

dlpzx commented 1 year ago

Hi @MaxRichter, thanks for opening an issue! I have deployed data.all twice, once with enable_update_dataall_stacks_in_cicd_pipeline set to true and once with it set to false, and both deployments succeeded without issue. I then checked the default policy that is created for the expanded CodeBuild role. CDK generates this default policy based on the CodeBuild stages that use the role as their execution role. Because setting enable_update_dataall_stacks_in_cicd_pipeline to true adds a new CodeBuild stage that uses this role, a block like the following is added to the role's default policy. My guess is that in your deployment these extra statements push the policy over the IAM size limit. As documented, the combined size of all inline policies cannot exceed 10,240 characters per role.

        {
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:eu-west-1:XXXXXXX:log-group:/aws/codebuild/dataalldelcdkpipelinePipeli-uiaNjsDAmarR",
                "arn:aws:logs:eu-west-1:XXXXXXX:log-group:/aws/codebuild/dataalldelcdkpipelinePipeli-uiaNjsDAmarR:*"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "codebuild:CreateReportGroup",
                "codebuild:CreateReport",
                "codebuild:UpdateReport",
                "codebuild:BatchPutTestCases",
                "codebuild:BatchPutCodeCoverages"
            ],
            "Resource": "arn:aws:codebuild:eu-west-1:XXXXXXXX:report-group/dataalldelcdkpipelinePipeli-uiaNjsDAmarR-*",
            "Effect": "Allow"
        }

I did a rough copy/paste of all inline policies for my expanded role with enable_update_dataall_stacks_in_cicd_pipeline enabled and got a total size of 6,380 characters (limit 10,240 characters). That is a high number, but it is far enough from the limit that the extra stage alone is unlikely to be the cause of your issue. Is there anything you have customized that could explain additional permissions on the CodeBuild role?
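The rough copy/paste measurement described above can be approximated in a few lines of Python. This is only a sketch and not part of data.all: the function names and the sample statement are hypothetical, and it assumes that IAM ignores whitespace when counting characters against the 10,240 limit, so the result should be treated as an estimate.

```python
import json

# Aggregate inline policy limit for an IAM role, in characters.
ROLE_INLINE_POLICY_LIMIT = 10_240


def policy_size(document: dict) -> int:
    """Estimate the size IAM counts for one policy document.

    The document is serialized compactly and all whitespace is dropped
    (assumption: IAM does not count whitespace against the limit).
    """
    compact = json.dumps(document, separators=(",", ":"))
    return len("".join(compact.split()))


def remaining_headroom(documents: list[dict]) -> int:
    """Characters left before the role reaches the inline policy limit."""
    return ROLE_INLINE_POLICY_LIMIT - sum(policy_size(d) for d in documents)


# Hypothetical statement shaped like the log-group block quoted above.
sample = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
            ],
            "Resource": [
                "arn:aws:logs:eu-west-1:XXXXXXX:log-group:/aws/codebuild/example"
            ],
            "Effect": "Allow",
        }
    ],
}

print(policy_size(sample), remaining_headroom([sample]))
```

For the 6,380 characters measured in this comment, the same arithmetic leaves 10,240 - 6,380 = 3,860 characters of headroom.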

Let's try to find the issue here, but if we get stuck we can organize a meeting to debug together.

MaxRichter commented 1 year ago

After cleaning up my AWS accounts and removing both the Data.All and CDK CloudFormation stacks, I redeployed from scratch and was not able to replicate the error.

My assumption is that I had not cleaned up my AWS accounts properly, so the IAM policy kept being appended to until it finally exceeded the size limit.
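Whether leftover resources were really the cause was not confirmed in this thread. One way to check before redeploying is to list the inline policies already attached to the role and their sizes. The sketch below is an illustration under assumptions: the helper name is hypothetical, the client is expected to behave like a boto3 IAM client (list_role_policies and get_role_policy), and the policy document is assumed to come back as an already decoded dict.

```python
import json


def inline_policy_sizes(iam_client, role_name: str) -> dict[str, int]:
    """Map each inline policy name on a role to its compact JSON length."""
    sizes: dict[str, int] = {}
    kwargs = {"RoleName": role_name}
    while True:
        page = iam_client.list_role_policies(**kwargs)
        for name in page["PolicyNames"]:
            response = iam_client.get_role_policy(
                RoleName=role_name, PolicyName=name
            )
            # Assumption: the client returns the document as a dict.
            document = response["PolicyDocument"]
            sizes[name] = len(json.dumps(document, separators=(",", ":")))
        if not page.get("IsTruncated"):
            return sizes
        # Follow pagination when the role has many inline policies.
        kwargs["Marker"] = page["Marker"]


# Usage against a real account (requires boto3 and AWS credentials):
#   import boto3
#   sizes = inline_policy_sizes(
#       boto3.client("iam"), "da-main-expanded-codebuild-role"
#   )
#   print(sizes, sum(sizes.values()))
```

Unexpected policy names, or a total that is already close to 10,240 before the stack is deployed, would point to leftovers from an earlier deployment.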

In case I experience it again, I will raise another ticket.