aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

DynamoDB: can't change a Table from "Pay Per Request" to "Provisioned" #15318

Open ronakkenia opened 3 years ago

ronakkenia commented 3 years ago

Description of the bug

I created a DynamoDB table via CDK in the past with the on-demand capacity and now want to change it to be of type provisioned. I added the CDK TypeScript code to change the billing mode to be provisioned and to add auto-scaling write and read capacity and when trying to deploy this, I get the following error:

4:15:27 PM | UPDATE_FAILED        | AWS::DynamoDB::Table                        | PrivateTableName1234
The provisioned throughput for the table will not change.
The requested value equals the current value.
Current ReadCapacityUnits provisioned for the table: 5.
Requested ReadCapacityUnits: 5.
Current WriteCapacityUnits provisioned for the table: 5.
Requested WriteCapacityUnits: 5.
Refer to the Amazon DynamoDB Developer Guide for current limits and how to request higher limits.
(Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException; Request ID: ###; Proxy: null)

    new Table (/private_path/node_modules/monocdk/lib/aws-dynamodb/lib/table.js:456:22)
    \_ new DynamoResourceBundle (/private_path/dist/lib/globalResources/dynamoResourceBundle.js:19:42)
    \_ new GlobalStack (/private_path/dist/lib/stack/globalStack.js:14:38)
    \_ addGlobalStack (/private_path/dist/lib/app.js:66:25)
    \_ /private_path/dist/lib/app.js:37:16
    \_ Array.map (<anonymous>)
    \_ /private_path/dist/lib/app.js:36:55
    \_ Array.forEach (<anonymous>)
    \_ Object.<anonymous> (/private_path/dist/lib/app.js:34:29)
    \_ Module._compile (internal/modules/cjs/loader.js:1085:14)
    \_ Object.Module._extensions..js (internal/modules/cjs/loader.js:1114:10)
    \_ Module.load (internal/modules/cjs/loader.js:950:32)
    \_ Function.Module._load (internal/modules/cjs/loader.js:790:14)
    \_ Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:76:12)
    \_ internal/main/run_main_module.js:17:47

 ❌  My-Stack-Name failed: Error: The stack named My-Stack-Name failed to deploy: UPDATE_ROLLBACK_COMPLETE
    at Object.waitForStackDeploy (/another_private_path/build/private/cdk-cli/node_modules/aws-cdk/lib/api/util/cloudformation.ts:307:11)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at Object.deployStack (/another_private_path/build/private/cdk-cli/node_modules/aws-cdk/lib/api/deploy-stack.ts:294:26)
    at CdkToolkit.deploy (/another_private_path/build/private/cdk-cli/node_modules/aws-cdk/lib/cdk-toolkit.ts:180:24)
    at initCommandLine (/another_private_path/build/private/cdk-cli/node_modules/aws-cdk/bin/cdk.ts:212:9)
The stack named My-Stack-Name failed to deploy: UPDATE_ROLLBACK_COMPLETE
 ›   Error: Failed to run application build system.

                        BUILD FAILED

  *** command 'cdk-build' with arguments 'cdk deploy My-Stack-Name' exited with return code '1'

Reproduction Steps

Using the following code:

const table = new Table(stack, tableName, {
    partitionKey: {name: partitionKey, type: AttributeType.STRING},
    sortKey: {name: sortKey, type: AttributeType.STRING},
    tableName: tableName,
    pointInTimeRecovery: true,
    replicationRegions: [AwsRegion.PDX, AwsRegion.DUB],
    billingMode: BillingMode.PROVISIONED
});
table.autoScaleWriteCapacity({
    minCapacity: 5,
    maxCapacity: 10
}).scaleOnUtilization({targetUtilizationPercent: 75});
table.autoScaleReadCapacity({
    minCapacity: 5,
    maxCapacity: 10
}).scaleOnUtilization({targetUtilizationPercent: 75});

And running the following command:

cdk deploy My-Stack-Name

What did you expect to happen?

The table to be updated to be billing type provisioned and to have the corresponding min/max read/write auto-scaling units.

What actually happened?

The process throws the error listed above.

Environment

Other

I found a similar issue in the past so I tried to make sure I was using all the most updated versions of everything I could to include any bugfixes that were already merged in for this issue in the past: https://github.com/crossplane/provider-aws/issues/464


This is :bug: Bug Report

skinny85 commented 3 years ago

Hey @ronakkenia,

that's a weird error 🤔.

Can you try setting the readCapacity and writeCapacity properties of Table, to something like 6, and see if that fixes the problem?

Thanks, Adam

ronakkenia commented 3 years ago

Hey @skinny85, thanks for the quick reply. I updated the read and write capacity to both be 6 and oddly enough I get the same error:

Changed code to (git diff):

             pointInTimeRecovery: true,
-            replicationRegions: [AwsRegion.PDX, AwsRegion.DUB]
+            replicationRegions: [AwsRegion.PDX, AwsRegion.DUB],
+            billingMode: BillingMode.PROVISIONED
         });
+        table.autoScaleWriteCapacity({
+            minCapacity: 6,
+            maxCapacity: 6
+        }).scaleOnUtilization({targetUtilizationPercent: 75});
+        table.autoScaleReadCapacity({
+            minCapacity: 6,
+            maxCapacity: 6
+        }).scaleOnUtilization({targetUtilizationPercent: 75});

Error

My-Stack-Name: deploying...
My-Stack-Name: creating CloudFormation changeset...
2:48:24 PM | UPDATE_FAILED        | AWS::DynamoDB::Table                        | PrivateTableName1234
The provisioned throughput for the table will not change.
The requested value equals the current value.
Current ReadCapacityUnits provisioned for the table: 5.
Requested ReadCapacityUnits: 5.
Current WriteCapacityUnits provisioned for the table: 5.
Requested WriteCapacityUnits: 5.
Refer to the Amazon DynamoDB Developer Guide for current limits and how to request higher limits.
(Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException; Request ID: 123###; Proxy: null)

After this, the AWS console shows the DynamoDB to be unchanged. Let me know if there are any other things I can try or other information you would need, thanks!

skinny85 commented 3 years ago

@ronakkenia like I wrote, can you try doing this on the Table? Not in autoScaleReadCapacity() / autoScaleWriteCapacity().

Thanks, Adam

ronakkenia commented 3 years ago

Apologies, I misunderstood/misread. I tried what you actually meant and the CDK command was taking a very long time and then eventually failed with the following (timeout?) error:

Changed code to:

const table = new Table(stack, tableName, {
    partitionKey: {name: partitionKey, type: AttributeType.STRING},
    sortKey: {name: sortKey, type: AttributeType.STRING},
    tableName: tableName,
    pointInTimeRecovery: true,
    replicationRegions: [AwsRegion.PDX, AwsRegion.DUB],
    billingMode: BillingMode.PROVISIONED,
    readCapacity: 6,
    writeCapacity: 6
});
table.autoScaleWriteCapacity({
    minCapacity: 5,
    maxCapacity: 10
}).scaleOnUtilization({targetUtilizationPercent: 75});
table.autoScaleReadCapacity({
    minCapacity: 5,
    maxCapacity: 10
}).scaleOnUtilization({targetUtilizationPercent: 75});

Error message:

Running cdk with args: deploy,My-Stack-Name
My-Stack-Name: deploying...
My-Stack-Name: creating CloudFormation changeset...
[████████▎·················································] (2/14)

6:16:15 PM | UPDATE_IN_PROGRESS   | AWS::CloudFormation::Stack                  | My-Stack-Name
6:16:40 PM | UPDATE_IN_PROGRESS   | AWS::DynamoDB::Table                        | My-Stack-Name/PrivateTableName1234

Error occurred while monitoring stack: ExpiredToken: The security token included in the request is expired
    at Request.extractError (/private_path/build/private/cdk-cli/node_modules/aws-cdk/node_modules/aws-sdk/lib/protocol/query.js:50:29)
    at Request.callListeners (/private_path/build/private/cdk-cli/node_modules/aws-cdk/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
    at Request.emit (/private_path/build/private/cdk-cli/node_modules/aws-cdk/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
    at Request.emit (/private_path/build/private/cdk-cli/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:688:14)
    at Request.transition (/private_path/build/private/cdk-cli/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/private_path/build/private/cdk-cli/node_modules/aws-cdk/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /private_path/build/private/cdk-cli/node_modules/aws-cdk/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/private_path/build/private/cdk-cli/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/private_path/build/private/cdk-cli/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:690:12)
    at Request.callListeners (/private_path/build/private/cdk-cli/node_modules/aws-cdk/node_modules/aws-sdk/lib/sequential_executor.js:116:18) {
  code: 'ExpiredToken',
  time: 2021-06-28T18:40:07.662Z,
  requestId: '123###',
  statusCode: 403,
  retryable: true
}

The Dynamo table in the AWS console looks unchanged compared to previous update attempts and when I try to deploy again to my stack I'm currently in a blocked state because I get an error saying:

 ❌  My-Stack-Name failed: Error [ValidationError]: My-Stack-Name/1234 is in UPDATE_ROLLBACK_FAILED state and can not be updated.

And when I try to re-try the update rollback via the AWS console I get the following error for the Dynamo table that this issue is for:

Subscriber limit exceeded: Update to PayPerRequest mode are limited to once in 1 day(s). Last update at Mon Jun 28 18:12:29 UTC 2021. Next update can be made at Tue Jun 29 18:12:29 UTC 2021 (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: LimitExceededException;

If needed, I can try the suggestion of using set readCapacity and writeCapacity again tomorrow after getting the stack back in a stable state.
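
The LimitExceededException above says a switch to PayPerRequest is allowed once per day, so the rollback (which has to switch the table back to on-demand) cannot complete until a day after the last switch. A minimal sketch of that arithmetic follows. The function names are hypothetical, and the 24-hour window is taken from the error message, not from any API call:

```typescript
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Earliest time the next switch to PayPerRequest should be accepted,
// assuming the window is exactly 24 hours after the last switch.
function nextPayPerRequestSwitch(lastUpdate: Date): Date {
  return new Date(lastUpdate.getTime() + ONE_DAY_MS);
}

// True if retrying the rollback at `now` is outside the one-day window.
function canRetryRollback(lastUpdate: Date, now: Date): boolean {
  return now.getTime() >= nextPayPerRequestSwitch(lastUpdate).getTime();
}

// "Last update at Mon Jun 28 18:12:29 UTC 2021" from the error message.
const lastUpdate = new Date("2021-06-28T18:12:29Z");
console.log(nextPayPerRequestSwitch(lastUpdate).toISOString());
// → 2021-06-29T18:12:29.000Z
```

This matches the "Next update can be made at" timestamp in the error, so a stack stuck in UPDATE_ROLLBACK_FAILED for this reason can only be recovered after that time.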

skinny85 commented 3 years ago

Yeah, if you could. I don't think there's much CDK work here - this looks like some peculiarities in the DynamoDB CloudFormation support.

ronakkenia commented 3 years ago

I was able to retry after fixing the stack drift and finishing the rollback via the console so it was back in the original state. This time it threw a different kind of error, saying something along the lines of "the autoscaling targets already exist".

The following updated in the AWS console:

This was the error message from the command line:

My-Stack-Name: deploying...
My-Stack-Name: creating CloudFormation changeset...
6:38:48 PM | CREATE_FAILED        | AWS::ApplicationAutoScaling::ScalableTarget | PrivateTabl...lingTarget80726324
table/PrivateTableName1234|dynamodb:table:ReadCapacityUnits|dynamodb already exists

    new ScalableTarget (/private_path/node_modules/monocdk/lib/aws-applicationautoscaling/lib/scalable-target.js:41:26)
    \_ new BaseScalableAttribute (/private_path/node_modules/monocdk/lib/aws-applicationautoscaling/lib/base-scalable-attribute.js:34:23)
    \_ new ScalableTableAttribute (/private_path/node_modules/monocdk/lib/aws-dynamodb/lib/scalable-table-attribute.js:10:9)
    \_ Table.autoScaleReadCapacity (/private_path/node_modules/monocdk/lib/aws-dynamodb/lib/table.js:645:58)
    \_ new DynamoResourceBundle (/private_path/dist/lib/globalResources/dynamoResourceBundle.js:33:34)
    \_ new GlobalStack (/private_path/dist/lib/stack/globalStack.js:14:38)
    \_ addGlobalStack (/private_path/dist/lib/app.js:66:25)
    \_ /private_path/dist/lib/app.js:37:16
    \_ Array.map (<anonymous>)
    \_ /private_path/dist/lib/app.js:36:55
    \_ Array.forEach (<anonymous>)
    \_ Object.<anonymous> (/private_path/dist/lib/app.js:34:29)
    \_ Module._compile (internal/modules/cjs/loader.js:1085:14)
    \_ Object.Module._extensions..js (internal/modules/cjs/loader.js:1114:10)
    \_ Module.load (internal/modules/cjs/loader.js:950:32)
    \_ Function.Module._load (internal/modules/cjs/loader.js:790:14)
    \_ Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:76:12)
    \_ internal/main/run_main_module.js:17:47

 PM | UPDATE_ROLLBACK_IN_P | AWS::CloudFormation::Stack                  | My-Stack-Name
MappingbetaSourceTableAttachedManagedPolicyPipelineGlobalbetaawscdkawsdynamodbReplicaProviderOnEventHandlerServiceRole1234ABCD].
0 18:23:34 UTC 2021 (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: LimitExceededException; Request ID: 1234####
lowing resource(s) failed to update: [PrivateTableName1234, awscdkawsdynamodbReplicaProviderNestedStackawscdkawsdynamodbReplicaProviderNest

-- there was another stack trace here saying that it failed to rollback the stack again because it couldn't switch the table to on-demand mode more than once a day --

                        BUILD FAILED

  *** command 'cdk-build' with arguments 'cdk deploy My-Stack-Name' exited with return code '1'

I found some links online pointing to there being existing auto-scaling targets that could be causing issues, but when I run the following command:

 aws --profile my-profile-name application-autoscaling describe-scalable-targets --service-namespace dynamodb --region us-east-1

I get the following output which looks like the default auto-scaling targets that were made from the table creation code so I am not sure if that's the issue:

{
    "ScalableTargets": [
        {
            "ServiceNamespace": "dynamodb",
            "ResourceId": "table/PrivateTableName1234",
            "ScalableDimension": "dynamodb:table:WriteCapacityUnits",
            "MinCapacity": 6,
            "MaxCapacity": 40000,
            "RoleARN": "arn:aws:iam::aws_profile_id:role/aws-service-role/dynamodb.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_DynamoDBTable",
            "CreationTime": "2021-06-29T18:38:26.908000+00:00",
            "SuspendedState": {
                "DynamicScalingInSuspended": false,
                "DynamicScalingOutSuspended": false,
                "ScheduledScalingSuspended": false
            }
        },
        {
            "ServiceNamespace": "dynamodb",
            "ResourceId": "table/PrivateTableName1234",
            "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
            "MinCapacity": 6,
            "MaxCapacity": 40000,
            "RoleARN": "arn:aws:iam::aws_profile_id:role/aws-service-role/dynamodb.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_DynamoDBTable",
            "CreationTime": "2021-06-29T18:38:25.495000+00:00",
            "SuspendedState": {
                "DynamicScalingInSuspended": false,
                "DynamicScalingOutSuspended": false,
                "ScheduledScalingSuspended": false
            }
        }
    ]
}

Let me know if there is anything else to try or if you think I should bring it up with DynamoDB CloudFormation Support. Thanks!
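
One way to check for the conflict reported in the CREATE_FAILED event is to scan the describe-scalable-targets output for entries that already cover the table. A minimal sketch follows, assuming the JSON shape shown above. The helper name existingDimensions is hypothetical:

```typescript
// Subset of the fields returned by describe-scalable-targets.
interface ScalableTarget {
  ServiceNamespace: string;
  ResourceId: string;
  ScalableDimension: string;
  MinCapacity: number;
  MaxCapacity: number;
}

// Returns the scalable dimensions that already have a registered target
// for the given DynamoDB table, sorted alphabetically.
function existingDimensions(targets: ScalableTarget[], tableName: string): string[] {
  const resourceId = `table/${tableName}`;
  return targets
    .filter((t) => t.ServiceNamespace === "dynamodb" && t.ResourceId === resourceId)
    .map((t) => t.ScalableDimension)
    .sort();
}

// Values copied from the output above.
const targets: ScalableTarget[] = [
  { ServiceNamespace: "dynamodb", ResourceId: "table/PrivateTableName1234",
    ScalableDimension: "dynamodb:table:WriteCapacityUnits", MinCapacity: 6, MaxCapacity: 40000 },
  { ServiceNamespace: "dynamodb", ResourceId: "table/PrivateTableName1234",
    ScalableDimension: "dynamodb:table:ReadCapacityUnits", MinCapacity: 6, MaxCapacity: 40000 },
];
console.log(existingDimensions(targets, "PrivateTableName1234"));
```

If a dimension shows up here before the stack has created its own ScalableTarget, the AWS::ApplicationAutoScaling::ScalableTarget resource for the same table and dimension would presumably collide with it, which is consistent with the "already exists" message.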

skinny85 commented 3 years ago

Yeah, sorry. You can try more experiments, but I honestly don't think the CDK is doing anything weird here - it looks like the DynamoDB CloudFormation support has some limitations as far as changing the billingMode of a Table.

Sorry for the bad experience, but I can't see what CDK can do differently here 😕.

skinny85 commented 3 years ago

Same thing experienced by @DennisSSDev in https://github.com/aws/aws-cdk/issues/16302.

@DennisSSDev any ideas what CDK can do here? Would a Custom Resource that can perform AWS API calls possibly help here?

jumic commented 3 years ago

Does this issue only occur if replicationRegions or pointInTimeRecovery is set? Maybe this information is helpful to solve this issue.

I did a similar test without replicationRegions and pointInTimeRecovery.

First, I deployed my example stack with this table.

const table = new Table(this, 'MyTable', {
  partitionKey: { name: 'id', type: AttributeType.STRING },
  billingMode: BillingMode.PAY_PER_REQUEST,
});

In the AWS Console, I checked the DynamoDB Table: Capacity mode: On-demand

Next, I changed my stack to provisioned billing mode and deployed the stack again (update):

const table = new Table(this, 'MyTable', {
  partitionKey: { name: 'id', type: AttributeType.STRING },
  billingMode: BillingMode.PROVISIONED,
});
table.autoScaleWriteCapacity({
  minCapacity: 6,
  maxCapacity: 11
}).scaleOnUtilization({ targetUtilizationPercent: 75 });
table.autoScaleReadCapacity({
  minCapacity: 7,
  maxCapacity: 12,
}).scale

The stack deployed successfully. In the AWS Console, I can see Capacity mode: Provisioned.

The autoscaling information is available, too.

mrgarcia1998 commented 2 years ago

I was able to work around this issue by using the CfnGlobalTable class rather than the Table class.

Ensure your table's RemovalPolicy is set to RETAIN so it doesn't get deleted, remove the Table object from your CDK code, and deploy the changes. This will remove the table from the CloudFormation template but NOT from your account. Now, add your table back in CDK, but as a CfnGlobalTable object rather than a Table object. Ensure everything is the same (table name, BillingMode settings, etc.) except remove any global replicas for now, and build the code. Grab the template file produced when you build the CDK code and import this template into CloudFormation using the AWS console. This process will find the existing table that is still in your account and add it back as a GlobalTable.

Go back to your CDK code and make any changes you need, as the CfnGlobalTable is now much easier to work with. If your table has global replicas, add them back one at a time per build in CDK, as CloudFormation doesn't support adding more than one region at a time per deployment. Now all future changes you need to make in CDK are easier to make and deploy to CloudFormation.
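
To make the import step more concrete, here is a rough sketch of the resource entry the template would need to contain for the table, written as a plain TypeScript object rather than with the CDK. The function name, the input type, and the capacity values are illustrative assumptions. The property names follow the AWS::DynamoDB::GlobalTable resource type as I understand it, so check them against the CloudFormation reference before relying on them:

```typescript
// Billing settings: on-demand, or provisioned with auto-scaling bounds.
type Billing =
  | { mode: "PAY_PER_REQUEST" }
  | { mode: "PROVISIONED"; minCapacity: number; maxCapacity: number; targetUtilizationPercent: number };

interface ImportInput {
  tableName: string;
  partitionKey: string;
  sortKey: string;
  region: string; // the single region to import; other replicas are added back later
  billing: Billing; // must match the table's current billing mode at import time
}

function globalTableResource(input: ImportInput): {
  Type: string;
  DeletionPolicy: string;
  Properties: Record<string, unknown>;
} {
  const replica: Record<string, unknown> = {
    Region: input.region,
    PointInTimeRecoverySpecification: { PointInTimeRecoveryEnabled: true },
  };
  const properties: Record<string, unknown> = {
    TableName: input.tableName,
    BillingMode: input.billing.mode,
    AttributeDefinitions: [
      { AttributeName: input.partitionKey, AttributeType: "S" },
      { AttributeName: input.sortKey, AttributeType: "S" },
    ],
    KeySchema: [
      { AttributeName: input.partitionKey, KeyType: "HASH" },
      { AttributeName: input.sortKey, KeyType: "RANGE" },
    ],
    Replicas: [replica],
  };
  const billing = input.billing;
  if (billing.mode === "PROVISIONED") {
    // Same bounds for reads and writes, purely to keep the sketch short.
    const scaling = {
      MinCapacity: billing.minCapacity,
      MaxCapacity: billing.maxCapacity,
      TargetTrackingScalingPolicyConfiguration: { TargetValue: billing.targetUtilizationPercent },
    };
    properties["WriteProvisionedThroughputSettings"] = { WriteCapacityAutoScalingSettings: scaling };
    replica["ReadProvisionedThroughputSettings"] = { ReadCapacityAutoScalingSettings: scaling };
  }
  // Resource import requires an explicit DeletionPolicy on each imported resource.
  return { Type: "AWS::DynamoDB::GlobalTable", DeletionPolicy: "Retain", Properties: properties };
}
```

In this sketch the import would be done with the billing mode the table has right now (on-demand in the original report), and the switch to PROVISIONED would be a later, separate deployment.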

ronakkenia commented 2 years ago

Hey all, yeah as @mrgarcia1998 wrote, that is the workaround that we and a partner team (repeatable workaround 😄 ) ended up using to get around this issue. Since the problem still exists, I'll leave the GitHub issue open. If the project owners see fit, feel free to close it out, but the issue is handled on our end.

grant-d commented 2 years ago

8:13:23 PM | CREATE_FAILED        | AWS::ApplicationAutoScaling::ScalableTarget | StrataTableReadScalingTarget7882D8FD
table/<table-name>|dynamodb:table:ReadCapacityUnits|dynamodb already exists

8:13:23 PM | CREATE_FAILED        | AWS::ApplicationAutoScaling::ScalableTarget | StrataTableWriteScalingTargetEA102FCA
table/<table-name>|dynamodb:table:WriteCapacityUnits|dynamodb already exists

I tried so many things and eventually I think this is what solved the problem on one table (without reverting entirely to CfnGlobalTable) but not (yet) another. The difference between the 2 is that the 2nd has replicas, but I am also waiting for the obligatory 24 hours to pass so I can test this method on it too. [Edit] Ignore, this wasn't what solved it. The main issue is still present on the 2nd table.

const strataTable = new ddb.Table(this, 'Table', { billingMode: ddb.BillingMode.PROVISIONED, ... })
// HACK: set billingMode explicitly on the underlying CfnTable
const node = strataTable.node.defaultChild as ddb.CfnTable
node.billingMode = 'PROVISIONED'

skinny85 commented 2 years ago

Perhaps the following is the source of the problem. This line removes the BillingMode in the case that it is PROVISIONED

https://github.com/aws/aws-cdk/blob/b77787825e9de25ff91a784a7da486d921924110/packages/%40aws-cdk/aws-dynamodb/lib/table.ts#L1195

Could be. It should be a very easy change, just making that line:

      billingMode: this.billingMode,
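
To illustrate the hypothesis, here is a small standalone sketch of the two behaviours being discussed. It is not the CDK source: the function names are made up, and the "as described" variant only encodes what the comment above says the linked line does.

```typescript
type BillingMode = "PAY_PER_REQUEST" | "PROVISIONED";

// Behaviour as described above: BillingMode is only emitted for PAY_PER_REQUEST.
function renderAsDescribed(mode: BillingMode): { BillingMode?: BillingMode } {
  return { BillingMode: mode === "PAY_PER_REQUEST" ? mode : undefined };
}

// Proposed behaviour: always emit the configured mode.
function renderProposed(mode: BillingMode): { BillingMode?: BillingMode } {
  return { BillingMode: mode };
}

// Undefined properties disappear when the template is serialized to JSON.
console.log(JSON.stringify(renderAsDescribed("PROVISIONED"))); // → {}
console.log(JSON.stringify(renderProposed("PROVISIONED")));    // → {"BillingMode":"PROVISIONED"}
```

Under the described behaviour, a table switched to PROVISIONED would have no BillingMode property in the template at all. Whether that is what triggers the failure is still only a guess at this point in the thread.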

grant-d commented 2 years ago

I was able to work around this issue by using the CfnGlobalTable class rather than the Table class.

None of my proposals above worked. I eventually had to do this too.

rix0rrr commented 1 year ago

This issue was for the existing Table construct, which used custom resources to implement table replication. We no longer recommend the use of the Table construct.

Instead, the TableV2 construct has been released in 2.95.1 (#27023) which maps to the AWS::DynamoDB::GlobalTable resource, has better support for replication and does not suffer from the issue described here.


Be aware that there are additional deployment steps involved in a migration from Table to TableV2. You need to do a RETAIN deployment, a delete deployment, then change the code to use TableV2 and then use cdk import. A link to a full guide will be posted once it is available.

Here are some other resources to get you started (using CfnGlobalTable instead of TableV2) if you want to get going on the migration: