aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

(aws-dynamodb): Feature gap regarding global DynamoDB Tables with per-region streams #16179

Open ralphmr opened 2 years ago

ralphmr commented 2 years ago

:question: Global DynamoDB Tables with Multi Region Streams

The Question

I am using the AWS CDK (Python) to create a DynamoDB global table.

globaltable = ddb.Table(
        self, 'GlobalTable',
        partition_key={'name': 'name', 'type': ddb.AttributeType.STRING},
        replication_regions=['ap-southeast-2'],
        stream=ddb.StreamViewType.NEW_AND_OLD_IMAGES,
        billing_mode=ddb.BillingMode.PAY_PER_REQUEST,
)

This is a global table that is launched in a stack in eu-west-1. As you can see, the table is replicated across to ap-southeast-2.

I want to create another stack in ap-southeast-2 that references the DynamoDB replica table local to that region. In this stack, I need to use the DynamoDB streams for the table replica to trigger a Lambda function that also exists in that region.

To create the Lambda event source in the secondary region, I need to import the table from its attributes in the second stack:

globaltable = ddb.Table.from_table_attributes(
        self,
        'GlobalTable',
        table_name=global_table_primary.table_name,
        table_stream_arn='???',
)

From the DynamoDB CDK API docs:

If you intend to use the tableStreamArn (including indirectly, for example by creating an @aws-cdk/aws-lambda-event-source.DynamoEventSource on the imported table), you must use the Table.fromTableAttributes method and the tableStreamArn property must be populated.

So it appears that you must provide the ARN during the fromTableAttributes call to import the foreign resource. Failing to do so gives the following error:

jsii.errors.JSIIError: DynamoDB Streams must be enabled on the table GlobalServerlessSecondaryStack/GlobalTable

Of course, I don't know the replica's stream ARN in advance, because it is not predictable (it ends in a timestamp label). I can easily get the ARN of the stream in the primary region using table.tableStreamArn, but there doesn't seem to be an option to get the stream ARN for the replica.

skinny85 commented 2 years ago

Thanks for opening the issue @ralphmr. But I'm not sure there's much we can do here 😕. Looks like there's no easy way to get the ARN of the Stream in the replicas in the other regions. Even the new AWS::DynamoDB::GlobalTable resource doesn't allow that - from its docs:

StreamArn The ARN of the DynamoDB stream, such as arn:aws:dynamodb:us-east-1:123456789012:table/testddbstack-myDynamoDBTable-012A1SL7SMP5Q/stream/2015-11-30T20:10:00.000. The StreamArn returned is that of the replica in the region the stack is deployed to.

ralphmr commented 2 years ago

Thank you for the response and for adding labels to the issue @skinny85.

I'm glad you have added the feature-request label to this. I believe there is a strong use case for this feature to be supported, especially as we see a shift to global active-active architectures.

As a work-around, I am contemplating using two stacks. The first stack would launch everything in the first region, and the global DynamoDB table. The user will then manually go and find the table name and get the stream ARN from the secondary region. The second stack will take this information as a parameter, and will launch the remainder of the architecture from there. Do you see any issues with this approach?
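The manual lookup step described above could also be scripted. The following is a minimal sketch, assuming boto3 is installed and credentials for the account are configured; the helper names `latest_stream_arn` and `replica_stream_arn` are hypothetical and not part of the CDK. It relies on the `LatestStreamArn` field that DynamoDB's DescribeTable call returns for a table with streams enabled.

```python
def latest_stream_arn(dynamodb_client, table_name):
    """Return the current stream ARN of the table seen by this client.

    The client must be created for the replica's region, because each
    replica of a global table has its own stream ARN.
    """
    table = dynamodb_client.describe_table(TableName=table_name)["Table"]
    arn = table.get("LatestStreamArn")
    if not arn:
        raise ValueError(f"No stream is enabled on table {table_name!r}")
    return arn


def replica_stream_arn(table_name, region):
    """Look up the stream ARN of the replica in the given region."""
    # Imported lazily so latest_stream_arn can be used without boto3
    # (for example with a stub client in tests).
    import boto3

    client = boto3.client("dynamodb", region_name=region)
    return latest_stream_arn(client, table_name)

# Example call (requires AWS credentials), using a placeholder table name:
#   replica_stream_arn("my-global-table", "ap-southeast-2")
```

The returned value is what would be passed to the second stack, either by hand or from a small deploy script.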

skinny85 commented 2 years ago

No, I think it's a good way to handle this unfortunate limitation.

ralphmr commented 2 years ago

Hi CDK team, I have had lots of people reach out to me regarding this saying they experience the same issue. Is there any update on whether this feature gap may be closed in the future?

My workaround

I am documenting my workaround in more detail here for the benefit of others who require this feature.

The beginning of the CDK for the second stack contains the following code to get the required inputs from the user:

# Instructions:
# To deploy with these parameters, run
# cdk deploy --parameters globalTableName=xyz --parameters globalStreamARN=xyz

global_table_name = core.CfnParameter(self, "globalTableName", type="String",
            description="The name of the global table created with the previous stack")

global_table_stream_arn = core.CfnParameter(self, "globalStreamARN", type="String",
            description="The ARN of the stream in the secondary region")

I can then reference the same global table in the secondary stack using this information:

globaltable = ddb.Table.from_table_attributes(
            self,
            'GlobalTable',
            table_name=global_table_name.value_as_string,
            table_stream_arn=global_table_stream_arn.value_as_string,
)

skinny85 commented 2 years ago

@ralphmr we don't have immediate plans to work on this feature. PRs are always welcome.

justin8 commented 2 years ago

You could also use a custom resource to get the stream ARN in the secondary region, which would make this a bit more portable. For example, it could look the table up by name, tags, or partial match in the replica region, without you needing to fetch the value manually and pass it as a parameter, and without requiring lookup permissions during synthesis.
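As a rough illustration of the custom resource idea, here is a sketch of a Lambda handler that could back such a resource. It assumes the handler is wired up through the CDK custom resource Provider framework (whose onEvent handlers return a `PhysicalResourceId` and a `Data` map), and that the resource is given two properties, `TableName` and `ReplicaRegion`. Those property names, the `StreamArn` output key, and the `client_factory` test hook are hypothetical choices made for this example, not an existing CDK feature.

```python
def on_event(event, context, client_factory=None):
    """Look up the stream ARN of a global table replica by table name."""
    props = event["ResourceProperties"]
    table_name = props["TableName"]    # hypothetical property name
    region = props["ReplicaRegion"]    # hypothetical property name
    physical_id = f"{table_name}-{region}-stream-lookup"

    if event["RequestType"] == "Delete":
        # The lookup creates nothing, so there is nothing to clean up.
        return {"PhysicalResourceId": physical_id}

    if client_factory is None:
        # boto3 is bundled with the AWS Lambda Python runtimes.
        import boto3
        client_factory = boto3.client

    # Query DynamoDB in the replica's region, not the stack's region.
    client = client_factory("dynamodb", region_name=region)
    table = client.describe_table(TableName=table_name)["Table"]
    stream_arn = table.get("LatestStreamArn")
    if not stream_arn:
        raise RuntimeError(f"No stream enabled on {table_name} in {region}")

    return {
        "PhysicalResourceId": physical_id,
        "Data": {"StreamArn": stream_arn},
    }
```

In the stack, the value would then be read from the custom resource with something like `get_att_string("StreamArn")` and passed to `Table.from_table_attributes` as `table_stream_arn`. The function's execution role would need `dynamodb:DescribeTable` on the replica table.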

alhaiz313 commented 2 years ago

Can you please prioritise this issue? It's really needed and expected.

peterwoodworth commented 2 years ago

We could potentially support this through a custom resource, but I also recommend creating an issue in the CloudFormation coverage roadmap so that we may have more direct support of this feature.