aws / aws-sdk-java-v2

The official AWS SDK for Java - Version 2
Apache License 2.0
2.21k stars 853 forks source link

DDB Enhanced: table created without indices #1771

Closed musketyr closed 4 years ago

musketyr commented 4 years ago

Describe the Feature

DynamoDbTable.createTable() creates the table without global and local indices and there is no way how to obtain the information from TableMetadata

Is your Feature Request related to a problem?

I'm trying to create the DynamoDB table based on DDE Enhanced mappings if it does not exist but the table is created without the indices and I'm not able to get the information without any hacks.

Proposed Solution

1) minimal solution would be to expose the key set of StaticTableMetadata.indexByNameMap in the TableMetadata interface, i.g. TableMetadata.indexNames. 2) ideal solution is that DynamoDbTable.createTable() will prefill the request with the indices which has been read from the annotations.

Describe alternatives you've considered

I can use reflective access to indexByNameMap but this would be very brittle.

Additional Context

I'm trying to implement DynamoDB declarative services for Micronaut library using v2

https://agorapulse.github.io/micronaut-aws-sdk/#_declarative_services_with_service

This missing feature breaks the parity with the previous version where IDynamoDBMapper.generateCreateTableRequest() method returned the request populated with indices definitions.

Your Environment

bmaizels commented 4 years ago

Hi @musketyr ,

CreateTable requires information about your physical table that are not modeled or known by the TableSchema. Specifically, the provisioned throughput of the table and indices, although in the future there could be more. It is provided in its current form to make writing integration tests a little easier by mining some of the structural data that is already present in the TableSchema but requiring you to supplement that information as required to create the physical resource. One of the reasons that you are required to restate the names of the secondary indices is because there is no guarantee that the names assigned to them (through annotations or otherwise) in the TableSchema are actually the correct physical names of the indices given when they were created in DynamoDB because there is no contract requiring them to be. The names assigned to your secondary indices in the TableSchema are only there for the TableSchema itself to reference and disambiguate when operations within the enhanced client need to reference them.

If you are writing integration tests, we recommend you simply supply that supplemental information depending on how your integration test works, for local DynamoDB the provisioned throughput is basically ignored but a value has to be passed in, if it's running against real DynamoDB you'll want to pass in values that make sense for your application and budget.

For actual service infrastructure we do not recommend creating resources (tables) through this library, instead we recommend you use the AWS Cloud Development Kit to manage these resources, and any other resources your service runs on in one place.

musketyr commented 4 years ago

Hi @bmaizels,

thank you for the detailed answer. I just want to let you know that this goes directly against the usage we (and maybe other users of v1 API) were used to use the original IDynamoDBMapper.

One of the major advantages of IDynamoDBMapper was that it was able to generate the new table with proper table schema based on the modelled object. The only required option was (as you mentioned) to set the provisioned throughput. In general, the library I've taken over and which I now trying to switch to v2 was able to create the tables with all indices without any other input (using the default throughput).

Would you mind to ask about the option in the DynamoDB Mapper v2 issue?

bmaizels commented 4 years ago

@musketyr I think what we're really talking about here is quite a special use-case because I'm going to assume that you do the only thing which I think is possible in the v1 situation which is to blindly iterate through every secondary index that is in the generated createTable request and assign an arbitrary provisional throughput to it before submitting the request, this does effectively get you out of needing to understand what secondary indexes you have and what their names are, but it is not a use-case we anticipated.

In these situations rather than just say let's do what v1 does, I would prefer to take a step back and say what's the real problem we're trying to solve here and can we rethink it? For example, one way to tackle this would be to permit the inclusion of provisioned throughput to be specified in the TableSchema and then provide different mechanisms for actually getting that into the schema whether it be through annotation or by 'default value'. That way, the required data would then be present in the TableSchema to form a complete createTable request without the need for any supplemental data and it does not require rewriting of the request post-generation as v1 does.

Do you have any other ideas about it?

musketyr commented 4 years ago

The first thing first:

If I take a look at a fragile implementation I've made for v2 the throughput is actually not required at all. I only need the list of index descriptors. The only additional information which is currently missing is the indices' type of projection.


Now let's continue the discussion:

You're pretty close to what has been done with the indices with v1 - see the v1 implementation.

Ok, so let's picture the typical use case in BDD style:

As a Java developer with minimal knowledge of DynamoDB
When I want to store some data for my application
And I model the data as Java class
Then the DynamoDB table is created automatically based on that class (including indices)

Minimal knowledge should incorporate

On the other hand, the throughput is something which goes beyond the minimal knowledge. The real anticipated throughput can be only learned in the production and it might be information unknown to the developer at all because it's handled by DevOps person.

The ultimate goal here is to have DynamoDB table up and running in the minimal configuration capable for saving and querying entities based on the sort key and indices. This is very useful for pre-production environments - integration tests (against LocalDynamoDB/Localstack) and development environment (where you can fine-tune the throughput). The production table can be then modelled using AWS Cloud Development Kit or similar tool based on the development environment configuration as you suggested.

I'm not sure if storing throughput with the TableSchema makes sense as in that case it would expect to have the same throughput for all environments but if this the most suitable way I have nothing against it.

On the other hand, it would make sense to have complete information about the indices:

@DynamoDbBean(
    localIndices = @LocalIndex(value = "local_name", projectionType = ProjectionType.ALL),
    globalIndices = @GlobalIndex(value = "global_name", projectionType = ProjectionType.KEYS_ONLY)
)
public class MyEntity {

If populating the CreateTableRequest with the known indices is something you don't anticipate then it would be nice at least expose the list of known indices from the TableMetadata and let the library user's to decide whether they want to benefit from the information or not (see the first paragraph).

musketyr commented 4 years ago

I'm sorry if the previous comment is difficult to read - there's a lot of disturbance in the home universe so it's difficult to write a concise text.

Just to summarize:

  1. ability to generate the table with indices from the @DynamoDbEntity might not be a best practice but it is crucial for prototyping and testing
  2. if generating the indices in create table request goes against the best practices you want to anticipate, please, at least expose the existing information about the indices present in TableMetadata
  3. adding the ability to define the index's ProjectType through the annotation would be great
bmaizels commented 4 years ago

@musketyr I'm definitely leaning more towards a solution like 3 rather than a solution like 2. In order to make such a solution workable, we would start with the conceptual design of the TableSchema (and by extension TableMetadata) and figure out how to incorporate the necessary (and perhaps optional) table metadata into it. Once we had settled on that model we would then need to implement this into StaticTableMetadata (because BeanTableSchema is built on top of it), and then finally add annotation support for the BeanTableSchema. Once all that was done, it should then be trivial to modify createTable() to make use of that metadata if it were available.

We will have to backlog and prioritize this feature against the other features we want to release for this library (such as the Document API). For now the workaround will have to be to provide your own metadata with extended index information for each table you model in your integration test suite and construct the CreateTableEnhancedRequest and passing in that additional information.

bmaizels commented 4 years ago

@musketyr on reflection I think it's reasonable to expect to be able to get out of TableMetadata what you put into it, so we will also plan to add a feature to be able to access the index names. This will be much cheaper than the other solution, however I don't want to lose sight of the other solution as really being the 'correct' thing here.

musketyr commented 4 years ago

@bmaizels thank you for your response.

in the long term, I plan to replace BeanTableSchema with a reflection-free version using Micronaut's BeanIntrospector so then I can think about the custom metadata from the custom annotations.

for now, it would be great if the indices information is exposed from TableMetadata.

bmaizels commented 4 years ago

@musketyr Sorry I went quiet on this, so much to work on so little time! An interesting and unexpected development is as I was developing a POC for supporting immutable data objects I ended up doing most of the things you are asking for here (I think). Check it out.... https://github.com/aws/aws-sdk-java-v2/issues/1801

musketyr commented 4 years ago

Cool. No worries I was able to work around all the issues but having first-class support will be much better!

I can see the indices exposed now here which is great!

I had to add the projection type support into our library as defaulting to ProjectionType.KEYS_ONLY was not always ideal.

Would be nice if this can be a part of the Index interface of the library but I can live without it :-)

I have one more question regarding immutability and indices. I often end up with derived attributes for the indices which are composed of other attributes - in that case, it would be great if there is an option for read-only attributes which does not require a setter.

    @JsonIgnore
    @Projection(ProjectionType.ALL)
    @DynamoDbSecondarySortKey(indexNames = ORG_BY_READ_AND_TIMESTAMP_INDEX)
    public String getOrganizationByReadAndDateIndex() {
        return String.format("%s::%s::%s", organizationUid, read ? '1' : '0', timestamp);
    }

    public void setOrganizationByReadAndDateIndex(String ignored) {
        // ignored - derived property
    }

Would be nice to have @DynamoDbReadOnly or @DynamoDbDerived annotation to signal no need for the setter.

bmaizels commented 4 years ago

TableMetadata now has index information available as per https://github.com/aws/aws-sdk-java-v2/pull/2012

musketyr commented 4 years ago

Thanks!

djake commented 3 years ago
As a Java developer with minimal knowledge of DynamoDB
When I want to store some data for my application
And I model the data as Java class
Then the DynamoDB table is created automatically based on that class (including indices)

That's me! Sorry to hijack but this issue is the only thing I can find approaching my problem... given my TableSchema (which does appear to include index information), how can I go about creating the table with the secondary indexes (with the enhanced client)?

I'll add that aside from the novice use case it does generally seem that you might want to automate spinning up environments or something of the sort, in which case it would be useful to re-use the model. I've seen https://github.com/awsdocs/aws-doc-sdk-examples/blob/master/javav2/example_code/dynamodb/src/main/java/com/example/dynamodb/CreateTableCompositeKey.java but it seems that the index information is available on my decorated class so I should be able to use it, perhaps by constructing my own CreateTableEnhancedRequest, but I can figure out how to tie it together.

Update: I did something like the following. Does seem like createTable could/should be able to do this automatically?

    slackacktable = dynamoClient.table("SlackAckJava11", schema);
            Projection toUserSecondaryIdxProject = Projection.builder().projectionType("ALL").build();
            EnhancedGlobalSecondaryIndex toUserIdx = EnhancedGlobalSecondaryIndex.builder()
                    .indexName("tousersecondaryindex")
                    .projection(toUserSecondaryIdxProject)
                    .build();
            CreateTableEnhancedRequest req = CreateTableEnhancedRequest.builder().globalSecondaryIndices(toUserIdx).build();
            slackacktable.createTable(req);
oharaandrew314 commented 3 years ago

@djake It definitely sucks that the sdk just ignores the indices defined in the schemas. From my understanding of the back and forth in this issue: in order to create indices, you need to know the projection and provisioned throughput, which isn't available in the TableSchema. Arguably, this information doesn't make sense to include in the schema for the rare scenario where you use it to create a table, so that's why you need that ugly CreateTableEnhancedRequest to enable it.

Personally, I use the mappers to create tables all the time for my automated tests, so this missing feature is huge downer for me. I've been working on a plugin to make the v2 enhanced sdk palatable for kotlin data classes, and I made this helper method that seems to work well enough for my mock dynamo environment.

fun <T: Any> DynamoDbTable<T>.createTableWithIndices() {
    val metadata = tableSchema().tableMetadata()

    val globalIndices = metadata.indices()
        .filter { it.partitionKey().get().name() != metadata.primaryPartitionKey() }
        .map { index ->
            EnhancedGlobalSecondaryIndex.builder()
                .indexName(index.name())
                .projection(Projection.builder().projectionType(ProjectionType.ALL).build())
                .build()
        }

    val localIndices = metadata.indices()
        .filter { it.partitionKey().get().name() == metadata.primaryPartitionKey() && it.name() != TableMetadata.primaryIndexName() }
        .map { index ->
            EnhancedLocalSecondaryIndex.builder()
                .indexName(index.name())
                .projection(Projection.builder().projectionType(ProjectionType.ALL).build())
                .build()
        }

    val request = CreateTableEnhancedRequest.builder()
        .globalSecondaryIndices(globalIndices)
        .localSecondaryIndices(localIndices)
        .build()

    createTable(request)
}
sullrich84 commented 2 years ago

Improved @oharaandrew314 suggestion together with @denniseffing to also check if local/global secondary indices exists before creating:

fun DynamoDbAsyncTable<*>.createTableWithIndices(): CompletableFuture<Void> {
    val metadata = tableSchema().tableMetadata()

    val globalIndices = metadata.indices()
        .filter { it.partitionKey().get().name() != metadata.primaryPartitionKey() }
        .map { index ->
            EnhancedGlobalSecondaryIndex.builder()
                .indexName(index.name())
                .projection(Projection.builder().projectionType(ProjectionType.ALL).build())
                .build()
        }

    val localIndices = metadata.indices()
        .filter {
            it.partitionKey().get()
                .name() == metadata.primaryPartitionKey() && it.name() != TableMetadata.primaryIndexName()
        }
        .map { index ->
            EnhancedLocalSecondaryIndex.builder()
                .indexName(index.name())
                .projection(Projection.builder().projectionType(ProjectionType.ALL).build())
                .build()
        }

    val request = CreateTableEnhancedRequest.builder()
        .apply {
            if (localIndices.isNotEmpty()) localSecondaryIndices(localIndices)
            if (globalIndices.isNotEmpty()) globalSecondaryIndices(globalIndices)
        }
        .build()

    return createTable(request)
}
chrylis commented 2 years ago

I will bump this once again to note that this behavior is extremely Surprising in the bad way. If I can model the index with annotations, and I can use createTable to create the table, I expect the entire table to be created as specified. Silently ignoring schema information until query time makes for a baffling user experience.

bmaizels commented 2 years ago

Hearing all the feedback. I just want to restate the basis of the problem in an abstract way because the solution can't be as simple as 'just use the schema data you have'.

The TableSchema object is currently designed to be a 'data-plane' type interface that is designed just to work with the data in the resources that have already been provisioned. In practical terms this means that it is missing all the meta-data required to constitute an actual table resource in DynamoDB, specifically it is missing the real names of the indices, the provisioned throughput settings and the projected attributes. These things are not important or relevant for the TableSchema as a data-plane interface object so they were never included. That being said, there is clearly a QOL issue here where there is a common use-case that people want to create tables from these objects for the purposes of testing, so it makes sense to figure out how to make that easier. The ideas I have about it fall into two categories:

  1. Allow the additional control-plane metadata to be added to the TableSchema optionally. If that information is present then table.createTable() will work, otherwise it will not.
  2. Make it easier to create the basis of a createTable request from a TableSchema and just add default provisioned throughput settings and projection settings rather like the implementation @sullrich84 and @oharaandrew314 has suggested.
chrylis commented 2 years ago

FWIW, it's not just for testing: I've been using the Enhanced Client to create tables in production, which succeeds even after you define a secondary index... and then fails delayed at runtime when you query it, since it wasn't created in AWS.

LeeroyHannigan commented 2 years ago

Adding my solution to this issue which may be easier to read for some:

        CreateTableEnhancedRequest createTableEnhancedRequest = CreateTableEnhancedRequest.builder()
                .provisionedThroughput(
                        ProvisionedThroughput.builder()
                                .readCapacityUnits(10L)
                                .writeCapacityUnits(10L)
                                .build())
                .globalSecondaryIndices(
                        EnhancedGlobalSecondaryIndex.builder()
                                .indexName("my-gsi")
                                .projection(p -> p.projectionType(ProjectionType.KEYS_ONLY))
                                .provisionedThroughput(ProvisionedThroughput.builder()
                                        .readCapacityUnits(10L)
                                        .writeCapacityUnits(10L)
                                        .build())
                                .build())
                .build();

        mappedTable.createTable(createTableEnhancedRequest);