aws / aws-sdk-java-v2

The official AWS SDK for Java - Version 2
Apache License 2.0

DynamoDB Enhanced Client Polymorphic Types #1870

Open millems opened 4 years ago

millems commented 4 years ago

We should support reading/writing base types using the DynamoDB Enhanced Client.

Potential syntax/example (not yet implemented):

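@DynamoDbBean
@DynamoDbSubtypes({Employee.class, Customer.class})
public abstract class Person {
    private long id;
    private String name;

    // Enhanced-client annotations go on bean accessor methods, not fields.
    @DynamoDbPartitionKey
    public long getId() { return id; }
    public void setId(long id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}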

@DynamoDbBean
public class Employee extends Person {
}

@DynamoDbBean
public class Customer extends Person {
}

DynamoDbEnhancedClient client = DynamoDbEnhancedClient.create();
DynamoDbTable<Person> people = client.table("people", TableSchema.fromBean(Person.class));

Employee bob = new Employee();
bob.setId(1);
bob.setName("Bob the Builder");

Customer lfh = new Customer();
lfh.setId(2);
lfh.setName("Low Flying Hawk");

people.putItem(bob);
people.putItem(lfh);

assertThat(people.getItem(Key.builder().partitionValue(1).build())).isInstanceOf(Employee.class);
assertThat(people.getItem(Key.builder().partitionValue(2).build())).isInstanceOf(Customer.class);

gakinson commented 4 years ago

This would be super useful!

bmaizels commented 4 years ago

Started taking a look at designing this. If we were to implement the proposal above as written, we'd need to insert our own type metadata into the attribute map before storing it in the DynamoDB table, so that when we read it back we'd have something to tell us whether it's an Employee or a Customer. Personally, I think I prefer the idea of having an explicit property on the base class that stores this information and is fully under the control of the application. My idea goes something like this:

@DynamoDbPolymorphic
public abstract class Animal {
    private final String animalAttribute;

    protected Animal(Builder b) {
        this.animalAttribute = b.animalAttribute;
    }

    @DynamoDbSubTypeAttribute({
        @SubType(propertyValue = "CAT", subType = Cat.class),
        @SubType(propertyValue = "DOG", subType = Dog.class)})
    public abstract Species species();

    public String animalAttribute() {
        return this.animalAttribute;
    }

    public static abstract class Builder {
        private String animalAttribute;

        public Builder color(String color) {
            this.animalAttribute = color;
            return this;
        }
    }
}

Implementations of Animal would follow the normal DynamoDb annotated class pattern (in this case they would most likely be @DynamoDbImmutable) and could make valid TableSchema by themselves.

I see this issue had some thumbs up, so it's good to see people are interested in this. Any notions or bias of how you'd like to see us implement it?

bmaizels commented 4 years ago

Here's an alternative idea that's closer to the original proposal. The only difference from that proposal is that we require the application to explicitly designate the name of the string attribute in the DynamoDB record that will store the type information. This information will not be unmarshalled into any property, and no property needs to exist to model it.

@DynamoDbSubtypes(dynamoDbAttribute = "species", subtypes = {
    @Subtype(attributeValue = "CAT", subType = Cat.class),
    @Subtype(attributeValue = "DOG", subType = Dog.class)})
public abstract class Animal {
    private final String animalAttribute;

    protected Animal(Builder b) {
        this.animalAttribute = b.animalAttribute;
    }

    public String animalAttribute() {
        return this.animalAttribute;
    }

    public static abstract class Builder {
        private String animalAttribute;

        public Builder color(String color) {
            this.animalAttribute = color;
            return this;
        }
    }
}
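
The idea in the two proposals above, a named attribute whose value selects the subtype on read, can be sketched in plain Java without the SDK. This is only an illustration under stated assumptions: `SubtypeMapper`, `register`, `toItem` and `fromItem` are hypothetical names, items are modelled as `Map<String, String>` rather than DynamoDB attribute values, and the `Cat`/`Dog` classes are simplified stand-ins. It is not how the SDK implements or will implement the feature.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Simplified stand-ins for the Animal hierarchy discussed in the thread.
abstract class Animal {
    final String color;
    Animal(String color) { this.color = color; }
}

class Cat extends Animal {
    Cat(String color) { super(color); }
}

class Dog extends Animal {
    Dog(String color) { super(color); }
}

// Hypothetical mapper (not an SDK class): keeps a two-way registry between
// subtype classes and the string stored in the discriminator attribute.
class SubtypeMapper {
    private final String discriminatorAttribute;
    private final Map<Class<?>, String> valueByType = new HashMap<>();
    private final Map<String, Function<Map<String, String>, Animal>> readerByValue = new HashMap<>();

    SubtypeMapper(String discriminatorAttribute) {
        this.discriminatorAttribute = discriminatorAttribute;
    }

    SubtypeMapper register(String value, Class<? extends Animal> type,
                           Function<Map<String, String>, Animal> reader) {
        valueByType.put(type, value);
        readerByValue.put(value, reader);
        return this;
    }

    // Write path: copy the mapped property, then add the type marker.
    Map<String, String> toItem(Animal animal) {
        String value = valueByType.get(animal.getClass());
        if (value == null) {
            throw new IllegalArgumentException("Unregistered subtype: " + animal.getClass());
        }
        Map<String, String> item = new LinkedHashMap<>();
        item.put("color", animal.color);
        item.put(discriminatorAttribute, value);
        return item;
    }

    // Read path: the marker selects the subtype; it is not mapped to a property.
    Animal fromItem(Map<String, String> item) {
        String value = item.get(discriminatorAttribute);
        Function<Map<String, String>, Animal> reader = readerByValue.get(value);
        if (reader == null) {
            throw new IllegalArgumentException("Unknown subtype value: " + value);
        }
        return reader.apply(item);
    }
}

class SubtypeMapperDemo {
    static SubtypeMapper animalMapper() {
        return new SubtypeMapper("species")
            .register("CAT", Cat.class, item -> new Cat(item.get("color")))
            .register("DOG", Dog.class, item -> new Dog(item.get("color")));
    }

    public static void main(String[] args) {
        SubtypeMapper mapper = animalMapper();
        Map<String, String> item = mapper.toItem(new Cat("black"));
        System.out.println(item);
        System.out.println(mapper.fromItem(item).getClass().getSimpleName());
    }
}
```

In the first proposal the marker value would instead come from a property on the object (such as `species()`), so the class-to-value half of the registry would not be needed.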

gakinson commented 4 years ago

I personally like this one, as it seems more similar to Jackson's ObjectMapper annotations.

pjcahill commented 4 years ago

I second trying to make it close to the Jackson approach, since it has been proven out.

We use it and have had no issues with its flexibility. We have used it to marshal to classes based on string and enum properties, which would be nice for the DynamoDB mapper as well.

For context, here is a snippet from the Jackson example at https://www.tutorialspoint.com/jackson_annotations/jackson_annotations_jsonsubtypes.htm

@JsonTypeInfo(use = JsonTypeInfo.Id.NAME,
      include = As.PROPERTY, property = "type")
@JsonSubTypes({
      @JsonSubTypes.Type(value = Square.class, name = "square"),
      @JsonSubTypes.Type(value = Circle.class, name = "circle")
   })

{
   "type" : "circle",
   "name" : "CustomCircle",
   "radius" : 1.0
}

mlhrAnjaria commented 3 years ago

Is this implemented in dynamodb-enhanced? If yes, any references?

piechos92 commented 3 years ago

Is this implemented already? That would be extremely useful.

rohit-gandhe commented 3 years ago

Bump!!

millems commented 3 years ago

Sorry, this is not implemented yet.

brunograna commented 3 years ago

Bump!!

bmaizels commented 3 years ago

I've been working on this - it's a complex change. Would any interested parties mind sharing their use-case (specifically describing how polymorphic mapping would be used to help solve it versus just having multiple typed DynamoDbTable interfaces) just so I make sure I'm hitting the target? My functional tests are very contrived.

musketyr commented 3 years ago

Hi @bmaizels. Thanks for working on this! We would like to implement Single-Table Design. We would like to display a calendar with multiple heterogeneous items. You can imagine something like Google Calendar, where you have items that have very little in common (basically just an owner and a start time): calendar events, tasks, reminders, notes. It would be very inconvenient to have an entity that contained all fields from all the possible subtypes.

bmaizels commented 3 years ago

@musketyr (hey long time no speak!) yeah I think I get it, and that's pretty much the use-case I had in mind. Something like 'get me all the connected entities to X or within range Y' where those entities may have different schema. I can't think of a way to do that without polymorphic mapping other than by having a super-object that is the union of all the entity schemata. Sounds like I'm on the right track then, at least for your use-case, thanks. If anyone has any others they'd like to share that might influence the solution please go ahead.

DavidSeptimus commented 3 years ago

I had a similar use-case that involved storing various types of user activities in the same table where each activity type had its own unique set of attributes.

I spent some time a while back working on a rough polymorphic TableSchema implementation using the Jackson pattern, but haven't had time to clean it up and put together a full test suite (there are a couple of converted CRUD tests).

Here it is for reference: https://github.com/DavidSeptimus/aws-sdk-java-v2/commit/9002c181e6b415bbaa821b92041a0a4276cca48a

bmaizels commented 3 years ago

@DavidSeptimus That's awesome, thanks so much for sharing. Even just glancing at it I can tell we're on the same wavelength with regard to how to solve it (I also created a new TableSchema type called PolymorphicTableSchema, with very similar attributes). The main difference between your solution and mine is really the interface dressing on top. One key difference is that for this library we always want to solve it for the Static use-case first and then build the annotated version on top of it; after that comes the extra challenge of smoothly integrating it into the existing DX patterns to make it effortless to use. I'll definitely be taking a close look at this to see if there's anything I can learn from it to accelerate or improve the final integrated solution. Thank you again for sharing your work!

miguelcss commented 3 years ago

Hello @bmaizels, thank you for all the information in this thread. I'm also trying to implement single-table design, with an aggregator entity, say "Department", and a collection of associated sub-elements, say "Team", sharing the same partitionKey but using different sortKeys. I then want to query for the access pattern "give me a Department and all the teams in it", or more generally, give me an entity and a collection of associated sub-entities for a given partitionKey.

So it would be something like:

@DynamoDbBean
@DynamoDbSubtypes({Department.class, Team.class})
public abstract class Org {

    private String orgId;

    @DynamoDbPartitionKey
    @DynamoDbAttribute(value = "partitionKey")
    public String getPartitionKey() {
        return orgId;
    }
}

@DynamoDbBean
public class Department extends Org {

    @DynamoDbSortKey
    @DynamoDbAttribute(value = "sortKey")
    public String getSortKey() {
        return deptId;
    }

    @DynamoDbCollection({Team.class})
    private List<Team> teams;
    (...)    
}

@DynamoDbBean
public class Team extends Org {

    @DynamoDbSortKey
    @DynamoDbAttribute(value = "sortKey")
    public String getSortKey() {
        return String.format("%s#%s", deptId, teamId);
    }
    (...)
}

In the meantime I was looking at using multiple typed DynamoDbTable interfaces as suggested:

dynamoDbEnhancedClient.table("Company", TableSchema.fromBean(Org.class));
dynamoDbEnhancedClient.table("Company", TableSchema.fromBean(Team.class));

But I'm unsure how to query for the mentioned access pattern. Is it possible to achieve this with a single query using the enhanced client as it is today? Would the polymorphic support affect or change that?

bmaizels commented 3 years ago

@miguelcss To accomplish an access pattern such as "give me a Department and all the teams in it" you would need the polymorphic changes I am proposing. With those, I think the most straightforward way would be to have a GSI that ensured that every entity you wanted to return had the value of the related department ID in it and then query that GSI using a polymorphic org TableSchema. The result set would be typed as Org but each instance would actually be a Department or Team. Without the polymorphic changes, you would have to make two queries, one to get the teams and one to get the department itself.
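
The single-query access pattern described above can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: `CompanyTable` is a hypothetical stand-in for the table (or GSI), items are plain maps, and the `type` attribute name and its `DEPARTMENT`/`TEAM` values are invented. The point is only that one query over a shared key returns a list typed as `Org` whose elements are concrete `Department` or `Team` instances.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

abstract class Org {
    final String partitionKey;
    final String sortKey;

    Org(String partitionKey, String sortKey) {
        this.partitionKey = partitionKey;
        this.sortKey = sortKey;
    }
}

class Department extends Org {
    Department(String partitionKey, String sortKey) { super(partitionKey, sortKey); }
}

class Team extends Org {
    Team(String partitionKey, String sortKey) { super(partitionKey, sortKey); }
}

// Hypothetical in-memory stand-in for a single table holding both entity types.
class CompanyTable {
    private final List<Map<String, String>> items = new ArrayList<>();

    void put(String partitionKey, String sortKey, String type) {
        Map<String, String> item = new HashMap<>();
        item.put("partitionKey", partitionKey);
        item.put("sortKey", sortKey);
        item.put("type", type); // assumed discriminator attribute
        items.add(item);
    }

    // One "query": every item in the partition, mapped to its concrete subtype.
    List<Org> query(String partitionKey) {
        List<Org> results = new ArrayList<>();
        for (Map<String, String> item : items) {
            if (partitionKey.equals(item.get("partitionKey"))) {
                results.add(toOrg(item));
            }
        }
        return results;
    }

    private static Org toOrg(Map<String, String> item) {
        String type = item.get("type");
        if ("DEPARTMENT".equals(type)) {
            return new Department(item.get("partitionKey"), item.get("sortKey"));
        }
        if ("TEAM".equals(type)) {
            return new Team(item.get("partitionKey"), item.get("sortKey"));
        }
        throw new IllegalArgumentException("Unknown type: " + type);
    }
}

class CompanyTableDemo {
    static CompanyTable sampleTable() {
        CompanyTable table = new CompanyTable();
        table.put("org-1", "eng", "DEPARTMENT");
        table.put("org-1", "eng#sdk", "TEAM");
        table.put("org-1", "eng#docs", "TEAM");
        table.put("org-2", "sales", "DEPARTMENT");
        return table;
    }

    public static void main(String[] args) {
        for (Org org : sampleTable().query("org-1")) {
            System.out.println(org.getClass().getSimpleName() + " " + org.sortKey);
        }
    }
}
```

Without the subtype dispatch in `toOrg`, the caller would need one typed query per entity class, which matches the two-query fallback mentioned above.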

miguelcss commented 3 years ago

@bmaizels - Understood, thank you for the quick reply. I'm looking forward to the added support; let me know if I can help in any way.

bmaizels commented 3 years ago

For anyone tracking this issue, I have created a PR with an implementation that I hope addresses the needs identified here. It will probably be at least a couple of weeks before this gets final approval and makes it into the release, so it would be extremely valuable if people could try it out and see if there are any rough edges or improvements I could make before it becomes difficult to change it after release. It's a complex change, so lots of scope for not getting things quite right. Instructions on how to declare subtypes are in the new README.md file in the dynamodb-enhanced module under the section titled 'Using subtypes....'. https://github.com/aws/aws-sdk-java-v2/pull/2861

schmittjoaopedro commented 2 years ago

I think for this PR it would be good to take inspiration from Hibernate about some inheritance models: https://www.baeldung.com/hibernate-inheritance

JudahBrick commented 2 years ago

Would be really useful if we can get this issue fixed! Made sure to redesign our Dynamo tables to use best practices and single table design, and then ran into this issue.

adrian-skybaker commented 1 year ago

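bmaizels asked: "Would any interested parties mind sharing their use-case"

See the adjacency list design pattern in the DynamoDB Developer Guide (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-adjacency-graphs.html#bp-adjacency-lists), which describes it as follows: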

When different entities of an application have a many-to-many relationship between them, the relationship can be modeled as an adjacency list. In this pattern, all top-level entities (synonymous to nodes in the graph model) are represented using the partition key. Any relationships with other entities (edges in a graph) are represented as an item within the partition by setting the value of the sort key to the target entity ID (target node).

The advantages of this pattern include minimal data duplication and simplified query patterns to find all entities (nodes) related to a target entity (having an edge to a target node).
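
The key layout in that description can be sketched in memory as follows. This is a rough illustration, not DynamoDB code: `AdjacencyListTable` is a hypothetical class, the invoice and bill IDs are invented sample data, and the reverse lookup is done by scanning every partition, which is an assumption made for brevity rather than something the excerpt prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of an adjacency-list table:
// partition key = source node ID, sort key = target node ID.
class AdjacencyListTable {
    private final Map<String, TreeMap<String, String>> partitions = new TreeMap<>();

    // Store one edge as an item inside the source node's partition.
    void putEdge(String sourceId, String targetId, String label) {
        partitions.computeIfAbsent(sourceId, key -> new TreeMap<>()).put(targetId, label);
    }

    // Like a query on the partition key: all nodes this node points to.
    List<String> targetsOf(String sourceId) {
        List<String> targets = new ArrayList<>();
        TreeMap<String, String> edges = partitions.get(sourceId);
        if (edges != null) {
            targets.addAll(edges.keySet());
        }
        return targets;
    }

    // Reverse lookup by scanning all partitions (simplification for this sketch).
    List<String> sourcesOf(String targetId) {
        List<String> sources = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> partition : partitions.entrySet()) {
            if (partition.getValue().containsKey(targetId)) {
                sources.add(partition.getKey());
            }
        }
        return sources;
    }
}

class AdjacencyListDemo {
    static AdjacencyListTable sampleTable() {
        AdjacencyListTable table = new AdjacencyListTable();
        table.putEdge("Invoice-1", "Bill-1", "billed-on");
        table.putEdge("Invoice-1", "Bill-2", "billed-on");
        table.putEdge("Invoice-2", "Bill-2", "billed-on");
        return table;
    }

    public static void main(String[] args) {
        AdjacencyListTable table = sampleTable();
        System.out.println(table.targetsOf("Invoice-1"));
        System.out.println(table.sourcesOf("Bill-2"));
    }
}
```

Because node items and edge items live in the same table and usually carry different attributes, reading a partition back into typed objects is where polymorphic mapping would help.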

ysfaran commented 2 weeks ago

As an AWS customer, I'm utterly confused about how you can promote single-table design and then just ignore it in the SDK implementation.

This ticket has been open for more than 4 years and is long overdue. The PR that could resolve this issue has also been stale for 2 years.

In many places the docs recommend using the minimum number of tables, ideally just one. So the only reasonable action here is to support single-table design in all AWS SDKs.

I hope this ticket will be taken more seriously for the sake of your customers, who might hesitate to upgrade SDK versions because of this issue.