NerdWalletOSS / dynamorm

Python object & relation mapping library for Amazon's DynamoDB service
https://nerdwalletoss.github.io/dynamorm/

ManyToMany #89

Open ricky-sb opened 5 years ago

ricky-sb commented 5 years ago

I would like to open discussions about implementing a ManyToMany feature.

From here:

Supporting a Many-to-Many relationship through an intermediate join table will be added in a future PR once we have a better use-case to develop against

I can provide this use-case if you'd like.

Although Amazon recommends an Adjacency List Pattern, I dislike how it makes the database almost completely unreadable.

Here's an interesting post which goes over potential approaches to ManyToMany in DynamoDB.

Can we discuss these approaches?

Using auxiliary tables:

A potential weakness of this approach is that it requires 3 tables

And the fact that a lookup requires multiple requests over 3 tables... The fewer requests we can make to DynamoDB, the better it would be for performance.
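To make the request count concrete, here is a minimal sketch of a join-table lookup. It uses plain in-memory structures as stand-ins for the three tables; the table names, attribute names, and data are all assumptions for illustration, and nothing here calls DynamoDB or DynamORM.

```python
# In-memory stand-ins for three DynamoDB tables (hypothetical names and data).
doctors = {"d1": {"name": "Dr. Adams"}, "d2": {"name": "Dr. Baker"}}
patients = {"p1": {"name": "Pat"}, "p2": {"name": "Quinn"}}

# Join table: one item per (doctor, patient) link.
doctor_patient = [
    {"doctor_id": "d1", "patient_id": "p1"},
    {"doctor_id": "d1", "patient_id": "p2"},
    {"doctor_id": "d2", "patient_id": "p1"},
]


def patients_of(doctor_id):
    """Return (patient names, number of simulated requests)."""
    requests = 0

    # Request 1: query the join table by doctor_id.
    links = [row for row in doctor_patient if row["doctor_id"] == doctor_id]
    requests += 1

    # Request 2: one batch read against the patients table.
    # (In real DynamoDB, BatchGetItem is capped at 100 keys per call,
    # so large relationships would need more than one batch.)
    names = [patients[row["patient_id"]]["name"] for row in links]
    requests += 1

    return names, requests


names, request_count = patients_of("d1")
```

Even in the best case, resolving one side of the relationship costs two round trips, and fetching the doctor item itself would add a third.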

Using sets:

The major downside to this approach is the potential for data quality issues. If you link a Patient to a Doctor, you have to coordinate two updates, one to each table. What happens if one update fails? Your data can get out of sync.

Using transactions, we can ensure that updates are coordinated, and since we're doing this inside an ORM, the end user doesn't need to worry about managing these calls directly. The real downside is DynamoDB's 400 KB item size limit, which caps how large the set attribute can grow. If a set of related IDs grows past that, we can't add any more relationships.
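As a rough sketch of the coordinated update, the two writes could be expressed as a single `TransactWriteItems` request. The table names (`doctors`, `patients`) and attribute names (`id`, `patient_ids`, `doctor_ids`) are assumptions for illustration; the function only builds the request payload and does not contact DynamoDB.

```python
def link_request(doctor_id, patient_id):
    """Build TransactItems that add each ID to the other item's string set."""
    return [
        {
            "Update": {
                "TableName": "doctors",  # assumed table name
                "Key": {"id": {"S": doctor_id}},
                # ADD on a set attribute inserts the value if it is not present.
                "UpdateExpression": "ADD patient_ids :p",
                "ExpressionAttributeValues": {":p": {"SS": [patient_id]}},
            }
        },
        {
            "Update": {
                "TableName": "patients",  # assumed table name
                "Key": {"id": {"S": patient_id}},
                "UpdateExpression": "ADD doctor_ids :d",
                "ExpressionAttributeValues": {":d": {"SS": [doctor_id]}},
            }
        },
    ]


items = link_request("d1", "p1")

# With boto3, the payload would be sent as one all-or-nothing call:
#   boto3.client("dynamodb").transact_write_items(TransactItems=items)
```

Because both updates travel in one transaction, either both sets are updated or neither is, which addresses the out-of-sync concern but not the size limit.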

Using adjacency lists:

This seems almost incompatible with current semantics. To properly spec out an adjacency list, you need to know your access patterns up front. There's no "pythonic" way to declare access patterns, so we'd need to build an entire library just for that, and then convert that into an adjacency list.
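For reference, a minimal in-memory sketch of why the access patterns matter. The `pk`/`sk` attribute names and the `DOCTOR#`/`PATIENT#` prefixes are assumed conventions, and `query` only imitates a DynamoDB Query (exact match on the partition key, `begins_with` on the sort key).

```python
# Single table: every item has a partition key (pk) and a sort key (sk).
table = [
    {"pk": "DOCTOR#d1", "sk": "DOCTOR#d1", "name": "Dr. Adams"},
    {"pk": "PATIENT#p1", "sk": "PATIENT#p1", "name": "Pat"},
    {"pk": "PATIENT#p2", "sk": "PATIENT#p2", "name": "Quinn"},
    # Edge items: each relationship is stored as its own item.
    {"pk": "DOCTOR#d1", "sk": "PATIENT#p1"},
    {"pk": "DOCTOR#d1", "sk": "PATIENT#p2"},
]


def query(pk_attr, pk_value, sk_attr, sk_prefix):
    """Imitate Query: exact partition key match plus a sort key prefix."""
    return [
        item
        for item in table
        if item[pk_attr] == pk_value and item[sk_attr].startswith(sk_prefix)
    ]


# Access pattern 1: patients of a doctor, served by the base table (pk, sk).
patients_of_d1 = [i["sk"] for i in query("pk", "DOCTOR#d1", "sk", "PATIENT#")]

# Access pattern 2: doctors of a patient. This needs an inverted index,
# i.e. a GSI that uses sk as its partition key and pk as its sort key.
doctors_of_p1 = [i["pk"] for i in query("sk", "PATIENT#p1", "pk", "DOCTOR#")]
```

Each new access pattern potentially means a new key shape or index, which is the part that is hard to express declaratively in an ORM.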

Note: These guys have attempted it here

Quidge commented 4 years ago

(This is offtopic, please remove if considered too derailing.)

Question: why use dynamo for relational data? The ridiculous read/write scalability of dynamorm is somewhat moot if the architecture doesn't embrace uni-table design.

Note: I do know this is painful to read because I'm dealing with it in my own projects.

ricky-sb commented 4 years ago

Uni-table design is difficult for me. I'd love for there to be a Pythonic way to manage single-table design.

Maybe something like:

And then, DynamORM automatically generates and manages compound keys, index overloading, adjacency lists, etc.
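As a purely hypothetical illustration of automatic compound-key generation (none of these classes or functions exist in DynamORM; every name below is an assumption made up for this sketch):

```python
class Entity:
    """Hypothetical base class; not part of DynamORM."""

    prefix = ""

    def __init__(self, id):
        self.id = id

    @property
    def key(self):
        # Compound key such as "DOCTOR#d1", derived from the class declaration.
        return f"{self.prefix}#{self.id}"


class Doctor(Entity):
    prefix = "DOCTOR"


class Patient(Entity):
    prefix = "PATIENT"


def link_item(owner, member):
    """Generate the edge item for an owner -> member relationship."""
    return {"pk": owner.key, "sk": member.key}


edge = link_item(Doctor("d1"), Patient("p1"))
```

The user only declares the entity classes; the key strings and edge items are derived from them rather than written by hand.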

BTW, AWS's own AppSync uses multi-table design for DynamoDB relationships; even those guys have trouble automating single-table design.

Quidge commented 4 years ago

Those are definitely the sort of things that I'd expect an ORM to be responsible for. But it still feels like an ask to contort dynamo into something that it doesn't want to be.

Do you have a link to the AppSync architecture? I'd like to read more about their solution.

ricky-sb commented 4 years ago

Check out this issue: https://github.com/aws-amplify/amplify-cli/issues/91

They use an intermediate join table. Scroll to the end and you'll see what I mean.

Some more relevant docs: https://aws-amplify.github.io/docs/cli-toolchain/graphql#connection https://aws-amplify.github.io/docs/cli-toolchain/graphql#data-access-patterns

Quidge commented 4 years ago

Whoa. It looks like Amplify autogenerates a DynamoDB table for you based on a GQL schema? I've seen an auto table generator before for relational databases with Prisma, but going in a NoSQL direction is new to me. Thanks for the links!