Open heyheyjp opened 7 years ago
It makes querying complicated and highly inefficient.
I think the solution for this is just an index, which would make querying projects by player both simple and efficient without making the query that gets all playerIds for a project more complex and less efficient by introducing a join.
Strongly disagree. I think that as a rule, we should absolutely never embed references (IDs) to objects in an existing table as items in an array in an object in another table. There might be very appropriate use cases for an array type model property, but this is not one of those cases. We should be using join tables.
without making the query that gets all playerIds for a project more complex and less efficient by introducing a join
We have a thinky-based object model now that makes retrieving associated objects very easy, so this is not true.
I guess I also strongly disagree. Even if thinky abstracts the forming of the join queries, it will still be less efficient. And the data model still will be more complex under the hood which could still hurt us if we ever need to write queries without Thinky's help.
I think that as a rule, we should absolutely never embed references (IDs) to objects in an existing table as items in an array in an object in another table
Can you explain why you see this as inherently bad.
How will it be less efficient? Please explain.
Also, please see rethinkdb's own recommendations for modeling many-to-many relationships:
https://rethinkdb.com/docs/table-joins/#many-to-many-relations
How will it be less efficient? Please explain.
In order to get player details for all players on a project I have to consult 3 different tables with the join table, and only 2 if the ids are embedded.
Talking about performance in a vacuum is non-productive. And, from my experience with databases, joins are only something you worry about when there are a very large number of rows (specifically, when the indexes for all tables cannot fit into the memory of the DB host).
Perhaps we should have a meeting and discuss this real-time rather than asynchronously? Perhaps next week?
I'm looking at more of the "literature" and it does seems that the join table is the more idiomatic way to do this according to rethinkdb folks, stackoverflow, etc. I still haven't seen any reasoning as to why, but maybe "this is what other people do and it's what Thinky supports out of the box" is a good enough reason.
I'm not sure a meeting is warranted at this point since this is literally the last thing in the backlog right now. If this gets moved up and we're going to spend time changing this part of the data model at the expense of other stuff then I could probably use a little more convincing that the payoff is worth it. So let's wait and schedule a meeting when and if this gets closer to the top?
Appreciate you both, @bundacia & @jeffreywescott.
It probably will move up at some point ... honestly, when someone makes a bid for it along the lines of: "it would make way more sense to tackle #530 now because I'm in that code anyway and it will make things cleaner, etc.".
We need to undo the embedding of player id references in the project document. It makes querying complicated and highly inefficient. While it's easy to find the players on a project, it's too inefficient to pull projects by player. Creating a simple join table where each document is a mapping of player to project, it makes querying and project team management easier. It'll also set us up for easy expansion of this as needed for player-project settings in the future.