Closed arrocke closed 6 months ago
Just to make sure I'm following...
The gloss phrases will consists of different combinations of glosses, depending on the language? In one language this sequence of three original language words will have three target language glosses, but in another language that sequence of three words might be one phrase with one one gloss?
That sounds tricky to implement in the UI, but I'm guessing you're working on the design for that?
This is also to clarify my understanding, I'm not critiquing your diagram, just trying to make sure I understand it clearly.
It's been a couple years since I worked with database diagrams... so I might be confused on a few of these. I'm using this chart for reference.
My relationships in the diagram were backwards. This is fixed now. Sorry for the confusion. The goal is that a gloss in one language can be connected to multiple words, and those same words can be connected to multiple glosses in a different language.
A many to many relationship is required because a phrase consists of one or more words (many), and each word is going to be associated with glosses in one or more languages (also many).
One thing that might clear things up is if we add a Phrase table, and renamed the GlossPhraseWord table to just PhraseWord. That would communicate that we are building phrases from words in specific languages, and a phrase can have a gloss, footnotes, etc. When you query for glosses or notes for the language, you would first query for the phrases in that languages. That would look like this:
erDiagram
Language ||--o{ Phrase : has
Word }o--|{ PhraseWord : connects
PhraseWord }|--|| Phrase : connects
Phrase ||--o| Gloss : has
Phrase ||--o| Footnote : has
Phrase ||--o| TranslatorNote : has
Now that I'm looking at this diagram, I think it is much clearer what is going on. @Pertempto Let me know what you think.
Now that I'm looking at this diagram, I think it is much clearer what is going on
I agree 💯. I think having this Phrase
table to join it all together is the way to go.
One thing I just thought of... how will this all join to the Verse
table? Through the Word
table as before? So to get all the glosses for a verse, we will need to do joins through Word
, PhraseWord
, and Phrase
? That sounds reasonable as far as this chart goes, but will that cause performance issues?
You don't actually need to join the Verse table, you can filter on words that have the verse id you are looking for. We rarely need to join the verse table because we reference everything by the verse id in this way.
I don't think performance will be an issue. It also shouldn't increase too much when we scale because there is a fixed limit for the number of rows connected to each language. If we index the gloss table on language id, then we will never exceed about 600,000 rows per table in these joins. Also, we can start with a normalized database and then optimize later if necessary, but there isn't a good reason to incur the development complexity of denormalizing if we don't need to
We should take this in a few steps to follow Prisma's recommended migration plan.
Some languages require the ability to set a gloss on a group of words, what we will call a phrase. So we need to adapt our model to support multiple words on the same gloss.
The new db structure will be something like the diagram below.
Unfortunately, this means that all glosses will require an extra join. The Gloss table will need its own integer primary key, so that GlossPhrase can be a join table with a primary key that consists of a foreign key to the Word table and a foreign key to the Gloss table. Now we can associate multiple words with a single gloss.
We will also need to move footnotes and translator notes to point to the gloss table rather than the word. This will allow a phrasal gloss to share these notes regardless of which word is selected. The GlossHistory table will also need to point to the gloss table as a foreign key. We'll need to start recording the word ids that comprise the gloss. Since this is just historical data, we could use an array field to record the list of word for the gloss, that way we can reconstruct the history of which gloss phrase each word is part of.
The goal of this ticket is to introduce this structure into the database while preserving the current logic. We will add the API calls and UI elements for phrasal glosses in a different ticket.