EzerIT / BibleOL

Web-based instruction in Biblical Hebrew and Greek

Pipeline for adding new grammar features to existing Emdros Databases #66

Open oliverglanz opened 3 months ago

oliverglanz commented 3 months ago

As requested by @ernstboogert, I have put down a first draft of what a potential pipeline for adding new grammar features could look like. The description surely lacks several crucial elements that I am not aware of, so please fill in whatever is missing. I will use the ETCBC v4c database as used in BibleOL as a sample:

  1. General condition: The new grammar feature needs to be developed on the basis of the existing BibleOL Emdros database. It can be a feature that belongs to the lowest level of granularity of the database (presently: monad=word) or to the highest (book).

=> Since some of the new features will likely apply at a level below the monad (e.g. Dagesh Lene, Dagesh Forte, vocal vs. silent Shewa), we need to discuss whether signs (consonants, vowels, punctuation) can be introduced as a new sub-monad level with its own sequencing, e.g. B(1).(2):(3)R(4);(5)>(6)C(7) etc. (for בְּרֵאשִׁ֖ית). Features could then be attached to signs, for example sign (2)=Dagesh Lene. Since the lowest level is currently the word (monad) level, I have stored the DL/DF information as a word feature. For example:

Screenshot 2024-04-05 at 00 36 49

You can see that I have added information about the Dagesh in the first consonant of a word/monad. If this feature were implemented in BibleOL, one could create an exercise in which students are shown words and need to identify whether the Dagesh in a word is a DLBegin or a DFBegin. In cases where a word has several Dageshes, the option could be DLBeginDFMiddle. For example:

image
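The sign-level sequencing and the combined word-level labels described above can be sketched in a few lines of Python. This is only an illustration: the function names and the fixed Begin/Middle ordering are my assumptions, not part of BibleOL or Emdros.

```python
def number_signs(word: str) -> str:
    """Attach a 1-based sequence number to every sign of a transliterated word."""
    return "".join(f"{sign}({i})" for i, sign in enumerate(word, start=1))


def combine_dagesh_labels(labels: dict) -> str:
    """Join per-position labels into one word-level value.

    `labels` maps a position ("Begin" or "Middle") to "DL" or "DF".
    The position names and their order are assumptions taken from the
    DLBegin / DLBeginDFMiddle examples in this issue.
    """
    order = ["Begin", "Middle"]
    return "".join(labels[pos] + pos for pos in order if pos in labels)


print(number_signs("B.:R;>C"))  # B(1).(2):(3)R(4);(5)>(6)C(7)
print(combine_dagesh_labels({"Begin": "DL", "Middle": "DF"}))  # DLBeginDFMiddle
```

If signs became real database objects, each numbered sign would carry its own feature value; until then the combined string can be stored as a single word feature.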
  1. The new features can be developed with any tool; Text-Fabric will most likely work best. What is important is that the features are consistent and ideally reproducible (rule-based). For example, for identifying Dagesh Lene and Dagesh Forte one could use a definition like:

    image
  2. Once the new features have been developed, they can be exported as TSV/CSV files in which monad_number, phrase_number, phrase_atom_number, etc. are matched with the new feature values. For example, when I created the verbal class feature, the new TF version of the BHSa 4c looked like this:

    image

    For each "bol_monad_num1" there was a matching "bol_dict_vc1" entry:

    image
  3. The new feature set, together with the corresponding sequence numbers (monad numbers for morphological features, phrase sequence numbers for phrase features), needs to be shared with the BibleOL developer team.

  4. Now the Emdros Database needs to be extended with the new feature.

  5. After the Emdros Database has been updated, the new feature name needs to be made available to the different parts of the BibleOL webserver. As an example we use the verbal class feature that was developed some time ago.

    • The text display needs to be enriched with the new features:
      image
    • The exercise creation needs to show the new features:

      Screenshot 2024-04-05 at 01 04 48
    • The viewer for taking exercises needs to be updated as well:

      Screenshot 2024-04-05 at 01 09 47

And I am sure there are more elements of the server that need updating to make sure the newly developed feature is available for use.
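To make the hand-over between the export step and the database update more concrete, here is a rough Python sketch that reads a TSV export and turns it into MQL statements. The column names `bol_monad_num1` and `bol_dict_vc1` come from the example above; the object type `word`, the feature name `verb_class`, the sample values `classA`/`classB`, and the exact MQL wording are assumptions that should be checked against the Emdros MQL documentation and the actual BibleOL schema before use.

```python
import csv
import io


def read_feature_rows(tsv_text, monad_col="bol_monad_num1", value_col="bol_dict_vc1"):
    """Parse a TSV export into (monad number, feature value) pairs."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [(int(row[monad_col]), row[value_col]) for row in reader]


def to_mql(rows, object_type="word", feature="verb_class"):
    """Build MQL text: one statement adding the feature, then one update per monad.

    The statement shapes are an assumption based on my reading of MQL;
    `object_type` and `feature` are hypothetical names.
    """
    lines = [f"UPDATE OBJECT TYPE [{object_type} ADD {feature} : string;] GO"]
    for monad, value in rows:
        # Escape backslashes and double quotes inside the string literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(
            f"UPDATE OBJECTS BY MONADS = {{{monad}}} "
            f'[{object_type} {feature} := "{escaped}";] GO'
        )
    return "\n".join(lines)


# Placeholder data, not real ETCBC values.
sample = "bol_monad_num1\tbol_dict_vc1\n1\tclassA\n2\tclassB\n"
print(to_mql(read_feature_rows(sample)))
```

Generating the statements as plain text keeps the feature developer independent of the Emdros installation: the developer team can review the file before running it against the database.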

  1. Robust testing will be the last part of the process.
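One possible first test, before anything touches the database, is a consistency check on the delivered feature table. The sketch below assumes the table has already been read into (monad number, value) pairs; the function name and the shape of the report are hypothetical.

```python
from collections import Counter


def check_feature_table(rows, expected_monads, allowed_values=None):
    """Report monads that are duplicated, missing, or unknown, plus invalid values.

    rows            -- iterable of (monad number, feature value) pairs
    expected_monads -- monad numbers that must each receive exactly one value
    allowed_values  -- optional set of permitted feature values
    """
    rows = list(rows)
    expected = set(expected_monads)
    counts = Counter(monad for monad, _ in rows)
    seen = set(counts)
    report = {
        "duplicates": sorted(m for m, c in counts.items() if c > 1),
        "missing": sorted(expected - seen),
        "unknown": sorted(seen - expected),
        "bad_values": [],
    }
    if allowed_values is not None:
        report["bad_values"] = sorted(
            (m, v) for m, v in rows if v not in allowed_values
        )
    return report


# Placeholder rows: monad 2 appears twice, monad 3 is missing, "DX" is not allowed.
report = check_feature_table(
    [(1, "DLBegin"), (2, "DFBegin"), (2, "DX")],
    expected_monads=range(1, 4),
    allowed_values={"DLBegin", "DFBegin", "DLBeginDFMiddle"},
)
print(report)
```

An empty report for all four keys would indicate that every expected monad received exactly one permitted value.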