Clear-Bible / ClearDashboard

The ClearDashboard project

Lexicon import #812

Open russellmorley opened 1 year ago

russellmorley commented 1 year ago

1. Story

Users need a way to import lexicon and gloss information from drafting tools such as Paratext, so they don't have to enter the same data twice, once in each system, and so Dashboard's auto-glossing results can be improved quickly and with less effort.

Because imported data sometimes needs to be 'scrubbed', users need a user interface for doing so. To reduce the number of UIs users must learn, increase UI consistency, and reduce development costs, this UI can, and should, do double duty: it serves both as the scrubbing tool and as Dashboard's general lexicon view/edit tool.

The existing PINs tool lets users view lexicon data, but it has three flaws: it mixes dissimilar data elements in a single view, which produces duplicated and inconsistent information; it only provides a view over Paratext data and does not otherwise integrate with Dashboard's data; and its codebase is not maintainable.

This design proposes delivering these benefits in three phases, bringing meaningful value to users as quickly as possible and then building on it in logical steps, each on top of the previous one. First, bring data in from Paratext so users don't have to enter it again. Second, provide a way for users to scrub it. Third, provide a consistent, logical way for users to view and edit the lexicon within Dashboard, so they can understand the target language more clearly while building a body of translation knowledge that empowers future translation and quality-checking work.

2. Dependencies

This feature works in conjunction with [this feature], which lets users break Tokens down in Dashboard to a level that matches the data imported from Paratext.

3. Phases

  1. Import lexemes and translations (glosses) from Paratext's lexicon and Biblical Terms where the form has no exact match in Dashboard's lexeme or form tables and the translation has no exact match in Dashboard's translation table. (Described here.)
  2. Add UIs that let users control imports: detect partial and complete lexeme or form matches, and merge the imported data into existing Dashboard entries. (Described here.)
  3. Add UIs that show the verse locations and context where lexeme glosses and Biblical Terms renderings were attached, show what has already been imported so the user can come back and finish later, and detect changes in Paratext so the related records in Dashboard can be updated. (UI designs are not included in this description.)

4. Trigger

The Dashboard menu option 'import lexicon from paratext project' triggers LexiconImportViewModel.

5. Workflow

  1. Select project UI
  2. Import data into memory
  3. Select and save data into lexicon
    a. Phase 1: import and save only the imported entries that don't match any lexeme, form, or translation
    b. Phase 2:
      • Import UI: view the data and select if/how to integrate it into the Dashboard lexicon.
      • Lexicon Edit UI: custom-integrate a lexicon entry.
  4. Phase 3: add verse locations (not included yet).

6. Import data into memory (workflow step 2)

6(a) Interface:


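using System.Collections;
using System.Collections.Generic;
// A lexicon is an enumerable collection of lexemes (Lexeme is defined elsewhere).
class Lexicon : IEnumerable<Lexeme>
{
    private readonly List<Lexeme> lexemes = new();
    public IEnumerator<Lexeme> GetEnumerator() => lexemes.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
// Implemented by each lexicon source (Paratext plugin query, DB query, XML files).
interface ILexiconObtainable
{
    Lexicon GetLexicon();
}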

6(a)(i) Plugin CQRS Query

ParatextPlugin

ParatextPlugin.CQRS/Features/Lexicon/

ClearDashboard.DAL.Features/Lexicon/

6(a)(ii) DB CQRS Query

ClearDashboard.DAL.Alignment/Features/Lexicon

6(b) class LexiconFromXmlFiles : ILexiconObtainable implementation

6(b)(i) Paratext lexicon info

Provides:
- Translation (e.g. ZZ_SUR) -> English
- Translation (e.g. ZZ_SUR) -> Spanish
- Translation (e.g. ZZ_SUR) -> other languages
Contained in:

Example of data

<item>
    <Lexeme Type="Word" Form="jìi" Homograph="1"/>
    <Entry>
        <Sense Id="MNCz424E">
            <Gloss Language="English">bring</Gloss>
        </Sense>
        <Sense Id="JVvAUZFY">
            <Gloss Language="en">bring</Gloss>
        </Sense>
    </Entry>
</item>

The Sense Id is used as the gloss id in interlinear files. A sense can probably be viewed as a 'meaning' in our terminology, with each gloss under it corresponding to a translation. Paratext appears to allow only one translation per (sense, language) pair, whereas Dashboard supports more than one translation per meaning (i.e. synonyms).
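A minimal Python sketch of reading the lexicon data above into a lexeme with meanings (senses) and translations (glosses), following the mapping just described. Python is used only to illustrate the logic; the names LexiconEntry, Meaning and parse_lexicon_items are hypothetical, and the sketch assumes the <item> elements may sit under any root element.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Meaning:
    # Hypothetical model: a Paratext <Sense> maps to a Dashboard 'meaning'.
    sense_id: str
    # language -> gloss; Paratext appears to allow one gloss per (sense, language).
    translations: dict[str, str] = field(default_factory=dict)


@dataclass
class LexiconEntry:
    lexeme_type: str
    form: str
    homograph: str
    meanings: list[Meaning] = field(default_factory=list)


def parse_lexicon_items(xml_text: str) -> list[LexiconEntry]:
    """Turn every <item> element in a Paratext lexicon XML fragment into a LexiconEntry."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        lexeme = item.find("Lexeme")
        if lexeme is None:
            continue  # skip items without a lexeme header
        entry = LexiconEntry(
            lexeme_type=lexeme.get("Type", ""),
            form=lexeme.get("Form", ""),
            homograph=lexeme.get("Homograph", ""),
        )
        for sense in item.iterfind("Entry/Sense"):
            meaning = Meaning(sense_id=sense.get("Id", ""))
            for gloss in sense.iterfind("Gloss"):
                meaning.translations[gloss.get("Language", "")] = (gloss.text or "").strip()
            entry.meanings.append(meaning)
        entries.append(entry)
    return entries


sample = """<Lexicon>
<item>
    <Lexeme Type="Word" Form="jìi" Homograph="1"/>
    <Entry>
        <Sense Id="MNCz424E"><Gloss Language="English">bring</Gloss></Sense>
        <Sense Id="JVvAUZFY"><Gloss Language="en">bring</Gloss></Sense>
    </Entry>
</item>
</Lexicon>"""
entries = parse_lexicon_items(sample)
print(entries[0].form, [m.sense_id for m in entries[0].meanings])  # jìi ['MNCz424E', 'JVvAUZFY']
```

The two senses in the sample gloss the same word under different language codes ("English" and "en"), so an importer would probably need to normalize language names before matching them against Dashboard's languages.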

6(b)(ii) Paratext BiblicalTerms info

Provides:
- Hebrew/Greek -> English (gloss element)
- Hebrew/Greek -> Translation (e.g. ZZ_SUR) (renderings link)
Contained in:
  1. \ProjectBiblicalTerms.xml (uncertain; Milt doesn't use it)
  2. \Terms\Lists\BiblicalTerms.xml
  3. \Terms\Lists\AllBiblicalTerms.xml

Data appears to be extracted from files 1 through 3, with terms in a lower-numbered file taking precedence over those in higher-numbered files; e.g. for a term present in both 2 and 3, the entry in 2 is used.

The Hebrew/Greek -> English mapping is extracted from the <gloss> element, when present.
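A minimal Python sketch of the precedence rule and gloss extraction above. It assumes each Biblical Terms file contains <Term Id="..."> elements with an optional gloss child, matched case-insensitively because the exact element casing is not confirmed here. The function names are hypothetical and the sample glosses are illustrative only.

```python
import xml.etree.ElementTree as ET


def term_glosses(xml_text: str) -> dict[str, str]:
    """Map Term Id -> English gloss for every <Term> in one Biblical Terms file."""
    glosses = {}
    for term in ET.fromstring(xml_text).iter("Term"):
        term_id = term.get("Id")
        # ASSUMPTION: the gloss is a direct child named Gloss/gloss.
        gloss = next((child for child in term if child.tag.lower() == "gloss"), None)
        if term_id and gloss is not None and (gloss.text or "").strip():
            glosses[term_id] = gloss.text.strip()
    return glosses


def merge_by_precedence(files_in_priority_order: list[dict[str, str]]) -> dict[str, str]:
    """Combine per-file results; a term from a lower-numbered file wins over later files."""
    merged: dict[str, str] = {}
    for per_file in files_in_priority_order:
        for term_id, gloss in per_file.items():
            merged.setdefault(term_id, gloss)  # first (highest-priority) value is kept
    return merged


list_2 = term_glosses(
    '<BiblicalTermsList><Term Id="Δέρβη"><Gloss>Derbe (city)</Gloss></Term></BiblicalTermsList>'
)
list_3 = term_glosses(
    '<BiblicalTermsList><Term Id="Δέρβη"><Gloss>Derbe</Gloss></Term>'
    '<Term Id="אֶשֶׁל"><Gloss>tamarisk</Gloss></Term></BiblicalTermsList>'
)
merged = merge_by_precedence([list_2, list_3])  # file 2 before file 3
```

Using setdefault means the caller only has to pass the files in priority order (1, 2, 3); a term that appears in several files keeps the value from the first file that defines it.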

Renderings

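The Id of a Biblical Terms entry, e.g. <Term Id="Δέρβη">, links to the Id of the matching TermRenderings entry, e.g. <TermRendering Id="Δέρβη" Guess="false">.

The Hebrew/Greek -> Translation mapping is extracted from the <Renderings> element only when the TermRendering's Guess attribute is false, as in <TermRendering Id="Δέρβη" Guess="false">.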
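A minimal Python sketch of the renderings rule: keep only TermRendering entries with Guess="false" and join them to the Biblical Terms glosses on the shared Id. The TermRenderingsList root element, the "||" separator between multiple renderings, and the function names are assumptions for illustration; the sample renderings are made up.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# ASSUMPTION: separator between multiple renderings; confirm against real data.
RENDERING_SEPARATOR = "||"


def confirmed_renderings(xml_text: str, separator: str = RENDERING_SEPARATOR) -> dict[str, list[str]]:
    """Map TermRendering Id -> target-language renderings, skipping guesses."""
    result: dict[str, list[str]] = {}
    for rendering in ET.fromstring(xml_text).iter("TermRendering"):
        if rendering.get("Guess", "").lower() != "false":
            continue  # only confirmed (non-guessed) renderings are imported
        text = rendering.findtext("Renderings", default="")
        values = [r.strip() for r in text.split(separator) if r.strip()]
        if values:
            result[rendering.get("Id", "")] = values
    return result


def join_on_term_id(
    glosses: dict[str, str], renderings: dict[str, list[str]]
) -> list[tuple[str, Optional[str], list[str]]]:
    """Return (term id, English gloss or None, renderings) for each confirmed rendering."""
    return [(term_id, glosses.get(term_id), values) for term_id, values in renderings.items()]


sample = """<TermRenderingsList>
  <TermRendering Id="Δέρβη" Guess="false"><Renderings>Derbe||Derba</Renderings></TermRendering>
  <TermRendering Id="אֶשֶׁל" Guess="true"><Renderings>eshel</Renderings></TermRendering>
</TermRenderingsList>"""
result = confirmed_renderings(sample)
```

The join is keyed on the Id alone, so a term with a confirmed rendering but no gloss still comes through with None as its English gloss rather than being dropped.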

7. Select and save data into lexicon, Phase 1 (workflow step 3a): import and save only imported entries that don't match any lexeme, form, or translation

7(a) Manager

7(a)(i) New Methods:

7(b) LexiconImportViewModel

7(c) Data model enhancements

7(c)(i) New OriginatedFrom field on the Translation entity, stored as JSON (or XML), that records where the translation came from, e.g.:

{
    "App": "Paratext",
    "Module": "Lexicon",
    "LexemeType": "Phrase",
    "Form": "mbii ɗi moo seen mo",
    "Homograph": "1",
    "SenseId": "gr8mf78f"
}

There is one translation per SenseId.

The translation comes from the 'gloss' field for English, and from TermRendering/Renderings for the target language (there may be more than one rendering, delimited in some way).

{
    "App": "Paratext",
    "Module": "BiblicalTerms",
    "Term Id": "אֶשֶׁל",
    "Strong": "H0815",
    "Language": "hebrew",
    "Definition": "small, fast-growing tree, about 10 meters high, found abundantly in deserts, dunes, and salt marshes; leafless, has green branches, and a wide crown; has small white flowers, and its fruit is a capsule with feathery seeds; Tamarix aphylla; durable wood; could have a link with cultic worship",
    "References":
    [
        {
            "LocationType": "Verse",
            "Location": "00102103300006"
        },
        {
            "LocationType": "Verse",
            "Location": "00902200600038"
        }
    ]
}
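A minimal Python sketch of producing the OriginatedFrom value for a translation imported from the Paratext lexicon, reusing the key names from the lexicon example above. The helper originated_from_lexicon is hypothetical; ensure_ascii=False keeps target-language characters such as "ɗ" readable in the stored JSON.

```python
import json


def originated_from_lexicon(lexeme_type: str, form: str, homograph: str, sense_id: str) -> str:
    """Serialize the OriginatedFrom value for a translation imported from the Paratext lexicon."""
    return json.dumps(
        {
            "App": "Paratext",
            "Module": "Lexicon",
            "LexemeType": lexeme_type,
            "Form": form,
            "Homograph": homograph,
            "SenseId": sense_id,  # one translation per sense
        },
        ensure_ascii=False,  # keep non-ASCII forms unescaped
    )


value = originated_from_lexicon("Phrase", "mbii ɗi moo seen mo", "1", "gr8mf78f")
print(json.loads(value)["SenseId"])  # gr8mf78f
```

Storing the Paratext identifiers (Form, Homograph, SenseId) alongside App and Module is what would later let Phase 3 find the Dashboard record that corresponds to a changed Paratext entry.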

7(d)(i) New field for lexeme/form type

7(e) Implementation

The menu option triggers LexiconImportViewModel, which then:

  1. uses ParatextPlugin.CQRS.Features.Lexicon.GetLexiconQuery to obtain lexicon entries from Paratext.
  2. uses LexiconManager.GetExternalLexiconNotInInternal() to find the Paratext lexicon entries that don't match entries in Dashboard's lexicon.
  3. saves them using LexiconManager.Save().
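A minimal Python sketch of the Phase 1 filtering that LexiconManager.GetExternalLexiconNotInInternal() is described as performing: an imported entry is kept only when its form exactly matches no Dashboard lexeme or form and its translation exactly matches no Dashboard translation. ExternalEntry and external_not_in_internal are hypothetical names, and comparing strings after Unicode NFC normalization is an added assumption, not part of the design.

```python
import unicodedata
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ExternalEntry:
    # Hypothetical: one imported (form, translation) pair from Paratext.
    form: str
    translation: str


def _key(text: str) -> str:
    # ASSUMPTION: compare in Unicode NFC so a precomposed "ì" and "i" + combining
    # grave count as the same string. The design itself only says "exact match".
    return unicodedata.normalize("NFC", text)


def external_not_in_internal(
    external: Iterable[ExternalEntry],
    internal_lexemes: Iterable[str],
    internal_forms: Iterable[str],
    internal_translations: Iterable[str],
) -> list[ExternalEntry]:
    """Keep entries whose form matches no lexeme or form AND whose translation
    matches no translation already stored in Dashboard."""
    known_forms = {_key(s) for s in internal_lexemes} | {_key(s) for s in internal_forms}
    known_translations = {_key(s) for s in internal_translations}
    return [
        entry
        for entry in external
        if _key(entry.form) not in known_forms
        and _key(entry.translation) not in known_translations
    ]


incoming = [
    ExternalEntry("jìi", "bring"),
    ExternalEntry("ooga", "tree"),   # form is already a Dashboard lexeme
    ExternalEntry("booga", "gear"),  # translation is already in Dashboard
]
new = external_not_in_internal(
    incoming,
    internal_lexemes=["ooga"],
    internal_forms=["oogas"],
    internal_translations=["gear"],
)
print(new)  # [ExternalEntry(form='jìi', translation='bring')]
```

Building sets once keeps each lookup constant-time, so the filter stays linear in the number of imported entries even for a large Dashboard lexicon.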

8. Select and save data into lexicon, Phase 2

8(a) Import UI

Mockup

[Button: Import checked (A)]

[] select all SourceWord v SourceLanguage v TargetWord v TargetLanguage v
[] ooga mwagavul tree english [Button: add as form for... (B)]
[] booga mwagavul gear english [Button:add target as translation for ... (C)]
[] פְּעֻלְּתַי hebrew pole english [Button: add as form for... (B)]
[] ἔνταλμα greek bobobugga mwagavul [Button: add as form for... (B)]

8(b) LexiconEditUI

Parameters:

- sourceLanguage = null
- targetLanguage = null
- string toMatch
- Mode: PartialMatchOnLexemeOrForm | MatchOnTranslation | Edit
- string other

Mockup

[Drop down: Source Language drop down v (A)] [Drop down: Target Language drop down v (B)]

Find all [checkbox: lexeme (C)] [select: partially|fully (D)] [or (E)] [checkbox: forms (F) ] [select: partially|fully (G)] [matching (H)] [textbox:(I)] [select: and | or (J)] [checkbox: translation (K)] fully matching [textbox: (L)]

Lexeme v Type v Forms Meanings [translations]
[ action (M)] [Edit] ooga Stem oogas, looga A perennial woody plant [tree, conifer, sapling, timber], Something that resembles a tree in form [treecontrol, hierarchy]
[ action (M)] [Edit] booga Word boogga bboooga To occupy oneself in an activity for amusement or recreation.[play, recreate], To participate in betting; gamble [gamble]

[Save changes (N)]
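A minimal Python sketch of how the Find bar (C) to (L) in the mockup above could filter rows: (lexeme or forms match the text in (I), partially or fully) combined with and/or (J) with (a translation fully matches the text in (L)). Reading "partially" as a substring match, showing every row when no criterion is checked, and the names LexemeRow and find_rows are all assumptions; the sample rows come from the mockup.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LexemeRow:
    # Hypothetical row model; translations are flattened across all meanings.
    lemma: str
    forms: list[str] = field(default_factory=list)
    translations: list[str] = field(default_factory=list)


def _matches(candidate: str, text: str, partial: bool) -> bool:
    # ASSUMPTION: "partially" = substring match, "fully" = exact equality.
    return text in candidate if partial else candidate == text


def find_rows(
    rows: list[LexemeRow],
    text: str,
    match_lexeme: bool = True,          # checkbox (C)
    lexeme_partial: bool = True,        # select (D)
    match_forms: bool = True,           # checkbox (F)
    forms_partial: bool = True,         # select (G)
    translation: Optional[str] = None,  # checkbox (K) + textbox (L); None = unchecked
    combine_with_and: bool = True,      # select (J)
) -> list[LexemeRow]:
    results = []
    for row in rows:
        clauses = []
        if match_lexeme or match_forms:
            clauses.append(
                (match_lexeme and _matches(row.lemma, text, lexeme_partial))
                or (match_forms and any(_matches(f, text, forms_partial) for f in row.forms))
            )
        if translation is not None:
            # Translations are always matched fully, as in the mockup.
            clauses.append(translation in row.translations)
        combined = all(clauses) if combine_with_and else any(clauses)
        # With no criterion checked, every row is shown.
        if not clauses or combined:
            results.append(row)
    return results


rows = [
    LexemeRow("ooga", ["oogas", "looga"], ["tree", "conifer", "sapling", "timber"]),
    LexemeRow("booga", ["boogga", "bboooga"], ["play", "recreate", "gamble"]),
]
print([r.lemma for r in find_rows(rows, "loo")])  # ['ooga']
print([r.lemma for r in find_rows(rows, "loo", translation="gamble", combine_with_and=False)])  # ['ooga', 'booga']
```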

russellmorley commented 1 year ago

Hi @gerfen

Question: Do the "v" characters represent the sorting direction on the columns?


Answer: Yes, they are sorted columns and the 'v' represents sort direction.

Question: Is "add as form for..." literal?

Answer: Yes, it is literal, with nothing else. It means: add this source word from the Import UI as a form of the lexeme the user picks (clicking the button opens the LexiconEditUI, where the lexeme is picked).

morleycb commented 1 year ago

@romanpoz The first phase, up through '8(a) Import UI', has been merged into the DEV branch. Development continues with '8(b) LexiconEditUI', which was a stretch goal for release 1.2.