Lexicon import - GitHub issues

1. Story

Users need a way to import lexicon and gloss information from drafting tools such as Paratext, so that they don't have to enter it in both systems and so that auto-glossing results in Dashboard improve quickly and with less effort.

Because imported data needs to be 'scrubbed' in some cases, users need a user interface to perform this function. To reduce the number of UIs users must learn, increase UI consistency, and reduce development costs, the user interface for this function can, and should, do 'double duty', serving in both the scrub function and the general lexicon view/edit function of Dashboard.

While the existing PINs tool attempts to let users view lexicon data, it is flawed: it mixes dissimilar data elements in a common view, which results in duplicated and inconsistent information; it only provides a view over Paratext data and does not otherwise integrate with Dashboard's data; and its codebase is not maintainable.

This design proposes delivering these benefits in three phases, bringing meaningful value to users as rapidly as possible and then building on it in logical steps, each on top of the last: bring data in from Paratext so users don't have to enter it again; provide a way for users to scrub it; then provide a consistent, logical way for users to view and edit the lexicon within Dashboard, so they can understand the target language more clearly while building a body of translation knowledge about it to empower future translation and quality-checking work.

2. Dependencies

This feature works in conjunction with [this feature], the latter of which provides users with the ability to break down Tokens in Dashboard to a level that matches the data imported from Paratext.

3. Phases

1. Import lexemes and translations (glosses) from Paratext's lexicon and biblical terms where the form has no exact match in Dashboard's lexeme or forms tables and the translation has no exact match in Dashboard's translation table. (Described here.)
2. Add UIs that let the user control imports, detect partial and complete lexeme or form matches, and add the imported data to existing Dashboard entries. (Described here.)
3. Add UIs that show the verse locations and context where lexeme glosses and biblical term renderings were attached, show what has already been imported so the user can come back and finish later, and detect changes in Paratext and update the related records in Dashboard. (UI designs are not in this description.)

4. Trigger

Dashboard menu option 'import lexicon from paratext project' triggers LexiconImportViewModel

5. Workflow

1. Select project UI
2. Import data into memory
3. Select and save data into lexicon
   a. Phase 1: import and save only imported entries that don't match any lexeme, form, or translation
   b. Phase 2:
      - Import UI: View data and select if/how to integrate into Dashboard lexicon.
      - Lexicon Edit UI: Custom integrate a lexicon entry
   c. Phase 3: Add verse locations (not included yet).

6. Workflow step 2: Import data into memory

6(a) Interface:


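using System.Collections.Generic;

// A lexicon is an enumerable collection of lexemes (members elided in this sketch).
class Lexicon : IEnumerable<Lexeme>
{}

// Implemented by each lexicon source, e.g. LexiconFromXmlFiles.
interface ILexiconObtainable
{
    Lexicon GetLexicon();
}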
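As written, an empty `Lexicon` class would not compile, because `IEnumerable<Lexeme>` requires enumerator members. The following is a minimal sketch of one way to fill it in. The `Lexeme` shape (`Text`, `Forms`, `Translations`) is an assumption inferred from the matching rules later in this design, and `InMemoryLexiconSource` is a hypothetical helper that is not part of the proposal.

```csharp
using System.Collections;
using System.Collections.Generic;

// Assumed shape: the design only names lexeme, forms and translations.
public sealed record Lexeme(
    string Text,
    IReadOnlyList<string> Forms,
    IReadOnlyList<string> Translations);

public class Lexicon : IEnumerable<Lexeme>
{
    private readonly List<Lexeme> _lexemes;

    public Lexicon(IEnumerable<Lexeme> lexemes) => _lexemes = new List<Lexeme>(lexemes);

    // Generic enumerator required by IEnumerable<Lexeme>.
    public IEnumerator<Lexeme> GetEnumerator() => _lexemes.GetEnumerator();

    // Non-generic enumerator required by IEnumerable.
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public interface ILexiconObtainable
{
    Lexicon GetLexicon();
}

// Hypothetical source that returns a fixed lexicon; handy for unit tests.
public sealed class InMemoryLexiconSource : ILexiconObtainable
{
    private readonly Lexicon _lexicon;

    public InMemoryLexiconSource(Lexicon lexicon) => _lexicon = lexicon;

    public Lexicon GetLexicon() => _lexicon;
}
```

Because `Lexicon` is enumerable, callers can use `foreach` or LINQ on it directly, which is what the matching pseudocode in section 7 relies on.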

6(a)(i) Plugin CQRS Query

ParatextPlugin

GetLexiconQueryController (SignalR controller)
GetLexiconQueryHandler : ParatextRequestHandler<GetLexiconQuery, RequestResult<Lexicon>, Lexicon>
Implementation uses an instance of LexiconFromXmlFiles : ILexiconObtainable

ParatextPlugin.CQRS/Features/Lexicon/

GetLexiconQuery : IRequest<RequestResult<Lexicon>>

ClearDashboard.DAL.Features/Lexicon/

GetLexiconQueryHandler : ParatextRequestHandler<GetLexiconQuery, RequestResult<Lexicon>, Lexicon>
GetLexiconQuery : IRequest<RequestResult<Lexicon>>

6(a)(ii) DB CQRS Query

ClearDashboard.DAL.Alignment/Features/Lexicon

GetLexiconQueryHandler : ProjectDbContextCommandHandler<GetLexiconQuery, RequestResult<Lexicon>, Lexicon>
GetLexiconQuery : IRequest<RequestResult<Lexicon>>
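To illustrate how a query handler could delegate to an `ILexiconObtainable`, here is a self-contained sketch. `IRequest<T>` and `RequestResult<T>` below are local stand-ins written for this example, not the real MediatR or ClearDashboard types. The handler is synchronous for brevity, whereas the real base classes are probably asynchronous. `FixedLexiconSource` and the single-field `Lexeme` are hypothetical.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Minimal model types so the sketch compiles on its own (shapes are assumed).
public sealed record Lexeme(string Text);

public class Lexicon : IEnumerable<Lexeme>
{
    private readonly List<Lexeme> _lexemes;
    public Lexicon(IEnumerable<Lexeme> lexemes) => _lexemes = new List<Lexeme>(lexemes);
    public IEnumerator<Lexeme> GetEnumerator() => _lexemes.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public interface ILexiconObtainable
{
    Lexicon GetLexicon();
}

// Local stand-ins; NOT the real MediatR or ClearDashboard APIs.
public interface IRequest<TResponse> { }

public sealed class RequestResult<T>
{
    public RequestResult(T data, bool success, string message)
    {
        Data = data;
        Success = success;
        Message = message;
    }

    public T Data { get; }
    public bool Success { get; }
    public string Message { get; }
}

public sealed class GetLexiconQuery : IRequest<RequestResult<Lexicon>> { }

public sealed class GetLexiconQueryHandler
{
    private readonly ILexiconObtainable _source;

    public GetLexiconQueryHandler(ILexiconObtainable source) => _source = source;

    public RequestResult<Lexicon> Handle(GetLexiconQuery query)
    {
        try
        {
            return new RequestResult<Lexicon>(_source.GetLexicon(), true, "Success");
        }
        catch (Exception ex)
        {
            // Report the failure with an empty lexicon rather than null.
            return new RequestResult<Lexicon>(new Lexicon(Array.Empty<Lexeme>()), false, ex.Message);
        }
    }
}

// Hypothetical source used only to exercise the handler.
public sealed class FixedLexiconSource : ILexiconObtainable
{
    private readonly Lexicon _lexicon;
    public FixedLexiconSource(Lexicon lexicon) => _lexicon = lexicon;
    public Lexicon GetLexicon() => _lexicon;
}
```

The same handler shape would serve both the plugin query and the database query if each is given a different `ILexiconObtainable` implementation.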

6(b) class `LexiconFromXmlFiles : ILexiconObtainable` implementation

6(b)(i) Paratext lexicon info

provides

- Translation (e.g. ZZ_SUR) -> English
- Translation (e.g. ZZ_SUR) -> Spanish
- Translation (e.g. ZZ_SUR) -> other languages, etc.

Contained in

/lexicon.xml

Example of data

<item>
    <Lexeme Type="Word" Form="jìi" Homograph="1"/>
    <Entry>
        <Sense Id="MNCz424E">
            <Gloss Language="English">bring</Gloss>
        </Sense>
        <Sense Id="JVvAUZFY">
            <Gloss Language="en">bring</Gloss>
        </Sense>
    </Entry>
</item>

The Sense Id is used as the gloss id in interlinear files. We believe a sense can be viewed as a 'meaning' in our terminology, with each gloss under it corresponding to a translation. In Paratext there appears to be only one translation per sense and language pair, unlike Dashboard, which supports more than one translation per meaning (i.e. synonyms).
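A minimal sketch of reading the `<item>` structure shown above with LINQ to XML. It assumes only the element and attribute names visible in the example (`item`, `Lexeme`, `Sense`, `Gloss`). The `ParsedEntry` and `ParsedGloss` records and the `LexiconXmlReader` class are hypothetical names, not part of the design.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// One gloss under a sense; SenseId is what the interlinear files refer to.
public sealed record ParsedGloss(string SenseId, string Language, string Text);

public sealed record ParsedEntry(
    string Type,
    string Form,
    string Homograph,
    IReadOnlyList<ParsedGloss> Glosses);

public static class LexiconXmlReader
{
    public static IReadOnlyList<ParsedEntry> Read(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("item")
            // Skip items that have no <Lexeme> element.
            .Where(item => item.Element("Lexeme") != null)
            .Select(item =>
            {
                var lexeme = item.Element("Lexeme");
                // Flatten every <Gloss> under every <Sense>, keeping the sense id.
                var glosses = item.Descendants("Sense")
                    .SelectMany(sense => sense.Elements("Gloss").Select(gloss => new ParsedGloss(
                        (string)sense.Attribute("Id"),
                        (string)gloss.Attribute("Language"),
                        gloss.Value)))
                    .ToList();
                return new ParsedEntry(
                    (string)lexeme.Attribute("Type"),
                    (string)lexeme.Attribute("Form"),
                    (string)lexeme.Attribute("Homograph"),
                    glosses);
            })
            .ToList();
    }
}
```

Note that the example data uses both `English` and `en` as language values, so an importer would probably need to normalize language names before comparing glosses.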

6(b)(ii) Paratext BiblicalTerms info

provides

- Hebrew/Greek -> English (gloss element)
- Hebrew/Greek -> Translation (e.g. ZZ_SUR)  (renderings link)

Contained in

1. \ProjectBiblicalTerms.xml (? - Milt doesn't use)
2. \Terms\Lists\BiblicalTerms.xml
3. \Terms\Lists\AllBiblicalTerms.xml

It looks like data is extracted from files 1 through 3, where terms from a lower-numbered file take precedence over those from higher-numbered files; e.g. for a term in both 2 and 3, the entry in 2 is used.
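The precedence rule above can be sketched as a first-wins merge keyed on the term id. This is only an illustration of the observed behaviour, not Paratext's actual logic. The `BiblicalTerm` record and `BiblicalTermMerger` class are hypothetical.

```csharp
using System.Collections.Generic;

// SourceFile is kept only so the winning file can be inspected.
public sealed record BiblicalTerm(string Id, string Gloss, string SourceFile);

public static class BiblicalTermMerger
{
    // Lists must be passed in precedence order: file 1 first, file 3 last.
    public static IReadOnlyDictionary<string, BiblicalTerm> Merge(
        params IEnumerable<BiblicalTerm>[] listsInPrecedenceOrder)
    {
        var merged = new Dictionary<string, BiblicalTerm>();
        foreach (var list in listsInPrecedenceOrder)
        {
            foreach (var term in list)
            {
                // TryAdd leaves an existing (higher-precedence) entry untouched.
                merged.TryAdd(term.Id, term);
            }
        }
        return merged;
    }
}
```

Calling `Merge(projectTerms, biblicalTerms, allBiblicalTerms)` keeps the entry from `BiblicalTerms.xml` for any term that also appears in `AllBiblicalTerms.xml`.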

Hebrew/Greek -> English is extracted from the <gloss> element, when present.

Renderings

/termrenderings.xml

Biblical terms <Term Id="Δέρβη"> id links to the termrenderings.xml <TermRendering Id="Δέρβη" Guess="false"> id.

Hebrew/Greek -> Translation is extracted from the <Renderings> element when the matching <TermRendering Id="Δέρβη" Guess="false"> has Guess="false".
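A sketch of the rendering lookup described above: keep only `TermRendering` elements with `Guess="false"` and index their `<Renderings>` text by `Id`, so that a `<Term Id="...">` can be looked up directly. The class and method names are hypothetical, and the `<Renderings>` text is returned unsplit because the delimiter is not specified in this design.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TermRenderingReader
{
    // Returns term id -> raw <Renderings> text for confirmed renderings only.
    // Throws ArgumentException if the same Id appears twice among confirmed entries.
    public static Dictionary<string, string> ReadConfirmed(string termRenderingsXml)
    {
        return XDocument.Parse(termRenderingsXml)
            .Descendants("TermRendering")
            .Where(tr => tr.Attribute("Id") != null)
            // Guesses are skipped; only Guess="false" entries are imported.
            .Where(tr => string.Equals(
                (string)tr.Attribute("Guess"), "false", StringComparison.OrdinalIgnoreCase))
            .ToDictionary(
                tr => (string)tr.Attribute("Id"),
                tr => (string)tr.Element("Renderings") ?? "");
    }
}
```

A biblical term is then linked to its target-language rendering with `confirmed.TryGetValue(termId, out var renderings)`.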

7. Workflow step 3(a): Select and save data into lexicon, Phase 1: import and save only imported entries that don't match any lexeme, form, or translation

7(a) Manager

Add new methods to `abstractions/services/LexiconManager`.
`abstractions/services/LexiconManager` uses ClearDashboard.ParatextPlugin.CQRS.Features.GetLexiconQuery to obtain the lexicon from Paratext and ClearDashboard.DAL.Alignment.Features.Lexicon.GetLexiconQuery to obtain the lexicon from the database. It would be nice to remove the ParatextPlugin hardcoding so that this can be extended to other drafting tools without having to change this code.
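One possible way to avoid hardcoding the Paratext plugin is to have the manager depend only on `ILexiconObtainable` and receive its external and internal sources through the constructor. The sketch below is an assumption about how that could look, not the existing `LexiconManager`. The accessor method names and `FixedLexiconSource` are hypothetical.

```csharp
using System.Collections;
using System.Collections.Generic;

// Minimal model types so the sketch compiles on its own (shapes are assumed).
public sealed record Lexeme(string Text);

public class Lexicon : IEnumerable<Lexeme>
{
    private readonly List<Lexeme> _lexemes;
    public Lexicon(IEnumerable<Lexeme> lexemes) => _lexemes = new List<Lexeme>(lexemes);
    public IEnumerator<Lexeme> GetEnumerator() => _lexemes.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public interface ILexiconObtainable
{
    Lexicon GetLexicon();
}

public sealed class LexiconManager
{
    private readonly ILexiconObtainable _externalSource; // e.g. a Paratext-backed source
    private readonly ILexiconObtainable _internalSource; // e.g. a database-backed source

    public LexiconManager(ILexiconObtainable externalSource, ILexiconObtainable internalSource)
    {
        _externalSource = externalSource;
        _internalSource = internalSource;
    }

    // The manager never names a concrete drafting tool.
    public Lexicon GetExternalLexicon() => _externalSource.GetLexicon();

    public Lexicon GetInternalLexicon() => _internalSource.GetLexicon();
}

// Hypothetical source used only to exercise the manager.
public sealed class FixedLexiconSource : ILexiconObtainable
{
    private readonly Lexicon _lexicon;
    public FixedLexiconSource(Lexicon lexicon) => _lexicon = lexicon;
    public Lexicon GetLexicon() => _lexicon;
}
```

Supporting another drafting tool would then mean adding a new `ILexiconObtainable` implementation and registering it, with no change to the manager itself.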

7(a)(i) New Methods:

public Lexicon GetExternalLexiconNotInInternal(Lexicon externalLexicon, Lexicon internalLexicon)
    returns externalLexicon.Where(el =>
        !internalLexicon.Any(il => il.lexeme == el.lexeme)
        && !internalLexicon.Any(il => el.forms.Contains(il.lexeme))
        && !internalLexicon.Any(il => il.translations.Intersect(el.translations).Any()))
public void Save(Lexicon lexicon)
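A runnable sketch of the Phase 1 filter, following the three conditions in the pseudocode as written: an external entry is kept only if its lexeme is not an internal lexeme, none of its forms is an internal lexeme, and none of its translations is an internal translation. The `Lexeme` shape and the `LexiconDiff` class name are assumptions, and string comparison here is exact and case-sensitive.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Assumed shape, mirroring the lexeme / forms / translations used in the pseudocode.
public sealed record Lexeme(
    string Text,
    IReadOnlyList<string> Forms,
    IReadOnlyList<string> Translations);

public class Lexicon : IEnumerable<Lexeme>
{
    private readonly List<Lexeme> _lexemes;
    public Lexicon(IEnumerable<Lexeme> lexemes) => _lexemes = new List<Lexeme>(lexemes);
    public IEnumerator<Lexeme> GetEnumerator() => _lexemes.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class LexiconDiff
{
    public static Lexicon GetExternalLexiconNotInInternal(Lexicon externalLexicon, Lexicon internalLexicon)
    {
        // Pre-compute lookup sets so each external entry is checked in constant time.
        var internalLexemes = new HashSet<string>(internalLexicon.Select(il => il.Text));
        var internalTranslations = new HashSet<string>(internalLexicon.SelectMany(il => il.Translations));

        return new Lexicon(externalLexicon.Where(el =>
            !internalLexemes.Contains(el.Text)
            && !el.Forms.Any(form => internalLexemes.Contains(form))
            && !el.Translations.Any(translation => internalTranslations.Contains(translation))));
    }
}
```

The sets give the same result as the nested `Any` calls in the pseudocode while avoiding a full scan of the internal lexicon for every external entry.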

7(b) LexiconImportViewModel

private Lexicon GetExternalLexiconNotInInternal(Lexicon externalLexicon, Lexicon internalLexicon)
    returns externalLexicon.Where(el =>
        !internalLexicon.Any(il => il.lexeme == el.lexeme)
        && !internalLexicon.Any(il => el.forms.Contains(il.lexeme))
        && !internalLexicon.Any(il => il.translations.Intersect(el.translations).Any()))

7(c) Data model enhancements

7(c)(i) New field on `Translation` entity `json OriginatedFrom` (or can be xml) that contains:

{
    "App": "Paratext",
    "Module": "Lexicon",
    "LexemeType": "Phrase",
    "Form": "mbii ɗi moo seen mo",
    "Homograph": "1",
    "SenseId": "gr8mf78f"
}

There is one translation per sense (SenseId).

For biblical terms, the translation comes from the <gloss> element for English and from TermRendering/Renderings (possibly more than one, delimited in some way) for the target language:

{
    "App": "Paratext",
    "Module": "BiblicalTerms",
    "TermId": "אֶשֶׁל",
    "Strong": "H0815",
    "Language": "hebrew",
    "Definition": "small, fast-growing tree, about 10 meters high, found abundantly in deserts, dunes, and salt marshes; leafless, has green branches, and a wide crown; has small white flowers, and its fruit is a capsule with feathery seeds; Tamarix aphylla; durable wood; could have a link with cultic worship",
    "References":
    [
        {
            "LocationType": "Verse",
            "Location": "00102103300006"
        },
        {
            "LocationType": "Verse",
            "Location": "00902200600038"
        }
    ]
}
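A sketch of how the lexicon variant of `OriginatedFrom` could be stored as JSON using `System.Text.Json`. The `LexiconOrigin` class and its property names are assumptions that mirror the keys shown above. The biblical terms variant would need its own type or a shared base, which is not shown here.

```csharp
using System.Text.Json;

// Hypothetical POCO mirroring the lexicon variant of OriginatedFrom.
public sealed class LexiconOrigin
{
    public string App { get; set; } = "Paratext";
    public string Module { get; set; } = "Lexicon";
    public string LexemeType { get; set; } = "";
    public string Form { get; set; } = "";
    public string Homograph { get; set; } = "";
    public string SenseId { get; set; } = "";
}

public static class OriginatedFromJson
{
    // Produces the string stored in the Translation entity's OriginatedFrom field.
    public static string Serialize(LexiconOrigin origin) => JsonSerializer.Serialize(origin);

    public static LexiconOrigin Deserialize(string json) =>
        JsonSerializer.Deserialize<LexiconOrigin>(json)
        ?? throw new JsonException("OriginatedFrom JSON was null.");
}
```

With the default encoder, non-ASCII characters such as `ɗ` are written as `\u` escapes in the stored string, but they round-trip back to the original text on deserialization.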

7(d)(i) New field for lexeme/form `type`

7(e) implementation

The menu option triggers LexiconImportViewModel, which then:

1. uses ParatextPlugin.CQRS.Features.Lexicon.GetLexiconQuery to obtain lexicon entries from Paratext;
2. uses LexiconManager.GetExternalLexiconNotInInternal() to find Paratext lexicon entries that don't match entries in Dashboard's lexicon;
3. saves them using LexiconManager.Save().

8. Select and save data into lexicon, Phase 2

8(a) Import UI

Mockup

[Button: Import checked (A)]

[] select all	SourceWord v	SourceLanguage v	TargetWord v	TargetLanguage v
[]	ooga	mwagavul	tree	english	[Button: add as form for... (B)]
[]	booga	mwagavul	gear	english		[Button:add target as translation for ... (C)]
[]	פְּעֻלְּתַי	hebrew	pole	english	[Button: add as form for... (B)]
[]	ἔνταλμα	greek	bobobugga	mwagavul	[Button: add as form for... (B)]

Items in LexiconManager.GetExternalLexiconNotInInternal() are pre-checked.
(A) button uses LexiconManager.Save() to save the checked entries, then dismisses the dialog.
(B) tooltip: "find lexemes that have one or more translations that match target"
(B) button shows for rows externalLexicon.Where(el => internalLexicon.Any(il => il.translations.Intersect(el.translations).Any()))
When clicked, (B) goes to LexiconEditUI(sourceLanguage, targetLanguage, targetWord, MatchOnTranslation, sourceWord)
(C) tooltip: "find full or partial lexeme or form matches for source"
(C) button shows for rows externalLexicon.Where(el => internalLexicon.Any(il => il.lexeme.Contains(el.lexeme) || il.forms.Contains(el.lexeme)))
When clicked, (C) goes to LexiconEditUI(sourceLanguage, targetLanguage, sourceWord, PartialMatchOnLexemeOrForm, targetWord)
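The two row-level rules for buttons (B) and (C) can be sketched as pure predicates, which keeps them easy to unit test outside the view model. The `Lexeme` shape and the `ImportRowRules` class are assumptions. As in the pseudocode, the lexeme check is a substring match while the form check is exact membership in the list of forms.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed shape, mirroring the lexeme / forms / translations used in the pseudocode.
public sealed record Lexeme(
    string Text,
    IReadOnlyList<string> Forms,
    IReadOnlyList<string> Translations);

public static class ImportRowRules
{
    // (B) "add as form for...": some internal lexeme shares a translation with the row.
    public static bool ShowAddAsForm(Lexeme row, IEnumerable<Lexeme> internalLexicon) =>
        internalLexicon.Any(il => il.Translations.Intersect(row.Translations).Any());

    // (C) "add target as translation for...": the row's lexeme is contained in an
    // internal lexeme (partial or full match) or equals one of its forms.
    public static bool ShowAddTargetAsTranslation(Lexeme row, IEnumerable<Lexeme> internalLexicon) =>
        internalLexicon.Any(il => il.Text.Contains(row.Text) || il.Forms.Contains(row.Text));
}
```

A row can satisfy both predicates at once, in which case both buttons would presumably be offered.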

8(b) LexiconEditUI

Parameters:

- sourceLanguage = null
- targetLanguage = null
- string toMatch
- Mode: PartialMatchOnLexemeOrForm | MatchOnTranslation | Edit
- string other

Mockup

[Drop down: Source Language drop down v (A)] [Drop down: Target Language drop down v (B)]

Find all [checkbox: lexeme (C)] [select: partially|fully (D)] [or (E)] [checkbox: forms (F) ] [select: partially|fully (G)] [matching (H)] [textbox:(I)] [select: and | or (J)] [checkbox: translation (K)] fully matching [textbox: (L)]

	Lexeme v	Type v	Forms	Meanings [translations]
[ action (M)] [Edit]	ooga	Stem	oogas, looga	A perennial woody plant [tree, conifer, sapling, timber], Something that resembles a tree in form [treecontrol, hierarchy]
[ action (M)] [Edit]	booga	Word	boogga bboooga	To occupy oneself in an activity for amusement or recreation.[play, recreate], To participate in betting; gamble [gamble]

[Save changes (N)]

(A) is set to the sourceLanguage parameter and made read-only if it is not null.
(B) is set to the targetLanguage parameter and made read-only if it is not null.
(D) is visible when (C) is checked; (G) is visible when (F) is checked; (E) is visible when (C) and (F) are checked; (H) and (I) are visible when (C) or (F) is checked; (J) is visible when (K) is checked and ((C) or (F) is checked); (L) is visible when (K) is checked.
With the [PartialMatchOnLexemeOrForm] parameter: (C) is checked, (D) is set to partially, (F) is checked, (G) is set to partially, (I) is filled in with the [toMatch] parameter, and (K) is unchecked. (M) reads "Edit adding [other] as translation to first meaning"; when pressed, (M) switches the line to in-place editing, creates a first default meaning with an empty [] list if the entry has no meaning, then adds [other] to the first meaning's comma-delimited [] list and selects it in that list so the user can see what has been added.
With the [MatchOnTranslation] parameter: (C) and (F) are unchecked, (K) is checked, and (L) is set to [toMatch]. (M) reads "Edit adding [other] as form"; when pressed, (M) switches the line to in-place editing, adds [other] to the comma-delimited list of forms, and highlights the added form so the user can see what was added.
With the [Edit] parameter, (M) is hidden.
Pressing [Edit] switches the line to in-place editing.
Pressing (N) forms a Lexicon from the edited lines and calls LexiconManager.Save(Lexicon lexicon).
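The visibility rules and mode-dependent defaults above can be expressed as a small state record, which a view model could bind to. This is a sketch under assumptions: `FilterState`, its property names, and the `LexiconEditMode` enum name are hypothetical. The partially/fully selectors (D) and (G) are left out for brevity, and the initial filter state for Edit mode (everything unchecked) is a guess, since the design does not state it.

```csharp
public enum LexiconEditMode
{
    PartialMatchOnLexemeOrForm,
    MatchOnTranslation,
    Edit
}

// Checkbox state for (C), (F), (K) plus the text boxes (I) and (L).
public sealed record FilterState(
    bool LexemeChecked,
    bool FormsChecked,
    bool TranslationChecked,
    string LexemeOrFormText,
    string TranslationText)
{
    public bool ShowLexemeMatchKind => LexemeChecked;                               // (D)
    public bool ShowOr => LexemeChecked && FormsChecked;                            // (E)
    public bool ShowFormsMatchKind => FormsChecked;                                 // (G)
    public bool ShowMatchingText => LexemeChecked || FormsChecked;                  // (H), (I)
    public bool ShowAndOr => TranslationChecked && (LexemeChecked || FormsChecked); // (J)
    public bool ShowTranslationText => TranslationChecked;                          // (L)

    // Initial state for each mode.
    public static FilterState For(LexiconEditMode mode, string toMatch) => mode switch
    {
        LexiconEditMode.PartialMatchOnLexemeOrForm => new FilterState(true, true, false, toMatch, ""),
        LexiconEditMode.MatchOnTranslation => new FilterState(false, false, true, "", toMatch),
        // Assumption: Edit mode starts with nothing checked.
        _ => new FilterState(false, false, false, "", ""),
    };
}
```

Keeping the rules as computed properties means a checkbox change only needs to produce a new `FilterState` (for example with a `with` expression), and every dependent control updates consistently.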

Clear-Bible / ClearDashboard

Lexicon import #812
