Validation interface mockups

aarppe commented 3 years ago

Here are some sketches of how the validation interface could work and look, using the transcription of the "new" Cree word ewaninstohtahk 'he misunderstands it' as an example case:

Presenting the unvalidated word ewaninstohtahk with the translation 'he misunderstands it'. The spell-checker is used to provide ranked suggestions. For each of the suggestions, one can opt to see their linguistic analysis, lemma, and the English translation(s) of the lemma (and preverbs as well). None of the suggestions are relevant, so we want to allow the validator to edit the transcription.

Having added a hyphen to the transcription --> e-waninstohtahk, the spell-checker is used to provide ranked suggestions again, for which one can opt to see their linguistic analysis, lemma, and the English translation(s) of the lemma.

Now, we think that the third suggestion ê-wani-nistohtâhk might be right, but the translation of its lemma nistohtâw, 'He divides it into three. [MD] s/he divides s.t. in three [CW]', does not match with the original English translation 'he misunderstands it' received during the recording sessions.

Next, we add one more -i- to the transcription --> e-waninistohtahk, for which the spell-checker provides ranked suggestions. This time, the second suggestion ê-wani-nisitohtahk seems promising, as its linguistic analysis, and the translations of the lemma nisitohtam and the preverb wani- match with the original English translation.

Satisfied with the standardization ê-wani-nisitohtahk, we classify the transcription for this particular recording as standardized, with the date and time and the name/initials of the validator.

The validation of the English translation still remains to be done (by a speaker/Elder).

Further notes:

We need to allow for some word-forms that cannot be analyzed by the current computational model, primarily because the lemma/stem is not in the model or secondarily because the (rarer) inflected form has not been implemented in the model.
In such a case, we have to allow for the manual entering of all the fields: a) the word-form, b) the linguistic analysis, and c) the lemma.
Therefore, besides the automated provision of suggestions for the standardized word-form, linguistic analysis, and lemma, any one of these fields should be manually editable. However, such cases should be flagged with some special code for later scrutiny.
For the status of the validation, there can be several values:
    a. unvalidated
    b. standardized (orthography)
        i. with model
        ii. manually without model (addition of lemma to model needed)
        iii. second opinion needed
    c. validated (translation) (speaker/Elder use)
Instead of the spell-checker weights, we should use a modified edit-distance metric to rank the suggestions:
    a. adding/removing diacritics or hyphens, swapping glides w/y, or adding/removing aspirations -h- between vowel and consonant would have a weight of zero.
    b. inserting the vowel -i- between two consonants would have a half-weight (0.5).
    c. inserting/removing/swapping any other characters would have a normal weight of one.
Instead of the spell-checker, we could use a weighted descriptive analyzer.
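
To make the proposed modified edit-distance metric concrete, here is a minimal Python sketch of one way it could be computed. It is only an illustration under several assumptions that are not settled in the notes above: diacritics and hyphens are stripped before comparison (which gives them a weight of zero), the vowel set is taken to be a/e/i/o, the context of an added or removed character is judged by its neighbours in the string that contains it, and the half-weight for -i- applies only when the -i- is added going from the transcription to the suggestion. All function names are hypothetical.

```python
import unicodedata

VOWELS = set("aeio")  # assumption: vowel letters once diacritics are stripped


def normalize(word):
    """Lowercase, drop combining diacritics and hyphens (rule a: weight zero)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(
        c for c in decomposed if not unicodedata.combining(c) and c != "-"
    )


def is_vowel(c):
    return c in VOWELS


def is_consonant(c):
    return c.isalpha() and c not in VOWELS


def gap_cost(s, k, inserting):
    """Cost of adding or removing s[k], judged by its neighbours in s."""
    prev = s[k - 1] if k > 0 else ""
    nxt = s[k + 1] if k + 1 < len(s) else ""
    if s[k] == "h" and is_vowel(prev) and is_consonant(nxt):
        return 0.0  # rule a: -h- between a vowel and a consonant
    if inserting and s[k] == "i" and is_consonant(prev) and is_consonant(nxt):
        return 0.5  # rule b: -i- inserted between two consonants
    return 1.0  # rule c: anything else


def sub_cost(a, b):
    """Rule a: swapping the glides w/y is free; rule c otherwise."""
    if a == b or {a, b} == {"w", "y"}:
        return 0.0
    return 1.0


def modified_edit_distance(transcription, suggestion):
    src, tgt = normalize(transcription), normalize(suggestion)
    dist = [[0.0] * (len(tgt) + 1) for _ in range(len(src) + 1)]
    for i in range(1, len(src) + 1):
        dist[i][0] = dist[i - 1][0] + gap_cost(src, i - 1, inserting=False)
    for j in range(1, len(tgt) + 1):
        dist[0][j] = dist[0][j - 1] + gap_cost(tgt, j - 1, inserting=True)
    for i in range(1, len(src) + 1):
        for j in range(1, len(tgt) + 1):
            dist[i][j] = min(
                dist[i - 1][j] + gap_cost(src, i - 1, inserting=False),
                dist[i][j - 1] + gap_cost(tgt, j - 1, inserting=True),
                dist[i - 1][j - 1] + sub_cost(src[i - 1], tgt[j - 1]),
            )
    return dist[-1][-1]
```

Suggestions could then be ranked with something like `sorted(suggestions, key=lambda s: modified_edit_distance(transcription, s))`. Whether removing an -i- should also get the half-weight is left open here.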

aarppe commented 3 years ago

@nienna73 This issue has my first attempt at describing how the validation task (by a linguist) could work on the word-level.

nienna73 commented 3 years ago

@aarppe This outline and flow of events seems to make sense to me. I've drafted some technical requirements based off our conversation and the description above:

The user must be able to log in using a username (and password?) a. Alternatively, the user must enter their name or initials before changes are saved
The user should be able to search for a term
The system should present the following for each search term:
    a. The Maskwacîs word
    b. The audio recording
    c. Transcription from the field
    d. Suggested terms from spellchecker
    e. MED as calculated by service
    f. Linguistic analysis and lemma from HFST
    g. Translation of each suggestion as provided by itwewina
The user should be able to manually make edits for the current version of the Cree spelling
The system should be able to store all past versions of Cree words and the user who made each change
The user should be able to accept suggested spellings as the new standard form
The system should be able to flag entries as: unvalidated, standardized with model, standardized without model, standardized pending review, or validated
The system should be able to calculate MED based on requirements above
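
As a rough illustration of how the flagging and version-history requirements above might fit together, here is a minimal Python sketch. It is not the project's actual data model: the class and field names (`Status`, `Revision`, `Entry`, `record`) are hypothetical, and a real implementation would presumably live in the application's database layer.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Status(Enum):
    # The five flags listed in the requirements above.
    UNVALIDATED = "unvalidated"
    STANDARDIZED_WITH_MODEL = "standardized with model"
    STANDARDIZED_WITHOUT_MODEL = "standardized without model"
    STANDARDIZED_PENDING_REVIEW = "standardized pending review"
    VALIDATED = "validated"


@dataclass(frozen=True)
class Revision:
    """One saved version of the Cree spelling and who made the change."""
    spelling: str
    status: Status
    editor: str
    timestamp: datetime


@dataclass
class Entry:
    field_transcription: str  # transcription as received from the field
    translation: str          # English translation from the field notes
    history: list = field(default_factory=list)  # all past Revisions, oldest first

    @property
    def current_spelling(self):
        return self.history[-1].spelling if self.history else self.field_transcription

    @property
    def status(self):
        return self.history[-1].status if self.history else Status.UNVALIDATED

    def record(self, spelling, status, editor):
        """Append a new version; a name or initials is required before saving."""
        if not editor.strip():
            raise ValueError("a username or initials is required to save a change")
        self.history.append(
            Revision(spelling, status, editor, datetime.now(timezone.utc))
        )


entry = Entry("ewaninstohtahk", "he misunderstands it")
entry.record("ê-wani-nisitohtahk", Status.STANDARDIZED_WITH_MODEL, "AA")
```

Keeping every change as an appended `Revision` means that no past spelling is overwritten, and the current spelling and status are simply read off the latest revision.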

I can create a new issue for each of these requirements once they're confirmed so we can keep track of how things are progressing.

aarppe commented 3 years ago

@nienna73 - Looks good - only a few comments below:

The user must be able to log in using a username (and password?) a. Alternatively, the user must enter their name or initials before changes are saved

The user should be able to search for a term

The system should present the following for each search term: a. The Maskwacîs word

Also, the English translation as stored in the annotations from the field notes should be shown. Note that sometimes the translation concerns an entire sentence or phrase, not a single word. Words vs. sentences should be on separate tiers in the annotations, and as such identifiable.

    b. The audio recording
    c. Transcription from the field
    d. Suggested terms from spellchecker
    e. MED as calculated by service
    f. Linguistic analysis and lemma from HFST
    g. Translation of each suggestion as provided by itwewina

For future reference, eventually we might want to link in Maskwacîs Dictionary translations not yet included in itwêwina.

The user should be able to manually make edits for the current version of the Cree spelling

The system should be able to store all past versions of Cree words and the user who made each change

The user should be able to accept suggested spellings as the new standard form

The system should be able to flag entries as: unvalidated, standardized with model, standardized without model, standardized pending review, or validated

If the MED = 0 with some unique speller-suggestion, we might even consider an option of 'autostandardized', with a separate field showing the spell-checker option for which MED = 0. Since something like 30-50% of the transcriptions meet this requirement, we could substantially increase the number of words that we can make provisionally available straight off, pending gradually advancing manual review.
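
The 'autostandardized' rule could be sketched roughly as follows in Python. This reads "unique" as "exactly one distinct suggestion has MED = 0", which is an assumption; the function name and the scores in the example are made up for illustration.

```python
def autostandardized_form(scored_suggestions):
    """Return the single suggestion with MED = 0, or None if there is no
    such suggestion or more than one distinct candidate.

    scored_suggestions: iterable of (suggested_form, med) pairs.
    """
    exact = {form for form, med in scored_suggestions if med == 0}
    if len(exact) == 1:
        return exact.pop()
    return None


# Illustrative scores only, not output of the real service.
candidates = [("ê-wani-nisitohtahk", 0), ("ê-wani-nistohtâhk", 1)]
proposal = autostandardized_form(candidates)
```

Entries for which this returns None would stay unvalidated and go through the manual flow described earlier.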

The system should be able to calculate MED based on requirements above

nienna73 commented 3 years ago

Each of the user stories above now has its own issue to allow further discussion of each topic.

UAlbertaALTLab / recording-validation-interface

Validation interface mockups #91