aarppe closed this issue 5 days ago
@nienna73 This issue has my first attempt at describing how the validation task (by a linguist) could work at the word level.
@aarppe This outline and flow of events make sense to me. I've drafted some technical requirements based on our conversation and the description above:
I can create a new issue for each of these requirements once they're confirmed so we can keep track of how things are progressing.
@nienna73 - Looks good - only a few comments below:
- The user must be able to log in using a username (and password?)
   a. Alternatively, the user must enter their name or initials before changes are saved
- The user should be able to search for a term
- The system should present the following for each search term:
   a. The Maskwacîs word
Also, the English translation as stored in the annotations from the field notes should be shown. Note that sometimes the translation concerns an entire sentence or phrase, not a single word. Words vs. sentences should be on separate tiers in the annotations, and as such identifiable.
   b. The audio recording
   c. Transcription from the field
   d. Suggested terms from spellchecker
   e. MED as calculated by service
   f. Linguistic analysis and lemma from HFST
   g. Translation of each suggestion as provided by itwêwina
For future reference, eventually we might want to link in Maskwacîs Dictionary translations not yet included in itwêwina.
- The user should be able to manually make edits for the current version of the Cree spelling
- The system should be able to store all past versions of Cree words and the user who made each change
- The user should be able to accept suggested spellings as the new standard form
- The system should be able to flag entries as: unvalidated, standardized with model, standardized without model, standardized pending review, or validated
If MED = 0 for some unique spell-checker suggestion, we might even consider an 'autostandardized' option, with a separate field showing the spell-checker suggestion for which MED = 0. Since something like 30-50% of the transcriptions meet this requirement, we could substantially increase the number of words that we can make provisionally available straight off, pending gradually advancing manual review.
- The system should be able to calculate MED based on requirements above
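For reference, MED (minimum edit distance) between a field transcription and a spell-checker suggestion could be computed along the following lines. This is only a minimal sketch of the standard Levenshtein distance with unit costs; the function name `med` is made up for illustration, and the thread does not say whether diacritics or hyphens should be normalized before comparison, so the sketch compares the raw strings as given.

```python
def med(source: str, target: str) -> int:
    """Unit-cost Levenshtein distance between two strings (illustrative sketch)."""
    # previous[j] is the distance between the processed prefix of source
    # and target[:j]; it starts as the cost of inserting j characters.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # deleting i characters turns source[:i] into ""
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]
```

With this definition, `med("kitten", "sitting")` is 3, and identical strings give 0, which is the case the 'autostandardized' idea above relies on.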
Each of the user stories above now has its own issue to allow further discussion of each topic.
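To make the storage and flagging requirements above a bit more concrete, here is a rough sketch of one possible data model. All names (`Status`, `Revision`, `Entry`, `autostandardize`) are hypothetical, 'autostandardized' is only the option floated in the comments and not a confirmed status, and "unique" is read here as "exactly one suggestion has MED = 0", which is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    UNVALIDATED = "unvalidated"
    STANDARDIZED_WITH_MODEL = "standardized with model"
    STANDARDIZED_WITHOUT_MODEL = "standardized without model"
    PENDING_REVIEW = "standardized pending review"
    VALIDATED = "validated"
    AUTOSTANDARDIZED = "autostandardized"  # proposed in the thread, not confirmed


@dataclass
class Revision:
    spelling: str  # the Cree spelling after this change
    user: str      # username or initials of whoever made the change


@dataclass
class Entry:
    transcription: str                        # transcription from the field
    status: Status = Status.UNVALIDATED
    history: List[Revision] = field(default_factory=list)  # all past versions
    auto_suggestion: Optional[str] = None     # suggestion with MED = 0, if any

    @property
    def current_spelling(self) -> str:
        # Fall back to the field transcription until someone edits the entry.
        return self.history[-1].spelling if self.history else self.transcription

    def edit(self, spelling: str, user: str, status: Status) -> None:
        # Append rather than overwrite, so earlier versions stay available.
        self.history.append(Revision(spelling, user))
        self.status = status


def autostandardize(entry: Entry, med_by_suggestion: Dict[str, int]) -> bool:
    """Flag an unvalidated entry when exactly one suggestion has MED = 0."""
    zero = [s for s, d in med_by_suggestion.items() if d == 0]
    if entry.status is Status.UNVALIDATED and len(zero) == 1:
        entry.auto_suggestion = zero[0]
        entry.status = Status.AUTOSTANDARDIZED
        return True
    return False
```

Keeping the history as an append-only list means that accepting a suggestion and making a manual edit both go through `edit`, so the user behind each change is always recorded.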
Here are some sketches of how the validation interface could work and look, using the transcription of the "new" Cree word ewaninstohtahk 'he misunderstands it' as an example case:
Now, we think that the third suggestion ê-wani-nistohtâhk might be right, but the translation of its lemma nistohtâw, 'He divides it into three. [MD] s/he divides s.t. in three [CW]', does not match the original English translation 'he misunderstands it' received during the recording sessions.
Further notes: