UAlbertaALTLab / itwewina

Replaced by https://github.com/UAlbertaALTLab/cree-intelligent-dictionary
GNU General Public License v3.0

click-in-text link to work #19

Open aarppe opened 6 years ago

aarppe commented 6 years ago

This now works in general. But we should restrict the click-in-text functionality so that partial matches are neither presented nor sent over the Internet when itwêwina is accessed via click-in-text.

eddieantonio commented 6 years ago

@aarppe Please provide an explicit list of requirements for the "click-in-text" functionality.

This format may be useful:

 - [ ] requirement 1 title
 - [x] requirement 2 title ([x] means it's already working)
 - [ ] requirement 3 title
...
 - [ ] requirement N title

## requirement 1 title

requirement 1 description

### Acceptance criteria

brief acceptance criteria

## requirement 2 title

requirement 2 description

### Acceptance criteria

brief acceptance criteria

...
aarppe commented 6 years ago

Work in progress - needs proofreading.

Generally expected behavior for crk -> eng:

  1. Click on 'read with itwêwina'
  2. Potentially select language and orthography by clicking on [Ā] tab to the left. For crk, this means crk x eng with the three different crk orthographies

image

  3. Default orthography for presentation of lexical entries/lemmas (that match the alt-clicked string, see 4 below) is SRO with circumflexes. Whichever orthography is selected is used to represent the lemma resulting from the analysis of the clicked-upon text. The clicked word form itself is not normalized for the time being, as we want to link the clicked string with the one in the text (though our views might change):

image

  4. Alt-click a space-delimited string (presuming crk in any orthography), have that string analyzed by the omnivorous descriptive analyzer of itwêwina (i.e. one that recognizes crk written in any orthography and with any sloppiness covered by spellrelax.regex), and present the results of the analysis, i.e. (a) the lemma according to the selected orthography, and (b) the English gloss pertaining to the lemma (if that is found in the XML dictionary source).

4b. N.B. In contrast to the desktop version of itwêwina, no partial matches of lexical entries should be presented.
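
To make the "no partial matches" requirement concrete, here is a minimal Python sketch. It assumes (hypothetically) that the analyzer returns the set of lemmas for the clicked string and that the dictionary search returns a list of entries, some of which are only prefix/partial hits. `Entry` and `drop_partial_matches` are illustrative names, not part of itwêwina, and `ohpimêhpisow` is my own SRO transliteration of the partial match reported in the syllabics tests in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical dictionary hit: headword plus part of speech."""
    lemma: str
    pos: str


def drop_partial_matches(analysed_lemmas: set[str], results: list[Entry]) -> list[Entry]:
    """Keep only entries whose lemma is one the analyzer produced.

    Assumption: anything else in `results` got there through partial
    (prefix/substring) matching and must not be shown in click-in-text.
    """
    return [entry for entry in results if entry.lemma in analysed_lemmas]


# Clicking "ohpimê": the analyzer yields the lemma itself, while the
# dictionary search also returns a longer headword as a partial match.
results = [Entry("ohpimê", "particle"), Entry("ohpimêhpisow", "verb")]
print(drop_partial_matches({"ohpimê"}, results))
```

With this kind of filter a string the analyzer cannot analyze yields an empty result rather than a list of look-alike entries, which is the behavior asked for in 4b.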

Generally expected behavior for eng -> crk

All the above applies for selection of orthographical representation of crk lemmas / lexical entries.

  1. Click on 'read with itwêwina'

  2. Potentially select language and orthography by clicking on [Ā] tab to the left. For crk, this means crk x eng with the three different crk orthographies

  3. Alt-click a space-delimited English string, have that string matched as such (since currently we're not making use of an English FST), and present the results of the analysis, i.e. (a) lemma and (b) Cree dictionary entry (which are numbered), using chosen orthography, pertaining to English word form.
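
One possible reading of "matched as such" for eng -> crk, sketched in Python: the clicked English string is compared as a whole word against the glosses, with no stemming or partial matching, and the hits are numbered. The `Entry` class, the `lookup_english` function and the tiny entry list are assumptions for illustration only; the lemmas and glosses are taken from the test cases in this issue.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical dictionary entry: Cree lemma plus English gloss."""
    lemma: str
    gloss: str


def lookup_english(word: str, entries: list[Entry]) -> list[str]:
    """Return numbered entries whose gloss contains `word` as a whole word.

    No English FST is involved: the string is matched literally
    (case-insensitively), so "pant" does not match "pants".
    """
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    hits = [entry for entry in entries if pattern.search(entry.gloss)]
    return [f"{n}. {entry.lemma} - {entry.gloss}" for n, entry in enumerate(hits, start=1)]


ENTRIES = [
    Entry("ispîhk", "when"),
    Entry("mitâs", "pair of pants"),
    Entry("mitâs", "legging, gaiter"),
    Entry("awâsisîwiw", "s/he is a child"),
]

print(lookup_english("pants", ENTRIES))
```

Converting the lemma to the selected orthography would happen after this step, in the same way as for crk -> eng.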

Test cases in: http://sapir.artsrn.ualberta.ca/itwewina-click-in-text.html

First invoke click-in-text from the bookmarks toolbar.

- [ ] SRO/circumfl: ohpimê

- [ ] SRO/macron: ohpimē

- [ ] SRO: ohpime

- [ ] SRO: nitiskonikanihk

- [ ] SRO: ohci

SRO/circumfl: ispîhk

  1. Select in [Ā] pop-up window Dictionary: Plains Cree --> English
  2. Alt-click: ispîhk
  3. See pop-up window: header: ispîhk body: ispîhk (particle) - when

One should not see partial matches of the string.

SRO/macron: ispīhk

  1. Select in [Ā] pop-up window Dictionary: Plains Cree (āēīō) --> English
  2. Alt-click: ispīhk
  3. See pop-up window: header: ispīhk body: ispīhk (particle) - when

One should not see partial matches of the string.

SRO: ispihk

  1. Select in [Ā] pop-up window Dictionary: Plains Cree --> English
  2. Alt-click: ispihk
  3. See pop-up window: header: ispihk body: ispîhk (particle) - when

One should not see partial matches of the string.
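
The three ispîhk cases above differ only in how vowel length is marked. As a rough illustration (not how spellrelax.regex or the itwêwina analyzer actually does it), the following Python sketch converts between the circumflex and macron conventions and folds either one to unmarked SRO. All function names are hypothetical.

```python
import unicodedata

# Long vowels in SRO: circumflex convention vs. macron convention.
CIRCUMFLEX_TO_MACRON = str.maketrans("âêîôÂÊÎÔ", "āēīōĀĒĪŌ")
MACRON_TO_CIRCUMFLEX = str.maketrans("āēīōĀĒĪŌ", "âêîôÂÊÎÔ")


def to_macron(sro: str) -> str:
    """Rewrite circumflexed long vowels with macrons."""
    return unicodedata.normalize("NFC", sro).translate(CIRCUMFLEX_TO_MACRON)


def to_circumflex(sro: str) -> str:
    """Rewrite macron long vowels with circumflexes."""
    return unicodedata.normalize("NFC", sro).translate(MACRON_TO_CIRCUMFLEX)


def strip_length_marks(sro: str) -> str:
    """Drop all combining marks, giving the unmarked ("sloppy") spelling."""
    decomposed = unicodedata.normalize("NFD", sro)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(to_macron("ispîhk"))           # ispīhk
print(to_circumflex("ispīhk"))       # ispîhk
print(strip_length_marks("ispîhk"))  # ispihk
```

Note that folding to the unmarked form is lossy, so going from ispihk back to the lemma ispîhk still needs the analyzer; the sketch only covers presentation of a lemma in the selected orthography.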

SRO/circumfl: kâ-kî-awâsisîwiyân

  1. Select in [Ā] pop-up window Dictionary: Plains Cree --> English
  2. Alt-click: kâ-kî-awâsisîwiyân
  3. See pop-up window: header: kâ-kî-awâsisîwiyân body: awâsisîwiw (Verb) - s/he is a child

SRO/macron: kā-kī-awāsisīwiyān

  1. Select in [Ā] pop-up window Dictionary: Plains Cree (āēīō) --> English
  2. Alt-click: kā-kī-awāsisīwiyān
  3. See pop-up window: header: kā-kī-awāsisīwiyān body: awāsisīwiw (Verb) - s/he is a child

SRO: ka-ki-awasisiwiyan

  1. Select in [Ā] pop-up window Dictionary: Plains Cree --> English
  2. Alt-click: ka-ki-awasisiwiyan
  3. See pop-up window: header: ka-ki-awasisiwiyan body: awâsisîwiw (Verb) - s/he is a child

SRO/non-hyph: kakihawasisiwiyan

  1. Select in [Ā] pop-up window Dictionary: Plains Cree --> English
  2. Alt-click: kakihawasisiwiyan
  3. See pop-up window: header: kakihawasisiwiyan body: awâsisîwiw (Verb) - s/he is a child

SRO/circumflex: mitâs

  1. Select in [Ā] pop-up window Dictionary: Plains Cree --> English
  2. Alt-click: mitâs
  3. See pop-up window: header: mitâs body - line1: mitâs (Noun) - pair of pants body - line2: mitâs (Noun) - legging, gaiter

SRO: mitas

  1. Select in [Ā] pop-up window Dictionary: Plains Cree --> English
  2. Alt-click: mitas
  3. See pop-up window: header: mitas body - line1: mitâs (Noun) - pair of pants body - line2: mitâs (Noun) - legging, gaiter

SRO/circumfl: nikî-nitawi-kiskinwahamâkosin

  1. Select in [Ā] pop-up window Dictionary: Plains Cree --> English
  2. Alt-click: nikî-nitawi-kiskinwahamâkosin
  3. See pop-up window: header: nikî-nitawi-kiskinwahamâkosin body: kiskinwahamâkosiw (Verb) - s/he learns; s/he is a student, s/he attends school; s/he is taught

SRO/macron: nikī-nitawi-kiskinwahamākosin

  1. Select in [Ā] pop-up window Dictionary: Plains Cree (āēīō) --> English
  2. Alt-click: nikī-nitawi-kiskinwahamākosin
  3. See pop-up window: header: nikī-nitawi-kiskinwahamākosin body: kiskinwahamākosiw (Verb) - s/he learns; s/he is a student, s/he attends school; s/he is taught

SRO: niki-nitawi-kiskinwahamakosin

  1. Select in [Ā] pop-up window Dictionary: Plains Cree --> English
  2. Alt-click: niki-nitawi-kiskinwahamakosin
  3. See pop-up window: header: niki-nitawi-kiskinwahamakosin body: kiskinwahamâkosiw (Verb) - s/he learns; s/he is a student, s/he attends school; s/he is taught

SRO/non-hyph: nikinitawikiskinwahamakosin

  1. Select in [Ā] pop-up window Dictionary: Plains Cree --> English
  2. Alt-click: nikinitawikiskinwahamakosin
  3. See pop-up window: header: nikinitawikiskinwahamakosin body: kiskinwahamâkosiw (Verb) - s/he learns; s/he is a student, s/he attends school; s/he is taught

Syllabic cases to be added later. Needs thinking as to prefixes being separated from stem by space (not hyphen as in SRO).

eddieantonio commented 6 years ago

> Test cases in: http://sapir.artsrn.ualberta.ca/itwewina-click-in-text.html

😻 Excellent!

> Syllabic cases to be added later. Needs thinking as to prefixes being separated from stem by space (not hyphen as in SRO).

In my discussion with Arden and Arok, it seems that and are the conventions for separating syllabics. Or simply deleting the hyphens altogether! So we could start with testing for these three conventions, then move on to testing with spaces between morphemes 😰

aarppe commented 6 years ago

Yes, I agree. For syllabic script with the click-in-text functionality:

  1. For words with multiple parts separated by spaces, we can leave those cases until later (likely involving either a) the possibility of combining alt-click and painting the string sequence in question, or b) itwêwina attempting to look at the context and find what might be an appropriate form).

  2. For single-part words, the click-in-text functionality should work with syllabics exactly the same as with SRO.

  3. For multipart (preverbed/reduplicated) words where the component strings are not separated by a space, the click-in-text functionality should work with syllabics exactly the same as with SRO.

aarppe commented 6 years ago

A. Examples with syllabic script for crk input as well as output script for lemma, with translations into English:

NOTE: When word parts are written together, a joiner -h- is required if a preverb ends in a vowel and the subsequent stem begins with a vowel, e.g. ᑳᑮᐦᐊᐚᓯᓰᐏᔮᐣ <-> kâkîhawâsisîwiyân. The joiner can be added to the end of the preverb even when it is kept separate with a space from the following stem (examples: ê <-> eh; wî <-> wîh).
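
The joiner rule in the note above can be sketched in Python as follows. This is only an illustration under stated assumptions: the parts are given in lowercase SRO without hyphens, the rule is applied at every boundary between parts (the note only states it for preverb + stem), and `join_parts` is a hypothetical helper, not an itwêwina function.

```python
from __future__ import annotations

import unicodedata

# SRO vowels, short and long, in both circumflex and macron conventions.
# Plain "e" is included to allow for unmarked spellings of ê.
VOWELS = set("aeioâêîôāēīō")


def join_parts(parts: list[str]) -> str:
    """Write preverbs and stem together, inserting the joiner -h-
    wherever a part ending in a vowel meets a part starting with a vowel."""
    cleaned = [unicodedata.normalize("NFC", p) for p in parts if p]
    if not cleaned:
        return ""
    word = cleaned[0]
    for part in cleaned[1:]:
        if word[-1] in VOWELS and part[0] in VOWELS:
            word += "h"  # joiner between vowel-final and vowel-initial parts
        word += part
    return word


print(join_parts(["kâ", "kî", "awâsisîwiyân"]))          # kâkîhawâsisîwiyân
print(join_parts(["nikî", "nitawi", "kiskinwahamâkosin"]))  # nikînitawikiskinwahamâkosin
```

The first call reproduces the kâkîhawâsisîwiyân example from the note; the second needs no joiner because no vowel-final part is followed by a vowel-initial one.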

  1. Syllabic with spaces between multipart words:

Settings: image

Input: ᐃᐢᐲᕽ Output (correct: 1 match presented in syllabics, including lemma): image

Input: ᑳ Output: 2 correct matches presented in syllabics (including preverb) image

Input: ᑮ Output: 1 correct match of preverb image

Input: ᐊᐚᓯᓰᐏᔮᐣ Output: 1 correct match of bare conjunct (normally a conjunct would require some conjunct preverb, but that can be omitted in fast speech, so we've allowed for preverbless conjunct forms, which is useful here in click-in-text):

image

Input: ᓂᑮ Output: no match, as we don't (yet) present an analysis of prefixal fragments for independent verb forms (like nikî- / ᓂᑮ here). image

Input: ᓂᑕᐏ Output: 1 sort-of correct match, as nitawi- is both an independent particle and a non-independent preverb (probably does not match with the preverb in the XML source, as the preverb meanings have the hyphen in their lemma): image

Input: ᑭᐢᑭᓌᐦᐊᒫᑯᓯᐣ Output: Correct with no match, since stem+suffix section of an independent verb cannot occur by itself. image

Input: ᐅᐦᐱᒣ Output: 1 correct match (ᐅᐦᐱᒣ), 1 unnecessary (ᐅᐦᐱᒣᐦᐱᓱᐤ) match due to partial matching: image

Input: ᓂᑎᐢᑯᓂᑲᓂᕽ Output: 1 correct match image

Input: ᐅᐦᒋ Output: 3 correct matches (2 x ᐅᐦᒋ - particle + 1 x ᐅᐦᒌᐤ - verb), potentially other incorrect match due to partial matching: image

  2. Syllabic input without spaces in multipart words

Input: ᐃᐢᐲᕽ Output: Correct, 1 single match image

Input: ᑳᑮᐦᐊᐚᓯᓰᐏᔮᐣ Output: Correct, single match: ᐊᐚᓯᓰᐏᐤ (verb) image

Input: ᓂᑮᓂᑕᐏᑭᐢᑭᓌᐦᐊᒫᑯᓯᐣ Output: 1 correct match: ᑭᐢᑭᓌᐦᐊᒫᑯᓯᐤ image

Input: ᐅᐦᐱᒣ Output: 1 correct match: ᐅᐦᐱᒣ (particle) and 1 incorrect match: ᐅᐦᐱᒣᐦᐱᓱᐤ (verb), due to partial matching: image

Input: ᓂᑎᐢᑯᓂᑲᓂᕽ Output: 1 correct match: ᐃᐢᑯᓂᑲᐣ image

Input: ᐅᐦᒋ Output: 3 correct matches: 2 particles (ᐅᐦᒋ) and 1 verb (ᐅᐦᒌᐤ) image

  3. Input in syllabics without spaces in multipart words, ignoring vowel length.

Results should be almost exactly the same as in section 2 above.

B. crk input in syllabics, output of lemma in SRO/circumflex or SRO/macron, translation into English

Click-in-text should work the same if the output of the lemma is set to SRO but the input is in syllabics.

Settings: image

Input: ᐃᐢᐲᕽ Output: No match - should present lemma ispîhk (seems to be the case with short syllabic words) image

Input: ᑳᑮᐦᐊᐚᓯᓰᐏᔮᐣ Output: awâsisîwiw (verb) image

Input: ᓂᑮᓂᑕᐏᑭᐢᑭᓌᐦᐊᒫᑯᓯᐣ Output: kiskinwahamâkosiw image

Input: ᐅᐦᐱᒣ Output: No match, though should present: ohpimê image

Input: ᓂᑎᐢᑯᓂᑲᓂᕽ Output: 1 correct match: iskonikan (noun) image

Input: ᐅᐦᒋ Output: 1 correct match: ohcîw, 1/2 missing matches; ohci- image

When selecting SRO/macron as the output format for the lemma, the results should be exactly the same as for the SRO/circumflex cases above:

image

For instance, for the following case: Input: ᑲᑭᐦᐊᐘᓯᓯᐏᔭᐣ Output: 1 correct match: awāsisīwiw image

However, the shorter syllabic words get no matches when they surely should, e.g.

image