UAlbertaALTLab / plains-cree-fsts

Mirror of the source code for the Plains Cree morphological analyzer/generator.
https://github.com/giellalt/lang-crk

Incorrect tense marker in FST implementation #11

Closed aaronfay closed 4 years ago

aaronfay commented 5 years ago

I'm going to carry over our conversation from #6 and open this up as a bug:

I believe the FST analysis is incorrect for the form kâ-ki-:

❯ echo "PV/kaa_ki+ohkomiw+V+AI+Cnj+Prs+1Sg" | hfst-optimized-lookup crk-normative-generator.hfstol
PV/kaa_ki+ohkomiw+V+AI+Cnj+Prs+1Sg  kâ-ki-ohkomiyân

~The analysis marks this as Prs (past) but the implementation is -ki- when the past marker in Plains Cree is -kî-.~ My mistake, Prt is the 'past' analysis marker in the FSTs. With that however, I am still concerned this analysis is incorrect:

I've double-checked several references to be certain. I cannot find kâ-ki- in any of them; however, there are several examples of kâ-kî- in both Freda Ahenakêw's works and Arok Wolvengrey's thesis, for example:

p312 ex(23)
tānisi kā-kī-isi-nikamoyan?
tānisi kā-  kī-   isi-  nikamo -yan
IPC    IPV  IPV   IPV   VAI    2s
how    CNJ  PST   thus  sing
“How did you sing?”
p312 ex(24)
tānēhki kā-kī-sipwēhtēt?
tānēhki kā- kī-  sipwēhtē -t
IPC     IPV IPV  VAI      3s
why     CNJ PST  leave
“Why did s/he leave?”
p316 ex(31)
kā- kī- wāpam -iko -t
IPV IPV VTA   INV  3s
CNJ PST see   3’-3s

There are 16 examples in total that I could find just in that paper alone.

Please let me know if you need references to further examples.

eddieantonio commented 5 years ago

@aarppe is this an FST bug? I can't find any relevant citations in e.g., Jean's book.

aarppe commented 5 years ago

We'll have to look into what exactly is the source for the grammatical preverb kâ-ki- (this was implemented early on, with general input from Arok, but our understanding of Plains Cree has evolved significantly since then - my best recollection is that it's a rare form/variant, but it is possible it is a misunderstanding).

The real issue here, which corresponds to the title, is that the current organization of the FST results in a somewhat unsatisfactory analysis and generation scheme for some less frequent grammatical preverbs.

Namely, the most common grammatical preverbs indicating (sort-of) tense, kî- and wî- (for both Independent and Conjunct) and ka- for Conjunct, are analysed and specified as tense features, i.e. +Prt, +Fut+Int, and +Fut+Def, presented after the lemma.

kî-nipâw    nipâw+V+AI+Ind+Prt+3Sg
wî-nipâw    nipâw+V+AI+Ind+Fut+Int+3Sg
ka-nipâw    nipâw+V+AI+Ind+Fut+Def+3Sg

The less frequent grammatical preverbs (which often denote some combination of tense, aspect and modality) are analyzed as preverb features presented before the lemma, and the stem+suffix segment is analyzed (unsatisfyingly) as a present tense form, which, of course, clashes with any tense implied by such a grammatical preverb. In such cases, and perhaps more generally as well, the present tense form should be understood as the unmarked form. This is a matter that we noted explicitly with Wolvengrey a few years back, when creating our first full paradigms for itwêwina.

One way of dealing with this would be for us to create a separate analysis/generation for the unmarked form when there are other grammatical preverbs (than kî-, wî-, ka- noted above). Another way would be to dispense with the treatment of the most common grammatical preverbs as tense features, and treat them similar to all other grammatical preverbs as prefixal features, i.e. PV/ki+, PV/wi+, and PV/ka+.
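As a rough illustration of the second option, here is a minimal Python sketch that rewrites a tense-feature analysis into a preverb-prefix analysis. The tag correspondences (+Prt to PV/ki+, +Fut+Int to PV/wi+, +Fut+Def to PV/ka+) come from the description above; the function name `tense_to_preverb` is hypothetical, and dropping the tense feature without putting anything in its place is an assumption, not a decision made in the FST. The sketch also assumes the analysis has no other PV/ tags, so it says nothing about preverb ordering.

```python
# Tense features and the preverb tags proposed for them in the comment above.
# The longer +Fut+... features are listed first for readability only.
TENSE_TO_PREVERB = {
    "+Fut+Int": "PV/wi+",
    "+Fut+Def": "PV/ka+",
    "+Prt": "PV/ki+",
}


def tense_to_preverb(analysis: str) -> str:
    """Move a tense feature from after the lemma to a PV/ tag before it.

    Assumption: the analysis carries no other PV/ tags, and the tense
    feature is simply dropped once it is expressed as a preverb.
    Analyses without one of the three tense features are returned as-is.
    """
    for tense, preverb in TENSE_TO_PREVERB.items():
        if tense + "+" in analysis:
            # Tense feature followed by more tags, e.g. ...+Prt+3Sg
            return preverb + analysis.replace(tense + "+", "+", 1)
        if analysis.endswith(tense):
            # Tense feature is the last tag in the analysis
            return preverb + analysis[: -len(tense)]
    return analysis
```

Applied to the three examples above, `tense_to_preverb("nipâw+V+AI+Ind+Prt+3Sg")` gives `PV/ki+nipâw+V+AI+Ind+3Sg`, and the +Fut+Int and +Fut+Def analyses become `PV/wi+nipâw+V+AI+Ind+3Sg` and `PV/ka+nipâw+V+AI+Ind+3Sg` respectively.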

The latter option is what has been chosen by us in the modeling of other Algonquian languages, and was based on our observation of the problematicity of the current Plains Cree model. However, what kept me/us from going for this option is that other applications rely on the FST specification being as it is, namely itwêwina and nêhiyawêtân. In the case of itwêwina that would just be a chunk of work as we control all the relevant specification elements, but nêhiyawêtân is a much more rickety application, and we have wanted to have that available for demo purposes for the time being.

I'm inclined towards us treating all (grammatical) preverbs similarly at some point, but that change will require changes in multiple places at the same time, so it has to be scheduled appropriately. Right now this can be understood as an unsatisfactory "feature" of the Plains Cree FST.

aaronfay commented 5 years ago

@aarppe Thank you for the detailed explanation, I fully understand the impact of technical debt in a project and appreciate how things have evolved.

With that, your detailed explanation fully answered my question 🙏! I can now generate the form I was expecting by changing the analysis from Prs to Prt as you describe:

❯ echo PV/kaa+ohkomiw+V+AI+Cnj+Prt+1Sg | hfst-optimized-lookup --silent crk-normative-generator.hfstol
PV/kaa+ohkomiw+V+AI+Cnj+Prt+1Sg kâ-kî-ohkomiyân

@eddieantonio I think we can close this out for now as Antti has satisfied my question, and it looks like it relates to a bigger refactor down the road.

aaronfay commented 5 years ago

Just chiming in on this issue again in case it is of any use:

> We'll have to look into what exactly is the source for the grammatical preverb kâ-ki- (this was implemented early on, with general input from Arok, but our understanding of Plains Cree has evolved significantly since then - my best recollection is that it's a rare form/variant, but it is possible it is a misunderstanding).

I had the opportunity today to meet with Dr. Wolvengrey and raised this conversation as a question. Dr. Wolvengrey commented that he was not aware of a kâ-ki- verb form, or at least that there were no attested examples of one.

Reopening for discussion.

atticusha commented 4 years ago

I've looked into this a bit further. In the Ahenakew-Wolfart corpus we have found no instance of kâ-ki-. I've also found no record of kâ-ki- as a distinct preverb, so it should not be included as a construction. I believe that this was a typo and was, in fact, meant to refer to ka-kî- (abilative/"can").

Given that we have ka-kî- implemented, I suggest we remove this item from our FST.
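For anyone who wants to repeat this kind of check on their own texts, here is a minimal Python sketch of a preverb-sequence count. It is not the procedure used on the Ahenakew-Wolfart corpus; the function name `count_preverb_sequences` is hypothetical, and the sketch assumes plain text in the circumflex orthography used in this thread. The text is normalized to NFC first so that a vowel typed as a base letter plus a combining circumflex is still matched.

```python
import re
import unicodedata
from collections import Counter

# The three preverb sequences discussed in this thread.
PREVERB_SEQUENCES = ("kâ-ki-", "kâ-kî-", "ka-kî-")


def count_preverb_sequences(text: str) -> Counter:
    """Count word-initial occurrences of each sequence in `text`."""
    # NFC makes precomposed and combining-accent spellings identical.
    normalized = unicodedata.normalize("NFC", text).lower()
    counts = Counter()
    for sequence in PREVERB_SEQUENCES:
        # (?<!\w) only allows a match at the start of a word.
        pattern = r"(?<!\w)" + re.escape(sequence)
        counts[sequence] = len(re.findall(pattern, normalized))
    return counts
```

Run on the two generated forms quoted earlier in this thread, `count_preverb_sequences("kâ-ki-ohkomiyân kâ-kî-ohkomiyân")` reports one occurrence each of kâ-ki- and kâ-kî-, and none of ka-kî-.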

aarppe commented 4 years ago

Yes, the inclusion of kâ-ki- (incorrect) as a distinct preverb, similar to ka-kî- (correct), was likely the result of us thinking that it was a legitimate possible variant, rather than it actually coming from any sources. So, we should remove kâ-ki-, as it is creating lots of ambiguity in addition to being incorrect.

atticusha commented 4 years ago

Addressed in crk/src/morphology/incoming/affixes/verbs_affixes.lexc.

aaronfay commented 4 years ago

🙌