UAlbertaALTLab / plains-cree-fsts

Mirror of the source code for the Plains Cree morphological analyzer/generator.
https://github.com/giellalt/lang-crk

Question: paradigm strings for forms in FSTs, but not on itwêwina site #6

Closed aaronfay closed 4 years ago

aaronfay commented 4 years ago

Sorry for the long title. The layout files do not include the paradigm strings (e.g. PV/ta+*+Cnj+Prs+3Pl) for the following forms:

ka-kî- (independent), e.g. nika-kî-itwân "I could say (thus)"
ta-kî- (conjunct), e.g. ta-kî-itweyân "I should say (thus)"
kita- (conjunct), e.g. kita-mosiwit "he became a moose". I understand this may be the same as ta-, but please correct me if I'm wrong.
kâ- (conjunct), e.g. kâ-nêhiyawêcik "when they speak Cree/those who speak Cree"

I assume the FSTs can handle these, but I didn't see the paradigm IDs (what are we calling them?) listed in the layout files. I was curious whether they are going to be added, or whether someone could help me generate the list.
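For reference, a layout entry such as PV/ta+*+Cnj+Prs+3Pl can be turned into a full analysis string by substituting the lemma for the `*`. The sketch below assumes (this is not confirmed in the thread) that `*` stands for the lemma plus its word-class tags, e.g. `itwêw+V+AI`; the helper name `fill_template` is hypothetical.

```python
def fill_template(template: str, lemma_with_class: str) -> str:
    """Substitute the lemma (plus word-class tags) for the '*' placeholder.

    Assumption: each layout entry contains exactly one '*'.
    """
    if template.count("*") != 1:
        raise ValueError(f"expected exactly one '*' in {template!r}")
    return template.replace("*", lemma_with_class)


print(fill_template("PV/ta+*+Cnj+Prs+3Pl", "itwêw+V+AI"))
# PV/ta+itwêw+V+AI+Cnj+Prs+3Pl
```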

Thanks!

atticusha commented 4 years ago

Interesting! Setting aside the status of the Conjunct preverbs and whether or not kita- is the same as ta-/ka- (and whether those are interchangeable!), our logic for not including them in the paradigm files is as follows:

We are viewing these morphemes as preverbs rather than circumfixes. Although we specifically mark these sorts of morphemes as grammatical in our definitions, they are currently implemented the same way as so-called lexical morphemes like pê-. So the input nika-kî-mâmiskômâwak produces the analysis PV/ka_ki+mâmiskômêw+V+TA+Ind+Prs+1Sg+3PlO.
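Because preverbs are implemented as ordinary prefixed tags, an analysis like the one above can be split on `+` into preverb tags, lemma, and inflectional tags. This is a minimal sketch, assuming every preverb tag starts with `PV/` and precedes the lemma; `parse_analysis` and `Analysis` are hypothetical names, not part of the FST tooling.

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    preverbs: list[str]  # e.g. ["PV/ka_ki"]
    lemma: str           # e.g. "mâmiskômêw"
    tags: list[str]      # e.g. ["V", "TA", "Ind", "Prs", "1Sg", "3PlO"]


def parse_analysis(analysis: str) -> Analysis:
    """Split an analysis string into preverb tags, lemma and remaining tags.

    Assumption: preverb tags start with 'PV/' and come before the lemma,
    which is the first segment that is not a preverb tag.
    """
    parts = analysis.split("+")
    i = 0
    while i < len(parts) and parts[i].startswith("PV/"):
        i += 1
    if i == len(parts):
        raise ValueError(f"no lemma found in {analysis!r}")
    return Analysis(parts[:i], parts[i], parts[i + 1:])


print(parse_analysis("PV/ka_ki+mâmiskômêw+V+TA+Ind+Prs+1Sg+3PlO"))
```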

In general, we have opted not to include inflectional information such as preverbs in the paradigms, because the choice of what to include would soon become arbitrary, and including all forms in a paradigm would result in an absurdly large paradigm. Further, from a morphological point of view, including these forms does not tell us anything about the rest of the paradigm: they are simply prefixed elements that have no effect on the further shape of the word.

That said, I think there is a discussion to be had about including grammatical preverbs such as the ones you've specified in the paradigm layouts, as they are few in number. I can bring this up with the team and leave this issue open for now.

Thanks for your suggestion!

aaronfay commented 4 years ago

Thank you. I won't push the argument that the forms should be included/displayed; however, if there were a reference for how to generate those forms (read: a list of the possible preverbs/paradigm IDs, e.g. PV/ka_ki+...), that would be super helpful.
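Once a list of grammatical preverb tags is available, the preverbed paradigm IDs could be produced by prefixing each tag to the base layout templates of the matching order. A minimal sketch, assuming templates use `*` for the lemma and carry an `Ind` or `Cnj` tag; the helper `preverbed_templates` is hypothetical, and the two example tags are simply the ones quoted in this thread.

```python
def preverbed_templates(templates: list[str], preverb_orders: dict[str, str]) -> list[str]:
    """Prefix each grammatical preverb tag to the templates of its order.

    preverb_orders maps a preverb tag to the order it occurs in:
    'Ind' (Independent) or 'Cnj' (Conjunct).
    """
    result = []
    for preverb, order in preverb_orders.items():
        for template in templates:
            # Only pair a preverb with templates whose tags include its order.
            if order in template.split("+"):
                result.append(f"{preverb}+{template}")
    return result


base = ["*+Ind+Prs+1Sg", "*+Cnj+Prs+3Pl"]
orders = {"PV/ka_ki": "Ind", "PV/ta": "Cnj"}  # example tags from this thread
for t in preverbed_templates(base, orders):
    print(t)
# PV/ka_ki+*+Ind+Prs+1Sg
# PV/ta+*+Cnj+Prs+3Pl
```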

eddieantonio commented 4 years ago

Thanks @aaronfay for the suggestion! I'm moving this to a documentation bug: #8

aaronfay commented 4 years ago

Gentle nudge on this: we're using the FSTs to index all possible verb forms for a couple of apps we're working on, but we're coming across word forms that aren't currently represented. An example:

kâ-kî-ohkômiyân - "when I had a grandmother"

Itwewina (and the smart dictionary) recognize this word, but I have no idea how to generate it from the lemma ôhkomiw.

Thanks again.

atticusha commented 4 years ago

Aaron,

Sorry about this; I missed the part of your comment asking for a way to generate such forms.

To address this: if you go to the source file src/morphology/affixes/verb_affixes.lexc, you should be able to extract all preverbs. There is a bit of markup on some (e.g. @P.joiner.hyphen@), and that's not needed for generation. In general, any preverb that can be analyzed can be generated with the string +PV/foo. Preverbs that are hyphenated (as in kah-kapê) follow the pattern +PV/foo_bar. Long vowels are written as double vowels (e.g. the aa in +PV/kaa) EXCEPT for ê, which is written with a single e.
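The spelling convention above can be sketched as a small conversion from a preverb written in SRO to its tag. This is a hypothetical helper, not part of the FST source; treat its output as a guess to check against the tags actually listed in verb_affixes.lexc (the analysis PV/ka_ki quoted earlier in this thread, for instance, does not double the î).

```python
import unicodedata

# Convention described in the comment above (hypothetical helper):
# â, î, ô are doubled; ê (Plains Cree has no short e) becomes a single e.
LONG_VOWELS = {"â": "aa", "î": "ii", "ô": "oo", "ê": "e"}


def preverb_tag(preverb: str) -> str:
    """Guess the PV/... tag for a preverb written in SRO, e.g. 'kâ-'.

    Internal hyphens become underscores (kah-kapê- -> kah_kape);
    the trailing hyphen that joins the preverb to the stem is dropped.
    """
    # NFC so that 'a' + combining circumflex still matches the key 'â'.
    body = unicodedata.normalize("NFC", preverb.strip()).rstrip("-")
    for vowel, replacement in LONG_VOWELS.items():
        body = body.replace(vowel, replacement)
    return "PV/" + body.replace("-", "_")


for pv in ["kâ-", "pê-", "kah-kapê-"]:
    print(pv, "->", preverb_tag(pv))
# kâ- -> PV/kaa
# pê- -> PV/pe
# kah-kapê- -> PV/kah_kape
```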

Note the Conjunct kâ-ki (not kâ-kî), which is not considered a regular preverb and is more akin to in the FST.

If you are using the pre-built .fomabin files and don't have access to the source files, you can download the relevant source file from the following (publicly available) address:

https://victorio.uit.no/langtech/trunk/langs/crk/src/morphology/affixes/verb_affixes.lexc
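With a local copy of that file, the distinct PV/... tags could be collected with a couple of regular expressions. This is a naive sketch under stated assumptions: tags are written as `PV/` followed by ASCII letters and underscores, `!` starts a lexc comment (escaped `%!` is not handled), and flag diacritics have the `@P.joiner.hyphen@` shape. The function names are hypothetical.

```python
import re

# foma/hfst flag diacritics such as @P.joiner.hyphen@ (not needed for generation).
FLAG_DIACRITIC = re.compile(r"@[PNRDCUE]\.[^@]*@")
# Assumption: preverb tags look like PV/ followed by ASCII letters/underscores.
PV_TAG = re.compile(r"PV/[A-Za-z_]+")


def strip_flags(text: str) -> str:
    """Remove flag-diacritic markup from a lexc string."""
    return FLAG_DIACRITIC.sub("", text)


def extract_preverb_tags(lexc_text: str) -> list[str]:
    """Collect the distinct PV/... tags mentioned in lexc source text."""
    tags: set[str] = set()
    for line in lexc_text.splitlines():
        code = line.split("!", 1)[0]  # ignore the rest of the line after '!'
        tags.update(PV_TAG.findall(code))
    return sorted(tags)
```

For example, `extract_preverb_tags(open("verb_affixes.lexc", encoding="utf-8").read())` on the downloaded file.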

Let me know if this isn't what you wanted/if I can be of any more help.

aaronfay commented 4 years ago

Hey @atticusha,

Could you clarify kâ-ki- a bit? I've never heard of this form before. All of my reference material (Ahenakew, etc.) references kâ-kî-, and we hear it a lot in the transcriptions we're currently working on. Can you clarify whether there is a form ID for this particular inflection, or how to generate it specifically?

Happy to provide more examples if needed.

eddieantonio commented 4 years ago

This clearly still needs to be addressed, so I'm reopening this issue :/

aarppe commented 4 years ago

ka-kî- is specified as a grammatical preverb in Wolvengrey's Cree Words:

ka-kî- (Independent particle, IPV): "can, be able to; may; should, ought to"

We've implemented it so that it cannot co-occur with other grammatical preverbs such as kî- and wî-.
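When enumerating preverb combinations for indexing, a restriction like this can be mirrored as a filter on pairs of tags. A minimal sketch; the tag spellings are placeholders (take the real ones from verb_affixes.lexc), and `is_allowed` is a hypothetical helper, not the FST's own mechanism.

```python
from itertools import combinations

# Placeholder tag spellings; verify against verb_affixes.lexc.
KA_KI, KI, WI = "PV/ka_ki", "PV/kii", "PV/wii"

# Pairs of preverb tags that may not appear in the same word form.
INCOMPATIBLE = {frozenset({KA_KI, KI}), frozenset({KA_KI, WI})}


def is_allowed(preverbs: list[str], incompatible: set[frozenset[str]] = INCOMPATIBLE) -> bool:
    """Return False if any two preverbs in the list form an incompatible pair."""
    return not any(
        frozenset(pair) in incompatible for pair in combinations(preverbs, 2)
    )


print(is_allowed([KA_KI, "PV/pe"]))  # True
print(is_allowed([WI, KA_KI]))       # False
```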

aaronfay commented 4 years ago

Hi @aarppe,

The question is referring to the conjunct form kâ-kî- as in kâ-kî-ohkomiyân "when I had a grandmother" and not the independent form ka-kî-itwân "I should say".

Currently, the FSTs implement the conjunct form kâ-kî- with a short i, as kâ-ki-, when it should have a long î (kâ-kî-).

Hope that clarifies the issue.

aaronfay commented 4 years ago

@eddieantonio would it be worthwhile closing out this issue in favor of https://github.com/UAlbertaALTLab/plains-cree-fsts/issues/11?

eddieantonio commented 4 years ago

@aaronfay, will do! I do not know enough about Cree grammar to help with this issue :/

aarppe commented 4 years ago

Yes, it is a design feature of itwêwina that not all possible preverbed forms are generated even in the full paradigms.

The logic is that only the most common or exemplary grammatical preverb cases are presented, and these can then be used as templates for creating the forms with the less frequent grammatical preverbs. That said, we might want to make explicit "somehow" which grammatical preverbs work with which forms/templates (that might be something for the grammar pages).
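Using an exemplary form as a template amounts to swapping one preverb tag for another in its analysis string. A minimal sketch with a hypothetical helper; it deliberately does not check whether the new preverb is compatible with the rest of the tags, since that is exactly the information described above as not yet explicit.

```python
def swap_preverb(analysis: str, old: str, new: str) -> str:
    """Replace the preverb tag `old` with `new` in an analysis string.

    No grammaticality check: whether `new` fits this order/template
    has to be confirmed separately.
    """
    parts = analysis.split("+")
    if old not in parts:
        raise ValueError(f"{old!r} not found in {analysis!r}")
    return "+".join(new if part == old else part for part in parts)


print(swap_preverb("PV/ta+itwêw+V+AI+Cnj+Prs+3Pl", "PV/ta", "PV/kaa"))
# PV/kaa+itwêw+V+AI+Cnj+Prs+3Pl
```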