UAlbertaALTLab / crk-db

Managing the Plains Cree dictionary database
https://itwewina.altlab.app/
GNU General Public License v3.0

guess inflectional class #22

Open dwhieb opened 3 years ago

dwhieb commented 3 years ago

Write a script that accepts a normalized English definition (see #21) and attempts to guess the inflectional class of the associated Cree word.

  1. if the definition has both subject and object pronouns and doesn't contain the word "inanimate": VTA
  2. of the leftovers, if they don't have the object pronouns, you can be fairly sure that you've got VTI
  3. of the leftovers, if they don't have neuter object pronouns, you can be sure it's VTI
  4. deal with reciprocals
  5. repeat procedure for intransitive cases
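
One possible reading of the steps above, as a rough Python sketch. The pronoun inventories, the `guess_class` name, and the treatment of reciprocals as VAI are all assumptions made for illustration; the actual normalized format from #21 may use different tokens. Steps 2 and 3 are collapsed here into "an object pronoun that did not yield VTA is taken as VTI".

```python
import re
from typing import Optional

# Hypothetical pronoun inventories; adjust to the real normalized format (#21).
SUBJECT_ANIMATE = {"he", "she", "s/he", "they"}
SUBJECT_INANIMATE = {"it", "there"}
OBJECT_ANIMATE = {"him", "her", "them", "s.o"}
OBJECT_NEUTER = {"it", "s.t"}
RECIPROCAL = ("each other", "one another")


def guess_class(definition: str) -> Optional[str]:
    """Guess VTA / VTI / VAI / VII from an English definition, or None."""
    text = definition.lower().strip()
    # Keep "s/he", "s.o.", "s.t." together, then drop trailing periods.
    tokens = [t.rstrip(".") for t in re.findall(r"[a-z./]+", text)]
    if not tokens:
        return None

    subject, rest = tokens[0], tokens[1:]
    if subject not in SUBJECT_ANIMATE | SUBJECT_INANIMATE:
        return None  # no recognizable subject pronoun: do not guess

    # Step 4 (assumption): reciprocal definitions are treated as VAI.
    if any(phrase in text for phrase in RECIPROCAL):
        return "VAI"

    has_animate_obj = any(t in OBJECT_ANIMATE for t in rest)
    has_neuter_obj = any(t in OBJECT_NEUTER for t in rest)

    # Step 1: subject and animate object pronoun, no "inanimate" marker.
    if has_animate_obj and "inanimate" not in tokens:
        return "VTA"
    # Steps 2-3 (one reading): any remaining object pronoun suggests VTI.
    if has_animate_obj or has_neuter_obj:
        return "VTI"
    # Step 5: intransitives, split by the animacy of the subject pronoun.
    return "VII" if subject in SUBJECT_INANIMATE else "VAI"
```

For example, `guess_class("he sees him")` gives `"VTA"` and `guess_class("it is green")` gives `"VII"`. Possessive "her" (as in "he sees her dog") would be misread as an object pronoun, so this is only a starting point.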

If using the FST, this is probably easiest done as a preprocessing step for MD, after normalizing the definition, but before the aggregation step.

aarppe commented 3 years ago

This could be achieved by the English phrase analyzer transcriptor that we already have, e.g.

flookup -b -i -x transcriptor-eng-phrase2crk-features.fomabin 
he sees him
 sees  +V+TA+3Sg+3SgO

he sees it
 sees  +V+TI+3Sg+0SgO

he sees
 sees +V+AI+3Sg

it is green
 is green +V+II+0Sg

there is rain
 is rain +V+II+0Sg

it rains
 rains +V+II+0Sg

The FST source can be found here: https://github.com/giellalt/lang-crk/blob/develop/src/transcriptions/transcriptor-eng-phrase2crk-features.xfscript
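
If the transcriptor is used, the inflectional class can be read off the tag string in each analysis line. A minimal Python sketch, assuming the output looks like the examples above (`+V+` followed by one of `TA`, `TI`, `AI`, `II`); the function name `class_from_analysis` and the mapping table are hypothetical, not part of the FST or of crk-db.

```python
import re
from typing import Optional

# Map the transitivity/animacy tag to the usual class label.
TAG_TO_CLASS = {"TA": "VTA", "TI": "VTI", "AI": "VAI", "II": "VII"}

# Matches "+V+TA", "+V+TI", "+V+AI" or "+V+II" anywhere in an analysis line.
CLASS_TAG = re.compile(r"\+V\+(TA|TI|AI|II)\b")


def class_from_analysis(analysis: str) -> Optional[str]:
    """Return the class for one analysis line, or None if no class tag is found."""
    match = CLASS_TAG.search(analysis)
    return TAG_TO_CLASS[match.group(1)] if match else None
```

With the output lines shown above, `class_from_analysis(" sees  +V+TA+3Sg+3SgO")` returns `"VTA"` and `class_from_analysis(" is green +V+II+0Sg")` returns `"VII"`. Lines without a recognizable class tag return `None`, so definitions the transcriptor cannot analyze can be routed to a fallback.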