UAlbertaALTLab / crk-db

Managing the Plains Cree dictionary database
https://itwewina.altlab.app/
GNU General Public License v3.0

Handling of dialect markers from CW (ý) #115

Open fbanados opened 2 weeks ago

fbanados commented 2 weeks ago

ý characters are used in CW to denote a sound that changes between Cree dialects. For itwewina, these must be changed to y. We need a more consistent and future-proof way of handling these in both the crk-db repo and morphodict.
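As a point of reference for the discussion, here is a minimal sketch of the ý to y conversion in Python. It is not the code currently in crk-db or morphodict; the function name `hide_dialect_marker` is hypothetical. It assumes ý may arrive either precomposed (U+00FD) or as y followed by a combining acute accent (U+0301), so it normalizes to NFC before replacing.

```python
import unicodedata


def hide_dialect_marker(form: str) -> str:
    """Return `form` with the CW dialect marker ý written as plain y.

    Hypothetical helper for illustration only.
    """
    # Compose "y" + U+0301 into the single code point U+00FD so that both
    # encodings of ý are handled by the same replacement.
    composed = unicodedata.normalize("NFC", form)
    # U+00FD is ý, U+00DD is Ý; all other letters are left unchanged.
    return composed.replace("\u00fd", "y").replace("\u00dd", "Y")


if __name__ == "__main__":
    print(hide_dialect_marker("ýowênam"))
```

Keeping the conversion in one function means every caller treats the marker the same way, rather than each conversion script doing its own string replacement.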

See also #44, #30, #68, UAlbertaALTLab/morphodict#929, UAlbertaALTLab/morphodict#649, UAlbertaALTLab/morphodict#197, UAlbertaALTLab/morphodict#255, https://github.com/UAlbertaALTLab/morphodict/issues/96#issuecomment-552063925, https://github.com/UAlbertaALTLab/itwewina/issues/104#issuecomment-449160563, https://github.com/UAlbertaALTLab/crk-db/blob/9278ba490909d07ed5e3b41405bf60dfc0126687/lib/convert/CW.js#L48

aarppe commented 2 weeks ago

Here's where this is discussed for the FST side:

https://github.com/giellalt/lang-crk/issues/30

fbanados commented 1 week ago

This issue also affects the merging of entries: e.g. AECD has a yôwênam entry, while CW uses ýowênam. If the merging process does not handle this, definitions are left out.
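One possible way to keep the merge from dropping definitions is to group entries on a key in which ý is written as y, while leaving the original headwords untouched. The sketch below is illustrative only: `merge_key`, `merge_entries`, and the entry dictionaries with `source`, `headword`, and `definition` fields are assumptions, not the actual crk-db data model, and the definitions are placeholders. The spellings in the example differ only in ý versus y; any other spelling difference between sources would need separate handling.

```python
import unicodedata
from collections import defaultdict


def merge_key(headword: str) -> str:
    """Hypothetical merge key: the headword with ý written as plain y."""
    composed = unicodedata.normalize("NFC", headword)
    return composed.replace("\u00fd", "y").replace("\u00dd", "Y")


def merge_entries(entries):
    """Group entries from several sources under a shared merge key.

    Each entry is assumed to be a dict with "source", "headword", and
    "definition" keys. The original headwords are kept alongside the
    definitions so that a spelling with ý is not lost.
    """
    merged = defaultdict(lambda: {"headwords": set(), "definitions": []})
    for entry in entries:
        slot = merged[merge_key(entry["headword"])]
        slot["headwords"].add(entry["headword"])
        slot["definitions"].append((entry["source"], entry["definition"]))
    return dict(merged)


if __name__ == "__main__":
    # Placeholder data: spellings differ only in ý versus y.
    entries = [
        {"source": "AECD", "headword": "yôwênam", "definition": "(AECD definition)"},
        {"source": "CW", "headword": "ýôwênam", "definition": "(CW definition)"},
    ]
    for key, slot in merge_entries(entries).items():
        print(key, sorted(slot["headwords"]), slot["definitions"])
```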

fbanados commented 3 days ago

The agreement is that the strict analyzer FST should be slightly relaxed to accept ý -> y. Eventually, the generator FST should also generate ý wherever the data used to build the FST supports it (that is, not every y becomes ý). This way, main entries with ý can keep the letter in their identifier, and itwewina can later offer an option to show or hide the ý in the presentation.

aarppe commented 3 days ago

We can have an optional conversion of ý (->) y on the analysis side of the strict (normative) analyzing FST.
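To make the intended behaviour concrete outside the FST source, here is a rough Python simulation. It assumes "optional conversion" means that an input containing ý is analyzed both as written and with ý replaced by y. The names `relaxed_analyze` and `strict_analyze` are hypothetical; `strict_analyze` stands in for whatever lookup function wraps the strict analyzer, and this is not how the FST itself is implemented.

```python
import unicodedata
from typing import Callable, List


def relaxed_analyze(form: str, strict_analyze: Callable[[str], List[str]]) -> List[str]:
    """Analyze `form` as written and, if it contains ý, also with ý -> y.

    `strict_analyze` is a stand-in for the strict analyzer: it takes a
    wordform and returns a (possibly empty) list of analyses.
    """
    composed = unicodedata.normalize("NFC", form)
    candidates = [composed]
    converted = composed.replace("\u00fd", "y").replace("\u00dd", "Y")
    if converted != composed:
        # The conversion is optional: the original spelling is still tried.
        candidates.append(converted)

    analyses: List[str] = []
    for candidate in candidates:
        for analysis in strict_analyze(candidate):
            if analysis not in analyses:  # avoid duplicates, keep order
                analyses.append(analysis)
    return analyses
```

With a strict analyzer that only knows the y spelling, both the ý and the y spelling of a wordform would then return the same analyses, and forms without ý behave exactly as before.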

fbanados commented 2 days ago

I've recompiled the FSTs and these have been deployed to the dev version for testing.