UAlbertaALTLab / crk-db

Managing the Plains Cree dictionary database
https://itwewina.altlab.app/
GNU General Public License v3.0

Handling of dialect markers from CW (ý) #115

Open fbanados opened 4 months ago

fbanados commented 4 months ago

The character ý is used in CW to mark a sound that varies between Cree dialects. For itwewina, each ý must be changed to y. We need a more consistent and future-proof way of handling this in both the crk-db repo and morphodict.
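A minimal sketch of the ý to y replacement described above, in Python. The function name `hide_dialect_marker` is hypothetical and this is not the code used in crk-db or morphodict. It assumes ý may arrive either as the single code point U+00FD or as y followed by the combining acute accent U+0301, so it composes the text first.

```python
import unicodedata

Y_ACUTE_LOWER = "\u00fd"  # ý as a single precomposed code point
Y_ACUTE_UPPER = "\u00dd"  # Ý as a single precomposed code point


def hide_dialect_marker(text: str) -> str:
    """Return text with ý/Ý written as y/Y; other diacritics are kept."""
    # NFC turns "y" + U+0301 (combining acute) into the precomposed ý,
    # so both encodings are handled by the same replacement.
    composed = unicodedata.normalize("NFC", text)
    return composed.replace(Y_ACUTE_LOWER, "y").replace(Y_ACUTE_UPPER, "Y")


# Example with the CW spelling quoted later in this thread.
print(hide_dialect_marker("\u00fdow\u00eanam"))
```

Keeping the conversion in one small function means the stored form can retain ý and the replacement can be applied only where a plain y is required.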

See also #44, #30, #68, UAlbertaALTLab/morphodict#929, UAlbertaALTLab/morphodict#649, UAlbertaALTLab/morphodict#197, UAlbertaALTLab/morphodict#255, https://github.com/UAlbertaALTLab/morphodict/issues/96#issuecomment-552063925, https://github.com/UAlbertaALTLab/itwewina/issues/104#issuecomment-449160563, https://github.com/UAlbertaALTLab/crk-db/blob/9278ba490909d07ed5e3b41405bf60dfc0126687/lib/convert/CW.js#L48

aarppe commented 4 months ago

Here's where this is discussed for the FST side:

https://github.com/giellalt/lang-crk/issues/30

fbanados commented 4 months ago

This issue also affects the merging of entries: e.g. AECD has a yôwênam entry, while CW uses ýowênam. If the merging process does not handle this, definitions are left out.
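One way to keep definitions from being dropped is to merge on a key that ignores the dialect marker while keeping every original spelling. The sketch below is an assumption about how that could look, not the crk-db merge step; `merge_key` and `merge_entries` are hypothetical names and the definitions are placeholders. The CW-style spelling is constructed so that it differs from the AECD headword only in the marker; any other spelling differences between sources would need separate handling.

```python
import unicodedata
from collections import defaultdict


def merge_key(headword: str) -> str:
    """Key used only for matching: ý/Ý are compared as y/Y."""
    composed = unicodedata.normalize("NFC", headword)
    return composed.replace("\u00fd", "y").replace("\u00dd", "Y")


def merge_entries(sources):
    """sources maps a source name to a list of (headword, definition)."""
    merged = defaultdict(lambda: {"spellings": set(), "definitions": []})
    for source, entries in sources.items():
        for headword, definition in entries:
            slot = merged[merge_key(headword)]
            slot["spellings"].add(headword)  # original spelling is preserved
            slot["definitions"].append((source, definition))
    return dict(merged)


# Illustrative data only: the CW-style form is derived from the AECD
# headword by swapping the first y for ý.
aecd_head = "y\u00f4w\u00eanam"
cw_head = aecd_head.replace("y", "\u00fd", 1)
merged = merge_entries({
    "AECD": [(aecd_head, "AECD definition (placeholder)")],
    "CW": [(cw_head, "CW definition (placeholder)")],
})
print(len(merged))  # both sources land under a single key
```

Because the spellings are stored alongside the key, the entry can still be displayed with ý later.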

fbanados commented 4 months ago

The agreement is that the strict analyzer FST should be slightly relaxed to accept ý -> y. Eventually, the generator FST should also generate ý wherever the data used to build the FST supports it (that is, not every y becomes ý). This way, main entries with ý can keep the letter in their identifier, and itwewina can later offer an option to show or hide the ý in the presentation.

aarppe commented 4 months ago

We can have an optional conversion of ý (->) y on the analysis side of the strict (normative) analyzing FST.
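To illustrate what an optional ý (->) y rule amounts to, here is a plain-Python sketch under one reading of it: each ý may independently stay as ý or be written as y, so a form with ý is related to all of its y variants. This is not FST code and not the lang-crk implementation; `optional_marker_variants` is a hypothetical name.

```python
from itertools import product

Y_ACUTE = "\u00fd"  # ý


def optional_marker_variants(wordform: str) -> set:
    """All spellings where each ý is optionally replaced by y."""
    # For every character, list the choices allowed by the optional rule:
    # ý may stay or become y; everything else has exactly one choice.
    options = [(ch, "y") if ch == Y_ACUTE else (ch,) for ch in wordform]
    return {"".join(choice) for choice in product(*options)}


# A form with one ý yields two spellings: with and without the marker.
print(sorted(optional_marker_variants("\u00fdow\u00eanam")))
```

A form without ý comes back unchanged, which matches the intent that the relaxation only affects words carrying the marker.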

fbanados commented 4 months ago

I've recompiled the FSTs and these have been deployed to the dev version for testing.

fbanados commented 3 months ago

A pending issue (linguist work) is to ensure that the other dictionaries (MD and AECD) are consistently marked with dialect markers when they have no matching entry in CW. A first approach is to collect, for each dictionary, a list of words written with y that may be misspelled and should have ý instead, and then wait for linguist confirmation of their status.
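The candidate list could be produced along these lines. This is a sketch under assumptions: headwords are available as plain lists of strings, "matching" means equal once ý is compared as y, and `candidates_for_review` is a hypothetical name. The data below is toy input, not real dictionary content.

```python
import unicodedata


def marker_free(word: str) -> str:
    """Compare-only form of a headword: ý/Ý are treated as y/Y."""
    composed = unicodedata.normalize("NFC", word)
    return composed.replace("\u00fd", "y").replace("\u00dd", "Y")


def candidates_for_review(other_headwords, cw_headwords):
    """Headwords containing a plain y that have no matching CW entry."""
    cw_keys = {marker_free(w) for w in cw_headwords}
    flagged = set()
    for word in other_headwords:
        composed = unicodedata.normalize("NFC", word)
        has_plain_y = "y" in composed.lower()
        if has_plain_y and marker_free(composed) not in cw_keys:
            flagged.add(composed)
    return sorted(flagged)


# Toy input: one word with y and no CW match, one word found in CW.
md_headwords = ["y\u00f4w\u00eanam", "atim"]
cw_headwords = ["atim"]
print(candidates_for_review(md_headwords, cw_headwords))
```

The output is only a worklist; whether a given y should become ý remains a decision for the linguists.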