UAlbertaALTLab / crk-db

Managing the Plains Cree dictionary database
https://itwewina.altlab.app/
GNU General Public License v3.0

orthographic normalization #4

Closed: eddieantonio closed this issue 3 years ago

eddieantonio commented 4 years ago

The above cases illustrate that, when matching MD with CW, some minor but still significant orthographic differences prevent certain matches from being made that should be identified.

For example, MD mitsow 'He eats.' is not matched with CW mîcisow, and likewise MD wapimew is not matched with CW wâpamêw.

This should probably be made into its own issue, under the Dictionary Database project.

Originally posted by @aarppe in https://github.com/UAlbertaALTLab/cree-intelligent-dictionary/issues/197#issuecomment-598526189

aarppe commented 4 years ago

This is a more general case of UAlbertaALTLab/dictionary-database#3, which specifies a few cases we'd want to focus on (<ts> for <c>, and <i> for an unstressed short vowel). We can note other specific cases of suboptimal matching as we observe them.
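One way to catch pairs like mitsow/mîcisow and wapimew/wâpamêw is to compare normalized keys with a similarity threshold instead of requiring an exact string match. This is a hypothetical sketch, not the project's implementation: `match_key` and `is_probable_match` are invented names, and the 0.8 cutoff is an arbitrary assumption. It strips circumflexes (the MD forms in the examples carry no length marks), rewrites CW <c> as MD-style <ts>, and then scores whatever difference remains (such as the <i>/<a> alternation in wapimew, or the missing short vowel in mitsow) with Python's `difflib.SequenceMatcher`.

```python
import unicodedata
from difflib import SequenceMatcher


def match_key(word: str) -> str:
    """Normalize a headword for fuzzy MD/CW comparison (hypothetical helper)."""
    # NFD splits î, â, ê, ô into a base letter plus a combining circumflex;
    # dropping the combining marks removes vowel-length marking, which the
    # MD forms in the examples do not use.
    decomposed = unicodedata.normalize("NFD", word.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # CW <c> corresponds to MD <ts>; spell both sides the MD way.
    return stripped.replace("c", "ts")


def is_probable_match(md_word: str, cw_word: str, threshold: float = 0.8) -> bool:
    """True if the normalized forms are similar enough (threshold is an assumption)."""
    ratio = SequenceMatcher(None, match_key(md_word), match_key(cw_word)).ratio()
    return ratio >= threshold


if __name__ == "__main__":
    pairs = [("mitsow", "mîcisow"), ("wapimew", "wâpamêw"), ("wapimew", "mîcisow")]
    for md, cw in pairs:
        print(md, cw, is_probable_match(md, cw))
```

Both example pairs score about 0.86 and pass, while the unrelated pair falls far below the cutoff. A plain similarity ratio can also accept false matches between short, similar words, so explicit rewrite rules for the unstressed-vowel case would likely be more precise; the ratio is only a fallback for differences that no rule covers yet.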

dwhieb commented 3 years ago

Incorporated this into the notes for #5. Closing this issue.