apertium / apertium-tat

Apertium linguistic data for Tatar
GNU General Public License v3.0

-RUS tag vs -RUS-BACK and -RUS-FRONT #31

Closed: mansayk closed this issue 5 years ago

mansayk commented 5 years ago

Hi!

I fixed two twol rules because some cases didn't work, for example часть, частьны, частька: https://github.com/apertium/apertium-tat/commit/79a5b1aa923d81cea58b47316e7f1dac9d8b68cf

Currently we mark with the RUS tag only those loanwords that take affixes with back vowels. That means we cannot use the RUS tag for loanwords that take affixes with front vowels. What is the best solution here? Should we replace the -RUS tag with two different ones, -RUS-BACK and -RUS-FRONT?

jonorthwash commented 5 years ago

Russian words that take front-vowel endings all have front vowels in their final syllables or end in palatalised consonants, right? So they trigger normal Tatar front-vowel endings [more or less] just like native Tatar words? If this is the case, just use the normal word classes, like N1. This "default" makes a separate lexicon unnecessary. Are there any examples that won't work this way? We can probably handle specific exceptions by further refining the twol rules.
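A rough sketch of the "default" argument, for illustration only. It is not how apertium-tat's twol rules are written, and the names `harmony`, `accusative` and `force_back` are hypothetical. It assumes simplified vowel classes and uses only the accusative suffix (-ны back / -не front). The final-ь rule encodes the heuristic from the comment above; `часть`, from the first comment, is a case where that heuristic predicts a front ending while the thread reports back-vowel forms (частьны), so a per-word override, standing in for the RUS tag or an equivalent twol-level exception, is still needed there.

```python
# Simplified vowel classes (assumption; real data also has е/ё/ю/я subtleties in loanwords).
BACK_VOWELS = set("аоуы")
FRONT_VOWELS = set("әөүиеэ")

# Accusative allomorphs keyed by harmony class.
ACCUSATIVE = {"back": "ны", "front": "не"}


def harmony(stem: str, force_back: bool = False) -> str:
    """Return 'back' or 'front' for the suffix vowel of `stem`."""
    if force_back:
        # Per-word override, playing the role of a back-forcing tag like RUS.
        return "back"
    if stem.endswith("ь"):
        # Heuristic from the thread: a final palatalised consonant selects front endings.
        return "front"
    # Otherwise the last vowel of the stem decides.
    for ch in reversed(stem.lower()):
        if ch in FRONT_VOWELS:
            return "front"
        if ch in BACK_VOWELS:
            return "back"
    return "back"  # no vowel found: arbitrary fallback (assumption)


def accusative(stem: str, force_back: bool = False) -> str:
    """Attach the accusative suffix chosen by `harmony`."""
    return stem + ACCUSATIVE[harmony(stem, force_back)]


if __name__ == "__main__":
    for word in ["китап", "өй", "кеше", "часть"]:
        print(word, accusative(word))
    print("часть (override)", accusative("часть", force_back=True))
```

Native words like китап, өй and кеше come out as китапны, өйне and кешене with no flag at all, which is the point of relying on the default. Under this scheme, a single back-forcing flag is only needed for words the default gets wrong.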

Also, are you sure those changes to the twol rules didn't break other things?

mansayk commented 5 years ago

I made a test based on words collected from the corpus (https://github.com/apertium/apertium-tat/tree/master/tests-tatcorpus). It can help catch regressions.

jonorthwash commented 5 years ago

Hmm, I don't think a new testing framework is what we need here; we already have several in place. I can explain the YAML-based one; @IlnarSelimcan can probably explain the others.

mansayk commented 5 years ago

Actually, it is not for testing but for tracking the effect of code changes on a big dictionary. You don't need to use it, but I will use it to find new words and to see changes in the analyses of words that are not in the YAML files yet.
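A minimal sketch of the kind of before/after comparison described above. This is not the actual tests-tatcorpus code, and `parse_stream` and `diff_analyses` are hypothetical names. It assumes the analyser output for the same word list has been saved, before and after a change, in the Apertium stream format (`^surface/reading1/reading2$`, with unknown words shown as `*word`). For simplicity it ignores escaped `/` and `$` characters, and the tags in the example are made up for illustration.

```python
import re
from typing import Dict, Set, Tuple

# One lexical unit in the Apertium stream format: ^surface/reading1/reading2$
LU_RE = re.compile(r"\^([^/$]*)/([^$]*)\$")


def parse_stream(text: str) -> Dict[str, Set[str]]:
    """Map each surface form to the set of readings the analyser produced."""
    analyses: Dict[str, Set[str]] = {}
    for surface, rest in LU_RE.findall(text):
        analyses.setdefault(surface, set()).update(rest.split("/"))
    return analyses


def diff_analyses(
    old: Dict[str, Set[str]], new: Dict[str, Set[str]]
) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Return {word: (removed_readings, added_readings)} for every changed word."""
    changed: Dict[str, Tuple[Set[str], Set[str]]] = {}
    for word in old.keys() | new.keys():
        before = old.get(word, set())
        after = new.get(word, set())
        if before != after:
            changed[word] = (before - after, after - before)
    return changed


if __name__ == "__main__":
    # Hypothetical output before and after a twol fix.
    before = "^китап/китап<n><nom>$ ^частьны/*частьны$"
    after = "^китап/китап<n><nom>$ ^частьны/часть<n><acc>$"
    for word, (removed, added) in diff_analyses(parse_stream(before), parse_stream(after)).items():
        print(word, "removed:", sorted(removed), "added:", sorted(added))
```

Readings starting with `*` mark words the analyser does not know yet, so the same diff shows both newly recognised words and changed analyses of known ones.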