apertium / apertium-tat

Apertium linguistic data for Tatar
GNU General Public License v3.0

күлә instead of күләм #18

Closed: mansayk closed this issue 5 years ago

mansayk commented 5 years ago

^күләмдәге/күлә<n><sg><sg><px1sg><loc><subst><nom>$

jonorthwash commented 5 years ago

@mansayk, what is the context for this word, and what command did you use to get the analysis?

I believe both analyses are possible, and the transducer should return both. If you're running it through the disambiguator (tagger) as well, then you need to provide some context for it to perform accurately. Without surrounding text, it essentially just guesses.
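To check whether the analyser really returns more than one reading before the disambiguator picks one, it can help to split a lexical unit into its candidate readings. The following is a minimal sketch, assuming the usual Apertium stream format `^surface/reading1/reading2$`; the helper names `list_readings` and `list_lemmas` are hypothetical, and escaped slashes inside a unit are not handled.

```shell
# Split one lexical unit, "^surface/reading1/reading2$", into its readings,
# one per line. Hypothetical helper, not part of Apertium.
list_readings() {
    unit=$1
    unit=${unit#^}        # drop the leading "^"
    unit=${unit%\$}       # drop the trailing "$"
    readings=${unit#*/}   # drop the surface form and the first "/"
    printf '%s\n' "$readings" | tr '/' '\n'
}

# Print only the lemma of each reading (the text before the first "<").
list_lemmas() {
    list_readings "$1" | while IFS= read -r reading; do
        printf '%s\n' "${reading%%<*}"
    done
}

# The unit reported in this issue has a single reading, with lemma "күлә".
list_lemmas '^күләмдәге/күлә<n><sg><sg><px1sg><loc><subst><nom>$'
```

Feeding the helper the output of only the `apertium-destxt` and `lt-proc` stages of the pipeline, before any `cg-proc` step, would show every lemma the transducer proposes for a word, which makes it easier to tell a missing analysis from a wrong disambiguation.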

mansayk commented 5 years ago

echo "күләмдәге" | apertium-destxt -n | lt-proc -z -w 'apertium-tat/tat.automorf.bin' | cg-proc -z 'apertium-tat/tat.rlx.bin' | cg-proc -z -w -1 'apertium-tat/dev/mansur.bin' | apertium-retxt
^күләмдәге/күлә<n><sg><sg><px1sg><loc><subst><nom>$

I don't remember the context of that exact case, but for example, it might be: "Бик зур күләмдәге эш башкарылган!" ("A lot of work has been done.")

mansayk commented 5 years ago

Also, there is no word "күлә" in the Tatar language, so I think it should not be returned at all.

jonorthwash commented 5 years ago

@IlnarSelimcan, do you know why күлә is in the transducer?

mansayk commented 5 years ago

I marked this word as Use/Arch, but it made no difference:

echo 'күләмдәге' | apertium-destxt -n | lt-proc -z -w 'apertium-tat/tat.automorf.bin' | cg-proc -z 'apertium-tat/tat.rlx.bin' | cg-proc -z -w -1 'apertium-tat/dev/mansur.bin' | apertium-retxt
^күләмдәге/күлә<n><sg><sg><px1sg><loc><subst><nom>$

Does the Use/Arch tag work? How can I remove stems with this tag from the analysis?

mansayk commented 5 years ago

I just saw your message in another issue about Use/Arch not being implemented yet. So my previous question has an answer now.

mansayk commented 5 years ago

I disabled this word temporarily.