apertium / apertium-kaz

Apertium linguistic data for Kazakh
https://apertium.github.io/apertium-kaz/
GNU General Public License v3.0

two neg.ifi paradigms #12

Open jonorthwash opened 5 years ago

jonorthwash commented 5 years ago

Similar to #10, Kazakh has the issue of two neg.ifi paradigms.

First-person singular (neg.ifi.p1.sg) looks like this:

The question is whether there is a difference in usage between these two forms, or whether they are identical. The answer will determine what needs to be done in the transducer to resolve this issue.

IlnarSelimcan commented 5 years ago

I'm not sure whether handling analytical tense forms like барған жоқпын in transducers and not in transfer is a good idea.

IlnarSelimcan commented 5 years ago

Especially if the transducer's not weighted and it will just take one analysis in a greedy manner and go on with that.

IlnarSelimcan commented 5 years ago

"Мен ол кітапты көрген де, алған да, оқыған да жоқпын." My knowledge of Tatar suggests me that this sentence should be valid in Kazakh. If so, why would we treat GAN жок in the morph. transducer? There is simply not enough information at this stage. That's only my opinion.

jonorthwash commented 5 years ago

> Especially if the transducer's not weighted and it will just take one analysis in a greedy manner and go on with that.

I don't think there's ever ambiguity with these forms...?

"Мен ол кітапты көрген де, алған да, оқыған да жоқпын." My knowledge of Tatar suggests me that this sentence should be valid in Kazakh.

This is basically the same problem as suspended affixation in Turkish, e.g. "Ben şu kitabı göre de okuyabiliyorum." (@MemduhG halp?) or maybe "Ben şu kitabı göriyor da okuyorum" (??). Or a different sort of example might be "Ben kitaplar okur, makaleler yazarım" (??). Maybe have a look at how we deal with this problem in apertium-tur.

> If so, why would we treat GAN жоқ in the morph. transducer?

Linguistically it makes some sense to just throw in a <neg> tag, especially if your primary goal is analysis and not translation. The morphology gets messier if you treat it as two separate words. What might you propose, though?

> There is simply not enough information at this stage. That's only my opinion.

What do you mean about not enough information?

IlnarSelimcan commented 5 years ago

You might find this to be a contrived example, but I think that it demonstrates what I'm trying to say:

"<Мұнда>"
    "бұл" prn dem loc
    "мұнда" adv
    "е" cop aor p3 pl
        "бұл" prn dem loc
    "е" cop aor p3 sg
        "бұл" prn dem loc
"<бұл>"
    "бұл" det dem
    "бұл" prn dem nom
    "е" cop aor p3 pl
        "бұл" prn dem nom
    "е" cop aor p3 sg
        "бұл" prn dem nom
"<кітапты>"
    "кітап" n acc
    "лы" post
        "кітап" n
"<оқыған>"
    "оқы" v tv past p3 sg
    "е" cop aor p3 sg
        "оқыған" adj subst nom
    "е" cop aor p3 pl
        "оқыған" adj subst nom
    "е" cop aor p3 sg
        "оқыған" adj
    "е" cop aor p3 pl
        "оқыған" adj
    "оқыған" adj subst nom
    "оқы" v iv past p3 sg
    "оқы" v iv past p3 pl
    "оқы" v iv gpr_past subst nom
    "оқы" v tv gpr_past
    "оқы" v tv past p3 pl
    "оқы" v tv gpr_past subst nom
    "оқыған" adj advl
    "оқы" v iv ger_past nom
    "оқы" v tv ger_past nom
    "оқыған" adj
    "оқы" v iv gpr_past
"<адам>"
    "ада" n px1sg nom
    "адам" n nom
    "адам" n attr
    "е" cop aor p3 pl
        "ада" n px1sg nom
    "е" cop aor p3 sg
        "ада" n px1sg nom
    "е" cop aor p3 pl
        "адам" n nom
    "е" cop aor p3 sg
        "адам" n nom
"<бар ма>"
    "ма" qst
        "бар" adj
    "ма" qst
        "бар" n nom
    "ма" qst
        "бар" adj subst nom
    "ма" qst
        "е" cop aor p3 pl
            "бар" adj
    "ма" qst
        "е" cop aor p3 sg
            "бар" adj
    "ма" qst
        "е" cop aor p3 pl
            "бар" n nom
    "ма" qst
        "е" cop aor p3 sg
            "бар" n nom
    "ма" qst
        "е" cop aor p3 pl
            "бар" adj subst nom
    "ма" qst
        "е" cop aor p3 sg
            "бар" adj subst nom
"<?>"
    "?" sent
"<Жоқ>"
    "жоқ" ij
    "жоқ" adj
    "жоқ" adj subst nom
    "е" cop aor p3 pl
        "жоқ" adj
    "е" cop aor p3 sg
        "жоқ" adj
    "е" cop aor p3 pl
        "жоқ" adj subst nom
    "е" cop aor p3 sg
        "жоқ" adj subst nom
"<,>"
    "," cm
"<оқыған жоқ>"
    "оқы" v tv neg ifi p3 pl
    "оқы" v tv neg ifi p3 sg
    "оқы" v iv neg ifi p3 pl
    "оқы" v iv neg ifi p3 sg
"<.>"
    "." sent
IlnarSelimcan commented 5 years ago

You simply don't know whether <GAN жоқ> is neg ifi or not without seeing the full sentence. In the above example, the transducer took the longest analysis, and the gpr_subst analysis is gone.
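To make the longest-match behaviour concrete, here is a minimal Python sketch of greedy left-to-right longest-match tokenisation. It is only an illustration: the toy `LEXICON`, the function name `lrlm_tokenise`, and the `max_len` parameter are hypothetical and not taken from apertium-kaz or lttoolbox.

```python
# Toy lexicon: surface form -> analyses. Hypothetical and heavily abridged;
# the tag strings only imitate the analyses shown in the output above.
LEXICON = {
    "оқыған": ["оқы<v><tv><gpr_past>", "оқы<v><tv><past><p3><sg>"],
    "жоқ": ["жоқ<adj>", "жоқ<ij>"],
    "оқыған жоқ": ["оқы<v><tv><neg><ifi><p3><sg>"],
}


def lrlm_tokenise(words, lexicon, max_len=3):
    """Greedy left-to-right longest match over a list of words."""
    i, out = 0, []
    while i < len(words):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(words) - i), 0, -1):
            surface = " ".join(words[i:i + n])
            if surface in lexicon:
                out.append((surface, lexicon[surface]))
                i += n
                break
        else:
            # Unknown word: emit it without analyses and move on.
            out.append((words[i], []))
            i += 1
    return out


print(lrlm_tokenise(["оқыған", "жоқ"], LEXICON))
```

Once the two-word entry exists, the one-word analyses of оқыған (including the gpr_past one) are never offered in this position, so nothing later in the pipeline can choose them.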

A better way would be to give only the GAN form (not including жоқ) the neg.ifi analysis, and to answer the "neg.ifi or not" question later in CG.
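A rough Python imitation of that idea (not actual CG syntax) might look like the following. It assumes the analyser gives the bare GAN form a neg ifi reading alongside its other readings, and that a following word starting with жоқ is enough context to decide. Both are simplifications, and the function names are hypothetical.

```python
def is_neg_ifi(reading):
    """True if a CG-style reading string carries both the neg and ifi tags."""
    tags = reading.split()
    return "neg" in tags and "ifi" in tags


def resolve_neg_ifi(cohorts):
    """Keep or drop neg ifi readings depending on the next cohort.

    cohorts is a list of (surface, [reading, ...]) pairs.
    """
    resolved = []
    for i, (surface, readings) in enumerate(cohorts):
        next_surface = cohorts[i + 1][0] if i + 1 < len(cohorts) else ""
        if next_surface.lower().startswith("жоқ"):
            kept = [r for r in readings if is_neg_ifi(r)]
        else:
            kept = [r for r in readings if not is_neg_ifi(r)]
        # Never delete the last remaining reading of a cohort.
        resolved.append((surface, kept or readings))
    return resolved


gan = ['"оқы" v tv gpr_past', '"оқы" v tv neg ifi p3 sg']
sentence = [("оқыған", gan), ("адам", ['"адам" n nom']),
            ("оқыған", gan), ("жоқ", ['"жоқ" adj'])]
for surface, readings in resolve_neg_ifi(sentence):
    print(surface, readings)
```

This one-word lookahead does not cover the suspended case (көрген де, алған да, оқыған да жоқпын), where жоқпын is not adjacent to every GAN form; a real rule would need a wider context condition.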

IlnarSelimcan commented 5 years ago

In my opinion, an even better way would be not to diverge from the "one affix = one tag" principle (which I think is much more approachable for most users of a morphological analyser) and not to give the GAN form yet another competing analysis. Instead, add a monolingual, sl-to-sl transfer rule which maps `^GAN$ ^жоқ$` to `^GAN$`.
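As a sketch of what such a merge step could do, the Python below works on disambiguated lexical units in Apertium stream format. What the merged unit should look like is not settled here: treating `<gpr_past>` as the GAN analysis and appending a `<neg>` marker are placeholder assumptions, and `merge_gan_joq` is a hypothetical name, not an existing Apertium tool or rule.

```python
import re

# One disambiguated lexical unit in Apertium stream format: ^lemma<tag><tag>$
LU = re.compile(r"\^([^<$^]+)((?:<[^>]+>)*)\$")


def merge_gan_joq(stream, marker="<neg>"):
    """Merge a GAN unit followed by a жоқ unit into one unit.

    Assumption: GAN is recognised by the <gpr_past> tag, and the merged
    unit is the GAN unit plus `marker`. Both choices are placeholders.
    """
    units = LU.findall(stream)  # list of (lemma, tags) pairs
    out, i = [], 0
    while i < len(units):
        lemma, tags = units[i]
        is_gan = "<gpr_past>" in tags
        followed_by_joq = i + 1 < len(units) and units[i + 1][0] == "жоқ"
        if is_gan and followed_by_joq:
            out.append(f"^{lemma}{tags}{marker}$")
            i += 2  # the жоқ unit is consumed by the merge
        else:
            out.append(f"^{lemma}{tags}$")
            i += 1
    return " ".join(out)


print(merge_gan_joq("^оқы<v><tv><gpr_past>$ ^жоқ<adj>$"))
```

Because the rule runs after analysis and disambiguation, the analyser itself keeps one tag per affix, and the decision about жоқ is made only where the full context is available.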

jonorthwash commented 4 years ago

I think your examples are okay, though I'm probably not the person to ask.

So what do you propose for the two analyses of a form like "оқыған жоқ"? And what analysis would you propose for a form like "оқыған жоқпын"?