UniversalDependencies / UD_German-GSD


Inconsistent lemma of "nächste" #28

Closed: kanayamah closed this issue 2 years ago.

kanayamah commented 2 years ago

In v2.9, the lemma of nächste is inconsistent (nah: 51 cases vs. nächster: 8 cases), and nächster looks awkward as the lemma of an adjective. http://universal.grew.fr/?corpus=UD_German-GSD@2.9&custom=62441b5303f54&clustering=N.lemma
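The 51-vs-8 split above comes from the linked Grew query. Below is a minimal sketch of getting a roughly similar lemma count offline. It assumes the treebank is read as plain CoNLL-U text. `lemma_counts` and the toy sample are made up for illustration and are not part of the treebank or of Grew. A plain `nächst` prefix also catches compounds such as nächstgelegen.

```python
from collections import Counter

def lemma_counts(conllu_text: str, prefix: str = "nächst", upos: str = "ADJ") -> Counter:
    """Count LEMMA values of tokens whose lowercased FORM starts with `prefix` (hypothetical helper)."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines between sentences and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip malformed lines, multiword-token ranges (3-4) and empty nodes (5.1)
        form, lemma, tag = cols[1], cols[2], cols[3]
        if form.lower().startswith(prefix) and tag == upos:
            counts[lemma] += 1
    return counts

# Toy input invented for illustration, not taken from UD_German-GSD.
sample = "\n".join([
    "# text = Die nächste Woche",
    "1\tDie\tder\tDET\tART\t_\t3\tdet\t_\t_",
    "2\tnächste\tnah\tADJ\tADJA\tDegree=Sup\t3\tamod\t_\t_",
    "3\tWoche\tWoche\tNOUN\tNN\t_\t0\troot\t_\t_",
    "",
    "1\tnächster\tnächster\tADJ\tADJA\t_\t0\troot\t_\t_",
])
print(lemma_counts(sample))
```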

In my PR #27, I intentionally did not add Degree=Sup for these cases. There are three options to keep the annotations consistent:

dan-zeman commented 2 years ago

I would say that the second option is the best, i.e., lemma = nah and Degree=Sup.
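The second option boils down to rewriting two CoNLL-U columns: LEMMA becomes nah, and Degree=Sup is added to FEATS. Here is a minimal sketch that works one token line at a time. The regular expression and the helper names `set_feat` and `fix_token` are hypothetical and do not describe how the follow-up PR was actually implemented. Features are kept sorted by name, as the CoNLL-U format requires.

```python
import re

# Hypothetical pattern: inflected superlative forms of "nah" (nächste, nächsten, nächster, ...).
# Compounds such as "nächstgelegene" deliberately do not match.
NAECHST = re.compile(r"nächst(e|em|en|er|es)?", re.IGNORECASE)

def set_feat(feats: str, name: str, value: str) -> str:
    """Set one feature in a CoNLL-U FEATS string, keeping features sorted by name."""
    pairs = dict(f.split("=", 1) for f in feats.split("|")) if feats != "_" else {}
    pairs[name] = value
    return "|".join(f"{k}={v}" for k, v in sorted(pairs.items(), key=lambda kv: kv[0].lower()))

def fix_token(line: str) -> str:
    """Apply option 2 (LEMMA=nah, Degree=Sup) to one ADJ token line; other lines pass through."""
    cols = line.split("\t")
    if len(cols) != 10 or cols[3] != "ADJ" or not NAECHST.fullmatch(cols[1]):
        return line
    cols[2] = "nah"
    cols[5] = set_feat(cols[5], "Degree", "Sup")
    return "\t".join(cols)
```

Case, Gender and Number are left untouched on purpose. The sketch only harmonizes the lemma and Degree.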

dan-zeman commented 2 years ago

After PR #32:

cat *.conllu | udapy util.Eval node='if re.match(r"^nächst", node.form.lower()) and node.upos == "ADJ": print(node.form.lower(), node.lemma, node.upos, node.feats, node.xpos)' | sort -u
2022-08-08 23:16:58,527 [   INFO] execute - No reader specified, using read.Conllu
2022-08-08 23:16:58,527 [   INFO] execute -  ---- ROUND ----
2022-08-08 23:16:58,527 [   INFO] execute - Executing block read.Conllu
2022-08-08 23:17:01,091 [   INFO] execute - Executing block util.Eval
nächste nah ADJ Case=Acc|Degree=Sup|Gender=Fem|Number=Sing ADJA
nächste nah ADJ Case=Acc|Degree=Sup|Gender=Neut|Number=Sing ADJA
nächste nah ADJ Case=Nom|Degree=Sup|Gender=Fem|Number=Sing ADJA
nächste nah ADJ Case=Nom|Degree=Sup|Gender=Masc|Number=Sing ADJA
nächste nah ADJ Case=Nom|Degree=Sup|Gender=Neut|Number=Sing ADJA
nächsten nah ADJ Case=Acc|Degree=Sup|Gender=Fem|Number=Plur ADJA
nächsten nah ADJ Case=Acc|Degree=Sup|Gender=Masc|Number=Plur ADJA
nächsten nah ADJ Case=Acc|Degree=Sup|Gender=Masc|Number=Sing ADJA
nächsten nah ADJ Case=Acc|Degree=Sup|Gender=Neut|Number=Plur ADJA
nächsten nah ADJ Case=Dat|Degree=Sup|Gender=Fem|Number=Plur ADJA
nächsten nah ADJ Case=Dat|Degree=Sup|Gender=Fem|Number=Sing ADJA
nächsten nah ADJ Case=Dat|Degree=Sup|Gender=Masc|Number=Plur ADJA
nächsten nah ADJ Case=Dat|Degree=Sup|Gender=Masc|Number=Sing ADJA
nächsten nah ADJ Case=Dat|Degree=Sup|Gender=Neut|Number=Plur ADJA
nächsten nah ADJ Case=Dat|Degree=Sup|Gender=Neut|Number=Sing ADJA
nächsten nah ADJ Case=Gen|Degree=Sup|Gender=Neut|Number=Sing ADJA
nächsten nah ADJ Case=Nom|Degree=Sup|Gender=Fem|Number=Plur ADJA
nächsten nah ADJ Case=Nom|Degree=Sup|Gender=Masc|Number=Plur ADJA
nächster nah ADJ Case=Nom|Degree=Sup|Gender=Masc|Number=Sing ADJA
nächstes nah ADJ Case=Acc|Degree=Sup|Gender=Neut|Number=Sing ADJA
nächstgelegene nächstgelegen ADJ Case=Acc|Degree=Pos|Gender=Fem|Number=Sing ADJA
nächstgelegene nächstgelegen ADJ Case=Nom|Degree=Pos|Gender=Fem|Number=Sing ADJA
nächstliegende nächstliegend ADJ Case=Nom|Degree=Pos|Gender=Fem|Number=Sing ADJA

Some Gender-Number-Case combinations are still problematic, but the lemma and Degree are now correct.
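For readers without Udapi installed, here is a rough stdlib-only counterpart to the `udapy util.Eval ... | sort -u` listing above, combined with a regression check. It assumes plain CoNLL-U text as input. The function names and the regular expression are hypothetical, and the row order may differ slightly from `sort -u`. The check covers only the lemma and Degree, not the remaining Gender-Number-Case issues.

```python
import re

# Hypothetical pattern for inflected forms; compounds like "nächstgelegene" are excluded.
INFLECTED = re.compile(r"nächst(e|em|en|er|es)?")

def unique_rows(conllu_text: str) -> list[tuple[str, ...]]:
    """Distinct (form, lemma, upos, feats, xpos) rows for ADJ tokens starting with 'nächst'."""
    rows = set()
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # comments, blank lines, multiword tokens, empty nodes
        form = cols[1].lower()
        if form.startswith("nächst") and cols[3] == "ADJ":
            rows.add((form, cols[2], cols[3], cols[5], cols[4]))
    return sorted(rows)

def inconsistent(rows: list[tuple[str, ...]]) -> list[tuple[str, ...]]:
    """Rows for inflected nächst- forms lacking LEMMA=nah or Degree=Sup."""
    return [r for r in rows
            if INFLECTED.fullmatch(r[0])
            and (r[1] != "nah" or "Degree=Sup" not in r[3].split("|"))]
```

Calling `inconsistent(unique_rows(text))` on the concatenated `*.conllu` files should return an empty list once every inflected form has been fixed.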