UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Annotation of den/det/de in Scandinavian #992

Closed: dan-zeman closed this issue 10 months ago.

dan-zeman commented 1 year ago

I am wondering whether annotation of den/dén/det/dét/de/dem can be unified across the Scandinavian languages. There are differences (and inconsistencies) in lemmatization, UPOS tags (DET vs. PRON) and features. This is a generalization of https://github.com/UniversalDependencies/UD_Danish-DDT/issues/10.

Here is the current situation (attested annotations, for now without counts):

cat *.conllu | udapy util.Eval node='if node.form.lower() in ["det", "den", "dét", "dén", "de", "dem", "d."] and not node.upos in ["PROPN", "X"]: print(node.upos, node.form.lower(), node.lemma, node.feats)' | sort -u | less

Danish DDT:

DET d. den Gender=Com|Number=Sing|PronType=Dem
DET de den Number=Plur|PronType=Dem
DET den den Gender=Com|Number=Sing|PronType=Dem
DET dén den Gender=Com|Number=Sing|PronType=Dem
DET det den Gender=Neut|Number=Sing|PronType=Dem
DET det det Gender=Neut|Number=Sing|PronType=Dem
PRON de de Case=Nom|Gender=Com|Person=2|Polite=Form|PronType=Prs
PRON de De Case=Nom|Gender=Com|Person=2|Polite=Form|PronType=Prs
PRON de de Case=Nom|Number=Plur|Person=3|PronType=Prs
PRON de den Number=Plur|PronType=Dem
PRON dem De Case=Acc|Gender=Com|Person=2|Polite=Form|PronType=Prs
PRON dem de Case=Acc|Number=Plur|Person=3|PronType=Prs
PRON den den Case=Acc|Gender=Com|Number=Sing|Person=3|PronType=Prs
PRON den den Gender=Com|Number=Sing|PronType=Dem
PRON det den Gender=Neut|Number=Sing|PronType=Dem
PRON det det Case=Acc|Gender=Neut|Number=Sing|Person=3|PronType=Prs
PRON dét det Case=Acc|Gender=Neut|Number=Sing|Person=3|PronType=Prs

Swedish Talbanken:

DET de de Definite=Def|Number=Plur|PronType=Prs
DET de en Definite=Def|Number=Plur|PronType=Art
DET den den Definite=Def|Gender=Com|Number=Sing|PronType=Prs
DET den en Definite=Def|Gender=Com|Number=Sing|PronType=Art
DET det den Definite=Def|Gender=Neut|Number=Sing|PronType=Prs
DET det det Definite=Def|Gender=Neut|Number=Sing|PronType=Prs
DET det en Definite=Def|Gender=Neut|Number=Sing|PronType=Art
PRON de de Case=Nom|Definite=Def|Number=Plur|PronType=Ind
PRON de de Case=Nom|Definite=Def|Number=Plur|PronType=Prs
PRON de de Case=Nom|Definite=Def|Number=Plur|PronType=Rel
PRON de en Case=Nom|Definite=Def|Number=Plur|PronType=Prs
PRON dem de Case=Acc|Definite=Def|Number=Plur|PronType=Prs
PRON dem de Case=Acc|Definite=Def|Number=Plur|PronType=Tot
PRON den den Definite=Def|Gender=Com|Number=Sing|PronType=Prs
PRON den den Definite=Def|Number=Plur|PronType=Prs
PRON den en Definite=Def|Gender=Com|Number=Sing|PronType=Prs
PRON den en Definite=Def|Number=Plur|PronType=Prs
PRON det den Definite=Def|Gender=Neut|Number=Sing|PronType=Art
PRON det den Definite=Def|Gender=Neut|Number=Sing|PronType=Ind
PRON det den Definite=Def|Gender=Neut|Number=Sing|PronType=Prs
PRON det den Definite=Def|Gender=Neut|Number=Sing|PronType=Tot
PRON det det Definite=Def|Gender=Neut|Number=Sing|PronType=Prs
PRON det en Definite=Def|Gender=Neut|Number=Sing|PronType=Prs

Swedish PUD:

ADP de De _
ADP de den _
DET de den Definite=Def|Gender=Neut|Number=Plur
DET de den Definite=Def|Number=Plur
DET de en Definite=Def|Number=Plur|PronType=Art
DET den den Definite=Def|Gender=Com|Number=Sing
DET den den Definite=Def|Gender=Com|Number=Sing|PronType=Art
DET det den Definite=Def|Gender=Com|Number=Sing
DET det den Definite=Def|Gender=Neut|Number=Sing
DET det den Definite=Def|Gender=Neut|Number=Sing|PronType=Art
DET det en Definite=Def|Gender=Neut|Number=Sing|PronType=Art
PRON de de Case=Nom|Definite=Def|Number=Plur
PRON de de Case=Nom|Definite=Def|Number=Plur|PronType=Prs
PRON de den Case=Nom|Definite=Def|Number=Plur
PRON dem de Case=Acc|Definite=Def|Number=Plur
PRON den den Definite=Def|Gender=Com|Number=Sing
PRON det den Definite=Def|Gender=Neut|Number=Sing|PronType=Prs
PRON det det Definite=Def|Gender=Neut|Number=Sing

Swedish LinES:

ADP de de _
DET de de Definite=Def|Number=Plur|PronType=Art
DET de den Case=Nom|Definite=Def|Number=Plur|PronType=Art
DET de den Definite=Def|Number=Plur
DET de den Definite=Def|Number=Plur|PronType=Art
DET de den Definite=Def|Number=Plur|PronType=Dem
DET den den Definite=Def|Gender=Com|Number=Sing|PronType=Art
DET den den Definite=Def|Gender=Com|Number=Sing|PronType=Dem
DET den den Definite=Def|Number=Sing|PronType=Art
DET det den Definite=Def|Gender=Neut|Number=Sing|PronType=Art
DET det den Definite=Def|Gender=Neut|Number=Sing|PronType=Dem
DET det den Definite=Def|Number=Sing|PronType=Art
PRON de de Case=Nom|Definite=Def|Number=Plur|PronType=Dem
PRON de de Case=Nom|Definite=Def|Number=Plur|PronType=Prs
PRON de den Case=Nom|Definite=Def|Number=Plur|PronType=Prs
PRON dem de Case=Acc|Definite=Def|Number=Plur|PronType=Prs
PRON den den Definite=Def|Gender=Com|Number=Sing|PronType=Dem
PRON den den Definite=Def|Gender=Com|Number=Sing|PronType=Prs
PRON den den Definite=Def|Number=Plur|PronType=Prs
PRON det den Definite=Def|Gender=Neut|Number=Sing|Person=3|PronType=Art
PRON det den Definite=Def|Gender=Neut|Number=Sing|Person=3|PronType=Prs
PRON det den Definite=Def|Gender=Neut|Number=Sing|PronType=Art
PRON det den Definite=Def|Gender=Neut|Number=Sing|PronType=Dem
PRON det den Definite=Def|Gender=Neut|Number=Sing|PronType=Prs

Norwegian Bokmaal:

ADV de de _
DET de de Number=Plur|PronType=Dem
DET den den Gender=Fem|Number=Sing|PronType=Dem
DET den den Gender=Fem|Number=Sing|PronType=Prs
DET den den Gender=Masc|Number=Sing|PronType=Dem
DET den Den Gender=Masc|Number=Sing|PronType=Dem
DET den det Gender=Masc|Number=Sing|PronType=Dem
DET det de Number=Plur|PronType=Dem
DET det det Gender=Neut|Number=Sing|PronType=Dem
DET dét det Gender=Neut|Number=Sing|PronType=Dem
PRON de de Case=Nom|Number=Plur|Person=3|PronType=Prs
PRON de De Case=Nom|Number=Plur|Person=3|PronType=Prs
PRON dem de Case=Acc|Number=Plur|Person=3|PronType=Prs
PRON den den Gender=Fem,Masc|Number=Sing|Person=3|PronType=Prs
PRON det det Gender=Neut|Number=Sing|Person=3|PronType=Prs
PRON dét det Gender=Neut|Number=Sing|Person=3|PronType=Prs

Norwegian Nynorsk:

ADJ d. d. Abbr=Yes
ADV de de _
ADV det da _
DET de de PronType=Prs
DET den den Gender=Fem|PronType=Dem
DET den den Gender=Masc|PronType=Dem
DET det den Number=Plur|PronType=Dem
DET det det Gender=Neut|PronType=Dem
DET dét det Gender=Neut|PronType=Dem
DET det det Gender=Neut|PronType=Prs
PRON d. d. Abbr=Yes|PronType=Prs
PRON de de Animacy=Hum|Case=Nom|Number=Plur|Person=2|PronType=Prs
PRON den den Gender=Fem,Masc|Person=3|PronType=Prs
PRON dén den Gender=Fem,Masc|Person=3|PronType=Prs
PRON dét det Gender=Neut|Number=Sing|Person=3|PronType=Prs
PRON det det Gender=Neut|Person=3|PronType=Prs
PRON dét det Gender=Neut|Person=3|PronType=Prs
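The survey above relies on udapi. Where udapi is not at hand, roughly the same listing can be produced with plain Python. The sketch below is an illustration added here, not part of the original issue. The function name `attested_annotations` is hypothetical, and the code assumes well-formed 10-column CoNLL-U input. Like the one-liner, it lowercases the form, drops PROPN and X, and deduplicates. It skips multiword-token ranges and empty nodes, and its tuple ordering may differ slightly from `sort -u`.

```python
# Forms surveyed in the issue (lowercased; "d." is an abbreviation).
TARGET_FORMS = {"det", "den", "dét", "dén", "de", "dem", "d."}
IGNORED_UPOS = {"PROPN", "X"}


def attested_annotations(lines):
    """Return sorted, deduplicated (UPOS, lowercased form, lemma, FEATS)
    tuples for the surveyed forms, similar to `... | sort -u`."""
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        # Skip sentence-level comments and blank lines between sentences.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a token line
        token_id, form, lemma, upos, _xpos, feats = cols[:6]
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 5.1).
        if "-" in token_id or "." in token_id:
            continue
        if form.lower() in TARGET_FORMS and upos not in IGNORED_UPOS:
            seen.add((upos, form.lower(), lemma, feats))
    return sorted(seen)
```

A small script can print each tuple returned by `attested_annotations(sys.stdin)` and be fed with `cat *.conllu`. That should give a listing comparable to the one above.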

There is a consensus that some occurrences should be tagged DET and others PRON, so I am not going to challenge that for now. I will also ignore the occasional occurrences of other tags (ADP, ADJ, ADV, PROPN, X). I have not examined them in context.

As for PronType, the determiners are mostly Dem in Danish and Norwegian, and mostly Art (with Definite=Def) in Swedish. But LinES uses both Art and Dem, and there are also occurrences of PronType=Prs in Talbanken and the Norwegian treebanks. Question: Could we select either Dem or Art and stick to it in all cases where these words are tagged DET? For those that are currently PronType=Prs, could it be decided that they either should be PRON, or their PronType should be changed?

The PronType of the PRON instances is either Prs or Dem in Danish and Norwegian; Prs, Ind, Rel, Tot, Art (!) in Talbanken; Prs or empty in Swedish PUD; Prs, Dem, Art in LinES. Question: Could it be always Prs in Swedish, too (as in Danish and Norwegian)?

cat *.conllu | udapy util.Eval node='if node.form.lower() in ["det", "den", "dét", "dén", "de", "dem", "d."] and not node.upos in ["PROPN", "X"]: print(node.upos, node.form.lower(), node.lemma)' | sort | uniq -c | sort -rn
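The counting variant of the command (`sort | uniq -c | sort -rn`) maps naturally onto `collections.Counter`. This is again a hypothetical standalone helper, and `lemma_counts` is an illustrative name. It assumes standard CoNLL-U input and keeps only lines whose ID is a plain integer. That drops multiword ranges such as `1-2` and empty nodes such as `5.1`.

```python
from collections import Counter

TARGET_FORMS = {"det", "den", "dét", "dén", "de", "dem", "d."}
IGNORED_UPOS = {"PROPN", "X"}


def lemma_counts(lines):
    """Count (UPOS, lowercased form, lemma) triples for the surveyed forms,
    most frequent first, like `sort | uniq -c | sort -rn`."""
    counts = Counter()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        # Regular token lines have 10 columns and a plain integer ID.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        form, lemma, upos = cols[1], cols[2], cols[3]
        if form.lower() in TARGET_FORMS and upos not in IGNORED_UPOS:
            counts[(upos, form.lower(), lemma)] += 1
    return counts.most_common()
```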

Lemmatization: I would have expected one lemma (probably den) for all these forms, but that is definitely not the case, and perhaps it is also not desired. The accented versions are always normalized (dén to den, dét to det). Case is also normalized for the 3PL pronoun de "they" (nominative) vs. dem "them" (accusative); the other forms do not seem to distinguish case. Gender and number are sometimes normalized and sometimes not. For example, most Danish plural pronouns de are lemmatized to the plural form, but some of them have the singular lemma den. The neuter singular det is usually lemmatized as det but sometimes as the common-gender form den (while it is never normalized from den to det). Danish also keeps a separate lemma De for the polite 2nd person address (taken from the third person plural but capitalized). The two Norwegian treebanks mostly keep separate lemmas for the two singular genders and for the plural, but there are occasional outliers that break this rule (14 instances of den lemmatized as det in Bokmaal). Swedish LinES mostly has den as the lemma. The exception is plural de when tagged PRON (and not DET), which has the lemma de in 507 cases (352 nominative de, 155 accusative dem) and is lemmatized as den in only 3 cases. Talbanken has a mixture of approaches: normalizing gender (det to den) seems to be the norm, although it is not applied 100% consistently; the plural stays separate; and in addition some of the words (both DET and PRON!) are lemmatized to the indefinite article en. PUD has only 2 occurrences of en as the lemma. Otherwise its determiners (both singular and plural) are mostly lemmatized to den, while its pronouns keep their own gender/number (den to den, det to det, de and dem to de). Question: Is there any chance we could get closer at least to the approach taken in LinES?
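The inconsistencies described above are cases where one (UPOS, form) pair receives several lemmas. A hypothetical helper for isolating them might look like this. It takes `((upos, form, lemma), count)` pairs, such as those returned by `Counter.most_common()`. Lemmas are compared exactly, so the Danish polite lemma `De` is reported alongside `de`. Whether that counts as an inconsistency is a judgment call.

```python
from collections import defaultdict


def inconsistent_lemmas(counted_triples):
    """Given ((upos, form, lemma), count) pairs, return the (upos, form)
    pairs that receive more than one lemma, with per-lemma counts."""
    by_form = defaultdict(dict)
    for (upos, form, lemma), count in counted_triples:
        # Lemmas are compared exactly (no case folding).
        lemmas = by_form[(upos, form)]
        lemmas[lemma] = lemmas.get(lemma, 0) + count
    # Keep only the forms with competing lemmas.
    return {key: lemmas for key, lemmas in by_form.items() if len(lemmas) > 1}
```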

Features other than PronType and Definite: The Number feature seems to be used everywhere (den and det are Sing, de and dem are Plur) except in Nynorsk, which does not annotate the singular. Gender is distinguished for the singular forms (den is Com, det is Neut), but the Norwegian treebanks do not use Gender=Com; they use Masc, Fem, or Fem,Masc instead. Question: Could Norwegian use Gender=Com for den? In Danish and Norwegian, Person=3 accompanies most of the personal pronoun instances, with occasional Person=2 for the polite address in Danish and Nynorsk. Swedish mostly lacks the feature, except for a few instances in LinES. Question: Could Person=3 be added in Swedish as well, for pronouns (not for determiners)? Case is mostly used for plural pronouns to distinguish de (Nom) from dem (Acc). However, Danish also has Case on singular pronouns (probably incorrectly anyway; it should be removed), and LinES has it on plural determiners (probably to be removed too?).
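The questions above can be turned into mechanical checks. The rules in this sketch are an assumption distilled from those questions, not an agreed specification:

- Gender is limited to Com or Neut.
- Determiners carry no Case and no Person.
- Singular pronouns carry no Case.
- Personal pronouns carry Person.

The function names are hypothetical.

```python
def parse_feats(feats):
    """Turn a CoNLL-U FEATS string ('_' when empty) into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def feature_issues(upos, feats):
    """List deviations from the (hypothetical) target rules above."""
    f = parse_feats(feats)
    issues = []
    if f.get("Gender") not in (None, "Com", "Neut"):
        issues.append("Gender should be Com or Neut")
    if upos == "DET" and "Case" in f:
        issues.append("Case on a determiner")
    if upos == "DET" and "Person" in f:
        issues.append("Person on a determiner")
    if upos == "PRON" and f.get("Number") == "Sing" and "Case" in f:
        issues.append("Case on a singular pronoun")
    if upos == "PRON" and f.get("PronType") == "Prs" and "Person" not in f:
        issues.append("personal pronoun without Person")
    return issues
```

For example, the Danish annotation `Case=Acc|Gender=Com|Number=Sing|Person=3|PronType=Prs` on PRON den would be reported only for its Case feature.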

LarsAhrenberg commented 1 year ago

@jnivre wrote: I completely agree that a common solution for Swedish, Danish and Norwegian would be highly preferable, so why don't we ping @liljao and @LarsAhrenberg and see if we can achieve this.

I agree too. However, I find many of Dan's suggestions too radical, at least for Swedish, as there are uses, in particular of the singular determiners den/det, that need to be, and can be, distinguished systematically. I look forward to a joint discussion.

jnivre commented 1 year ago

@dan-zeman I think all your suggestions make good sense.

Determiner uses should have UPOS DET, PronType=Art and appropriate Gender and Number features (but not Case or Person, which are not relevant there, at least not in Swedish).

Personal pronoun uses should have UPOS PRON, PronType=Prs and appropriate Gender, Number, Case and Person features.

Lemmatization should definitely neutralise gender (bringing together "den" and "det") and case (bringing together "de" and "dem"), possibly also number for determiners (adding "de" to "den" and "det") but probably not for pronouns (keeping 3rd person singular distinct from 3rd person plural).

The tricky part is what to do with demonstratives, which overlap with both determiner and pronoun uses. Starting with the former, there is a contrast between "bilen" (the car) and "den bilen" (that car). However, this distinction is neutralised when there is an adjectival modifier, because that triggers article doubling. Hence "den röda bilen" is ambiguous in written Swedish between an article reading ("the red car") and a demonstrative reading ("that red car"). In spoken Swedish the two readings are disambiguated by stress (the demonstrative reading has stress on "den").

When it comes to pronominal uses, it is customary to treat "den", "det", "de" and "dem" as demonstrative pronouns (at least) when they are followed by one of the pronominal adverbs "här" (here) and "där" (there). The choice of adverb encodes the proximal-distal distinction (corresponding to English "this" vs. "that"). The question, however, is whether we need to follow this tradition. Alternatively, we could say that only the phrase "den här/där" has a demonstrative function and that the constituent "den" is just an ordinary personal pronoun. Finally, there is the question of whether "den/det/de" by themselves can be considered demonstratives when emphasised, or whether this use can also be treated as a pragmatic function rather than as a lexical property. Note also that there is a corresponding series of true demonstratives: "denna" (cf. "den"), "detta" (cf. "det"), "dessa" (cf. "de/dem").

I look forward to hearing everyone's view on these thoughts, as well as additional information about Danish and Norwegian.

LarsAhrenberg commented 1 year ago

A proposal for Swedish, on the assumption that we continue to distinguish the traditionally demonstrative forms from the other PronTypes. This gives four alternatives for each of the Swedish words de/den/det, two as DET and two as PRON. The word dem gets one description.

DET de den Definite=Def|Number=Plur|PronType=Art (de mörka nätterna ~ the dark nights)
DET de den Definite=Def|Number=Plur|PronType=Dem (de nätter/na ~ those nights)
DET den den Definite=Def|Gender=Com|Number=Sing|PronType=Art (den mörka natten ~ the dark night)
DET den den Definite=Def|Gender=Com|Number=Sing|PronType=Dem (den natt/en, den här/där natten ~ that night)
DET det den Definite=Def|Gender=Neut|Number=Sing|PronType=Art (det mörka rummet ~ the dark room)
DET det den Definite=Def|Gender=Neut|Number=Sing|PronType=Dem (det här/där rummet ~ this/that room)
PRON de de Definite=Def|Number=Plur|PronType=Dem (de här, de där irrespective of nominal deprel ~ these/those)
PRON de de Case=Nom|Definite=Def|Number=Plur|Person=3|PronType=Prs (de såg oss ~ they saw us)
PRON dem de Case=Acc|Definite=Def|Number=Plur|Person=3|PronType=Prs (vi såg dem ~ we saw them)
PRON den den Definite=Def|Gender=Com|Number=Sing|PronType=Dem (den här/där irrespective of nominal deprel)
PRON den den Definite=Def|Gender=Com|Number=Sing|Person=3|PronType=Prs (jag såg den, den såg mig ~ I saw it, ...)
PRON det den Definite=Def|Gender=Neut|Number=Sing|Person=3|PronType=Prs (jag såg det, det regnar ~ I saw it)
PRON det den Definite=Def|Gender=Neut|Number=Sing|PronType=Dem (det här/där irrespective of nominal deprel)

I keep 'de' as the lemma for PRON de, as it forms a paradigm with the possessive (de, dem, deras), while the singular forms do not separate subject and object forms. With this logic, however, we get different lemmas for dom, depending on whether it is tagged DET or PRON. Dom is a form that is becoming more and more common, also in written language:

DET dom den Definite=Def|Number=Plur|PronType=Art (or Dem)
PRON dom de Definite=Def|Number=Plur|Person=3|PronType=Prs

jnivre commented 1 year ago

Thanks, @LarsAhrenberg. This looks good to me. The fact that we get different lemmas for different uses of "dom" is perhaps a little annoying, but it is perfectly consistent with the fact that we also get different lemmas for different uses of "de". Or am I missing the point here?

The only way to avoid this would be to separate singular and plural forms also for the determiner uses. Did you consider that?

Finally, I observe that this gives up the idea of grouping definite articles together with indefinite articles by using the lemma "en" for all article uses of "den", "det" and "de". Personally, I think this is an improvement.

jnivre commented 1 year ago

Shall we wait and see what our Danish and Norwegian colleagues have to say before we make a decision?

@AngledLuffa suggested that we might want to extend the discussion to include Icelandic and Faroese as well.

LarsAhrenberg commented 1 year ago

@jnivre, I didn't actually consider having separate singular and plural forms also for the determiner uses, but it is a definite possibility.

jnivre commented 1 year ago

It seems that this is what Danish and Norwegian do (except for a few cases that could be errors).

dan-zeman commented 1 year ago

@jnivre wrote: I completely agree that a common solution for Swedish, Danish and Norwegian would be highly preferable, so why don't we ping @liljao and @LarsAhrenberg and see if we can achieve this.

Pinging also @peresolb who did the most recent changes in both Norwegian treebanks.

dan-zeman commented 1 year ago

@AngledLuffa suggested that we might want to extend the discussion to include Icelandic and Faroese as well.

When creating the issue I considered adding statistics from Icelandic and Faroese too. But then I thought that things might get too complicated because they seem to have preserved more morphological variability (also in Faroese). But if a consensus among Danish, Swedish and Norwegian can be found at all, then it definitely won't hurt to see if some of the ideas can be projected to Faroese and Icelandic.

jnivre commented 1 year ago

Agreed. Let’s do it in two steps.



From: Dan Zeman; Sent: Monday, November 20, 2023 4:39:17 PM; To: UniversalDependencies/docs; Cc: Joakim Nivre, Mention; Subject: Re: [UniversalDependencies/docs] Annotation of den/det/de in Scandinavian (Issue #992)


peresolb commented 1 year ago

Hi all!

Question: Could we select either Dem or Art and stick to it in all cases where these words are tagged DET? For those that are currently PronType=Prs, could it be decided that they either should be PRON, or their PronType should be changed?

I see no problem with that. We can change all DET-tagged instances to PronType=Art. There seem to be only two DET cases with PronType=Prs in the Norwegian treebanks, and those can be changed to Art too. Norwegian also has a demonstrative/article contrast like the one @jnivre mentions. However, the contrast isn't reflected in the analyses in the Norwegian treebanks, and we don't have the bandwidth to try to introduce it. So I think we will stick to PronType=Art consistently.
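Such a conversion could be scripted token by token. The sketch below is hypothetical. The function name and the exact set of covered forms are assumptions, and the true demonstratives denna/detta/dessa are deliberately left out. It re-sorts the features by name, case-insensitively, which is the ordering the CoNLL-U format asks for.

```python
# Hypothetical scope: determiner uses of den/det/de, including accented spellings.
ARTICLE_FORMS = {"den", "dén", "det", "dét", "de"}


def to_article(form, upos, feats):
    """Return FEATS with PronType=Art for determiner uses of den/det/de;
    any other token's FEATS is returned unchanged."""
    if upos != "DET" or form.lower() not in ARTICLE_FORMS:
        return feats
    f = {} if feats == "_" else dict(kv.split("=", 1) for kv in feats.split("|"))
    f["PronType"] = "Art"  # overrides Dem or Prs
    # Serialize with features sorted by name, ignoring case.
    return "|".join(f"{k}={v}" for k, v in sorted(f.items(), key=lambda kv: kv[0].lower()))
```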

Lemmatization: I would have expected one lemma (probably den) for all these forms but it is definitely not the case and perhaps it is also not desired. ... Swedish LinES has mostly den as the lemma; the exception is plural de when tagged PRON (and not DET), which has lemma de in 507 cases (352 nominative de, 155 accusative dem) and only in 3 cases it is lemmatized as den. Question: Is there any chance we could get closer at least to the approach taken in LinES?

I am fine with setting "den" as the lemma for all DET uses and "de" and "den" for the PRON uses, as @LarsAhrenberg suggests.

Question: Could Norwegian use Gender=Com for den?

We don't seem to use Gender=Com at all in Norwegian for the time being, but we could replace all Gender=Fem,Masc with Gender=Com.
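The replacement itself is a simple substitution on the FEATS column. In this hypothetical sketch, only the exact item `Gender=Fem,Masc` is rewritten. Single `Gender=Fem` or `Gender=Masc` values (as on some DET instances of den in Bokmaal) are left alone, since the thread does not settle them. The feature name stays `Gender`, so the alphabetical order of features is preserved.

```python
def fem_masc_to_com(feats):
    """Replace the item Gender=Fem,Masc with Gender=Com in a CoNLL-U FEATS string."""
    if feats == "_":
        return feats  # no features to rewrite
    parts = feats.split("|")
    return "|".join("Gender=Com" if p == "Gender=Fem,Masc" else p for p in parts)
```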

jnivre commented 1 year ago

Thanks, @peresolb. It seems that we are converging on using separate lemmas for singular and plural pronouns, but a single lemma for the determiner uses. Concerning demonstratives, it seems that Swedish LinES is the only treebank that really makes a distinction between demonstrative and non-demonstrative uses. The other treebanks (with the possible exception of Danish DDT) use either Dem or Art, but not both. If this is the case, then I agree that it would probably require too much manual work to add this distinction to the annotations.

LarsAhrenberg commented 1 year ago

My interpretation for Swedish, then, is as follows, with a single description for each token, modulo the part-of-speech:

DET de de Definite=Def|Number=Plur|PronType=Art (de mörka/här nätterna ~ the dark nights, these nights)
DET dom de Definite=Def|Number=Plur|PronType=Art (dom mörka/här nätterna ~ the dark nights, these nights)
DET den den Definite=Def|Gender=Com|Number=Sing|PronType=Art (den mörka/här natten ~ the dark night, this night)
DET det den Definite=Def|Gender=Neut|Number=Sing|PronType=Art (det mörka/här rummet ~ the dark room, this room)

PRON de de Case=Nom|Definite=Def|Number=Plur|Person=3|PronType=Prs (de såg oss, de här ~ they saw us, these (guys))
PRON dem de Case=Acc|Definite=Def|Number=Plur|Person=3|PronType=Prs (vi såg dem ~ we saw them)
PRON dom de Definite=Def|Number=Plur|Person=3|PronType=Prs (vi såg dom, dom såg oss ~ we saw them, they saw us)
PRON den den Definite=Def|Gender=Com|Number=Sing|Person=3|PronType=Prs (jag såg den, den såg mig ~ I saw it, ...)
PRON det den Definite=Def|Gender=Neut|Number=Sing|Person=3|PronType=Prs (jag såg det, det regnar ~ I saw it)
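The Swedish interpretation above amounts to a lookup from (UPOS, form) to a single lemma and feature set. Here is a minimal sketch with hypothetical names. Folding the acute accent (dén, dét) is an assumption, based on the survey's observation that accented forms are always normalized.

```python
import unicodedata

# Target Swedish annotation, one entry per (UPOS, form): (lemma, FEATS).
SWEDISH_TARGET = {
    ("DET", "de"): ("de", "Definite=Def|Number=Plur|PronType=Art"),
    ("DET", "dom"): ("de", "Definite=Def|Number=Plur|PronType=Art"),
    ("DET", "den"): ("den", "Definite=Def|Gender=Com|Number=Sing|PronType=Art"),
    ("DET", "det"): ("den", "Definite=Def|Gender=Neut|Number=Sing|PronType=Art"),
    ("PRON", "de"): ("de", "Case=Nom|Definite=Def|Number=Plur|Person=3|PronType=Prs"),
    ("PRON", "dem"): ("de", "Case=Acc|Definite=Def|Number=Plur|Person=3|PronType=Prs"),
    ("PRON", "dom"): ("de", "Definite=Def|Number=Plur|Person=3|PronType=Prs"),
    ("PRON", "den"): ("den", "Definite=Def|Gender=Com|Number=Sing|Person=3|PronType=Prs"),
    ("PRON", "det"): ("den", "Definite=Def|Gender=Neut|Number=Sing|Person=3|PronType=Prs"),
}

COMBINING_ACUTE = "\u0301"


def fold_accent(form):
    """Lowercase and drop acute accents, so emphatic 'dén'/'dét' become 'den'/'det'."""
    decomposed = unicodedata.normalize("NFD", form.lower())
    stripped = "".join(ch for ch in decomposed if ch != COMBINING_ACUTE)
    # Recompose any other diacritics (å, ä, ö) that NFD split apart.
    return unicodedata.normalize("NFC", stripped)


def target_annotation(form, upos):
    """Return the proposed (lemma, FEATS) for a token, or None if the token
    is outside the scope of the proposal (e.g. denna/detta/dessa)."""
    return SWEDISH_TARGET.get((upos, fold_accent(form)))
```

Returning None for anything outside the table keeps the demonstratives denna/detta/dessa, which retain PronType=Dem, untouched.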

I assume, though, that PronType=Dem will still be used for the words denna, detta, dessa.

dan-zeman commented 1 year ago

Do you think that Definite=Def is useful/needed with the PRON tag? Or is it because you regularly have it also on all nouns? (And does it mean that you would have it on other pronouns, too?)

LarsAhrenberg commented 1 year ago

I believe it has been there in both Swedish_LinES and Talbanken ever since they were created, and it is used on nouns as well as pronouns. For instance, the indefinite pronouns 'man' (one) and 'någonting' (something) are both Definite=Ind and PronType=Ind. This may be regarded as unnecessary duplication, but on the other hand I don't see what harm it may cause.

jnivre commented 1 year ago

Thanks @LarsAhrenberg. I assume this means that you are okay with losing the distinction between PronType=Art and PronType=Dem for DET and between PronType=Prs and PronType=Dem for PRON. I completely agree that PronType=Dem should be retained for "denna", etc.

I am still not sure what I think about having different lemmas for the singular and plural articles, but it does simplify things since "de", "dem" and "dom" will always be lemmatised "de", regardless of part-of-speech tag. What do others think?

KennethEnevoldsen commented 11 months ago

I seem to have missed this conversation, and it appears to have died out despite the agreement. I don't mind doing the work for Danish. It seems like what needs to be done is:

Anything I missed out on?

jnivre commented 11 months ago

I don't think we have converged completely yet, but the current proposal is to mainly use PronType=Prs with PRON and PronType=Art (not PronType=Dem) with DET, since the distinction between PronType=Art and PronType=Dem is hard to make in written text. I assume Swedish and Danish are similar enough to use the same analysis here.

KennethEnevoldsen commented 11 months ago

Thanks for the clarification, @jnivre. I have corrected the above comment to PronType=Art. The hope with the comment was to take the last steps toward reaching a consensus.

jnivre commented 11 months ago

Thanks, @KennethEnevoldsen. I do think we have a coherent proposal now. So, unless anyone has additional thoughts, I think we should just go ahead and implement it in our various treebanks.

jnivre commented 11 months ago

Here is a summary of the consensus as I understand it for "den", "det", "de", "dem" (and variants):

@LarsAhrenberg @peresolb @KennethEnevoldsen @dan-zeman If everyone agrees, we can close this issue and start implementing this in all our treebanks.

LarsAhrenberg commented 11 months ago

I agree.

dan-zeman commented 11 months ago

Fine with me. Thanks for sorting this out!

jnivre commented 10 months ago

Closed after reaching consensus. Everyone will do their best to implement this in their respective treebanks before the next release.