UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Which upos should be assigned to 'not'? #995

Closed heeringa0 closed 2 months ago

heeringa0 commented 8 months ago

At: https://universaldependencies.org/u/pos/PART.html it is written:

Particles are function words that must be associated with another word or phrase to impart meaning and that do not satisfy definitions of other universal parts of speech (e.g. adpositions, coordinating conjunctions, subordinating conjunctions or auxiliary verbs). Particles may encode grammatical categories such as negation, mood, tense etc. Particles are normally not inflected, although exceptions may occur.

In general, the PART tag should be used restrictively and only when no other tag is possible. The language-specific documentation should list the words classified as PART in the given language.

Examples

However, there is not much consistency across the Germanic languages. In some treebanks 'not' is tagged ADV, which seems logical to me, and in other treebanks 'not' is tagged PART, in agreement with the UD guidelines (if I interpret them correctly). Overview:


| language | word | treebank | upos |
| -- | -- | -- | -- |
| Afrikaans | nie nie | AfriBooms | ADV PART |
| Danish | ikke | DDT | ADV |
| Dutch | niet | Alpino | ADV |
| Dutch | niet | LassySmall | ADV |
| English | not | atis | ADV |
| English | not | ESLSpok | PART |
| English | not | EWT | PART |
| English | not | GUM | PART |
| English | not | LinES | PART |
| English | not | ParTUT | PART |
| Faroese | ikki | FarPaHC | ADV |
| Frisian | net | FAME | ADV |
| German | nicht | GSD | PART |
| German | nicht | HDT | PART |
| Gothic | ni | PROIEL | ADV |
| Icelandic | ekki | GC | ADV |
| Icelandic | ekki | IcePaHC | ADV |
| Icelandic | ekki | Modern | ADV |
| Icelandic | ekki | PUD | ADV |
| Low Saxon | neet / nich | LSDC | PART |
| Norwegian | ikke | Bokmaal | PART |
| Norwegian | ikke | Nynorsk | PART |
| Swiss German | nöd | UZH | PART |
| Swedish | inte | LinES | PART |
| Swedish | inte | Talbanken | PART |
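For readers who want to quantify the inconsistency, here is a minimal Python sketch that tallies the tags in the overview above. The rows are transcribed from that overview; the Afrikaans row is left out because it lists two tags for the two parts of 'nie ... nie'. The names `ROWS`, `tag_counts` and `mixed_languages` are made up for this illustration.

```python
from collections import Counter, defaultdict

# (language, treebank, upos) rows transcribed from the overview above.
# Afrikaans is omitted: its row lists "ADV PART" for the two 'nie' tokens.
ROWS = [
    ("Danish", "DDT", "ADV"),
    ("Dutch", "Alpino", "ADV"),
    ("Dutch", "LassySmall", "ADV"),
    ("English", "atis", "ADV"),
    ("English", "ESLSpok", "PART"),
    ("English", "EWT", "PART"),
    ("English", "GUM", "PART"),
    ("English", "LinES", "PART"),
    ("English", "ParTUT", "PART"),
    ("Faroese", "FarPaHC", "ADV"),
    ("Frisian", "FAME", "ADV"),
    ("German", "GSD", "PART"),
    ("German", "HDT", "PART"),
    ("Gothic", "PROIEL", "ADV"),
    ("Icelandic", "GC", "ADV"),
    ("Icelandic", "IcePaHC", "ADV"),
    ("Icelandic", "Modern", "ADV"),
    ("Icelandic", "PUD", "ADV"),
    ("Low Saxon", "LSDC", "PART"),
    ("Norwegian", "Bokmaal", "PART"),
    ("Norwegian", "Nynorsk", "PART"),
    ("Swiss German", "UZH", "PART"),
    ("Swedish", "LinES", "PART"),
    ("Swedish", "Talbanken", "PART"),
]


def tag_counts(rows):
    """Count how many treebanks use each UPOS tag for the negator."""
    return Counter(upos for _, _, upos in rows)


def mixed_languages(rows):
    """Return languages whose treebanks disagree on the tag."""
    tags_by_language = defaultdict(set)
    for language, _, upos in rows:
        tags_by_language[language].add(upos)
    return sorted(lang for lang, tags in tags_by_language.items() if len(tags) > 1)


counts = tag_counts(ROWS)
print(counts["ADV"], counts["PART"])  # 11 13
print(mixed_languages(ROWS))          # ['English']
```

On these rows the split is 11 ADV against 13 PART, and English is the only language whose own treebanks disagree with each other.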

At the following sites 'not' is considered a particle:
https://en.wikipedia.org/wiki/Grammatical_particle
https://glossary.sil.org/term/particle
https://langeek.co/en/grammar/course/271/particles

On the UD site there is a reference to Loos, Eugene E., et al. 2003. Glossary of linguistic terms: What is a particle?, but the link is broken.

My question is simply: should 'not' be tagged as PART or as ADV, or does this not matter?

dan-zeman commented 8 months ago

As it is the only word listed explicitly in the universal guidelines for PART, it should be PART :-) (but the deprel is advmod anyway).
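To make the point about UPOS versus deprel concrete, here is a small Python sketch that reads a hand-written CoNLL-U fragment. The sentence and its annotation are illustrative assumptions, not copied from any treebank, and `parse_token` is a hypothetical helper. The only thing it shows is that the tag (column 4) and the relation (column 8) are stored independently, so PART and advmod can co-occur.

```python
# The ten CoNLL-U columns, in order.
CONLLU_FIELDS = (
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)


def parse_token(line):
    """Split one tab-separated CoNLL-U token line into a dict of its columns."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(CONLLU_FIELDS):
        raise ValueError(f"expected 10 columns, got {len(values)}")
    return dict(zip(CONLLU_FIELDS, values))


# Hand-written example (an assumption for illustration): "I do not know"
SENTENCE = "\n".join([
    "1\tI\tI\tPRON\t_\t_\t4\tnsubj\t_\t_",
    "2\tdo\tdo\tAUX\t_\t_\t4\taux\t_\t_",
    "3\tnot\tnot\tPART\t_\tPolarity=Neg\t4\tadvmod\t_\t_",
    "4\tknow\tknow\tVERB\t_\t_\t0\troot\t_\t_",
])

tokens = [parse_token(line) for line in SENTENCE.splitlines()]
not_token = next(t for t in tokens if t["form"] == "not")
print(not_token["upos"], not_token["deprel"])  # PART advmod
```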

heeringa0 commented 8 months ago

'not' is not the only word ...

dan-zeman commented 8 months ago

'not' is not the only word ...

True, sorry. But it is one of the few examples that are given, while other instances of PART in individual languages are delimited "negatively" as something that does not fit elsewhere.

heeringa0 commented 8 months ago

I see, thanks for the clarification.

sylvainkahane commented 8 months ago

@dan-zeman You mean in English? I think it makes sense to consider not as PART in English because it has a particular syntactic behavior (position, portmanteau with modals) which distinguishes it from ADVs. I am not sure it is as well motivated to consider the negation a PART in other Germanic languages. And there are certainly languages where the negation clearly belongs to an open POS. I think we should clarify whether the PART pos is purely semantic or whether it must be syntactically motivated.

dan-zeman commented 8 months ago

I think it is not purely semantic because other negative words such as the negative determiner no and the negative pronoun nobody are not PART even in English. It is also not possible to say for all languages how we would know whether a word is still sufficiently equivalent to the English not; I think some languages have negative copulas or negative auxiliaries and they are probably different. I believe there are restrictions on the position of nicht in German, so it would probably also qualify as something special, but I know nothing about the other Germanic languages. Yet I hesitate to say that PART must be motivated in a particular way (because it is meant to cover various things in various languages that we don't know until we see them).

leky40 commented 8 months ago

And there are certainly languages where the negation clearly belongs to an open POS. I think we should clarify whether the PART pos is purely semantic or whether it must be syntactically motivated.

This comment from @sylvainkahane makes me more curious, and more hesitant, about my annotations for Thai words that express negation of another word. I have had some difficulty tagging such words because I am not sure whether PART is the right tag for them. I have just followed the UD guidelines and taken the English words as the model for the Thai ones.

But some of those words in Thai are categorised as a verb and/or modifier. I know Germanic languages are being discussed here, but I was wondering:

Should PART be the right tag for those words in Thai even though they are verbs or modifiers? Should I tag them PART because they negate another word, even though they are verbs or modifiers?

In Thai, those words do not change the form of another word to express negation, but are added to express negative meaning towards the word or sentence.

amir-zeldes commented 8 months ago

Should PART be the right tag for those words in Thai even though they are verbs or modifiers?

I don't think there's a problem with a negator being a verb or other type of modifier, though if they are modifying a main sentence verb and not contributing an additional proposition, then I would probably consider them to be auxiliaries. In Coptic we have negative auxiliary verbs, and they are tagged AUX with Polarity=Neg and deprel aux:

https://github.com/UniversalDependencies/UD_Coptic-Scriptorium/blob/master/cop_scriptorium-ud-dev.conllu#L4806

leky40 commented 8 months ago

@amir-zeldes I was wondering what indicates that the negator (in the sample you point to here) is a negative auxiliary verb. I googled Coptic, and it's agglutinative? So is there an affix or anything indicating that?

This might not be related, but since we are discussing whether a negator that is a verb could be an auxiliary verb, I wonder whether in Thai it would be an auxiliary verb or just a verb used with another verb to form a serial verb construction. In Thai we don't have subject-verb agreement, and we don't change the verb form to show tense.

rueter commented 8 months ago

Should PART be the right tag for those words in Thai even though they are verbs or modifiers?

In Coptic we have negative auxiliary verbs, and they are tagged AUX with Polarity=Neg and deprel aux:

In Erzya, the auxiliary verb of negation is suppletive: In the non-past, person and tense are shown on the main verb, with only negation shown by the negative auxiliary. We tag it 'AUX' with Polarity=Neg and deprel aux:neg:

https://github.com/UniversalDependencies/UD_Erzya-JR/blob/f2b2b89009f9132372c163020221964622e6907b/myv_jr-ud-train.conllu#L191

In the past tense, and other moods where person agreement is shown we use the same tagging:

https://github.com/UniversalDependencies/UD_Erzya-JR/blob/f2b2b89009f9132372c163020221964622e6907b/myv_jr-ud-train.conllu#L171

North Saami does the same thing.

https://github.com/UniversalDependencies/UD_North_Sami-Giella/blob/68b13b12107f506b0124c7a7378c243f52aecc8f/sme_giella-ud-train.conllu#L268

Finnish and Estonian also have negative auxiliaries. Finnish shows subject agreement on the negative auxiliary, whereas tense and mood are indicated by the form of the main verb. The tag set is 'AUX' with Polarity=Neg and deprel aux, as in Coptic: https://github.com/UniversalDependencies/UD_Finnish-PUD/blob/564497aaeb5d8a14fc36779f8b8f142d0a80c49a/fi_pud-ud-test.conllu#L401

The Estonian negative auxiliary does not show person or tense in the indicative, but it is suppletive in at least the imperative, as in many of the other Finnic languages. The tag set in Estonian is also 'AUX' with Polarity=Neg and deprel aux:

https://github.com/UniversalDependencies/UD_Estonian-EWT/blob/0f04ce3ca1b267a5e0142eaab4fc435948ffe49f/et_ewt-ud-test.conllu#L1152

I am puzzled, however, by the fact that the UD documentation for AUX and PART overlap in reference to tense and mood marking, but that is a different matter. AUX: "many languages have nonverbal TAMVE markers and these should also be tagged AUX." PART: "Particles may encode grammatical categories such as negation, mood, tense etc."

gossebouma commented 8 months ago

The decision to assign ADV to Dutch 'niet' ('not') stems from the fact that the underlying treebanks assign the tag bw (Adverb) and this has simply been preserved in the conversion. There are a few cases where 'niet' is a conjunction ('niet A maar B', 'not A but B'). The Dutch descriptive grammar ANS also refers to 'niet' as a bijwoord (adverb) so I think this is the accepted terminology for Dutch. I also cannot think of syntactic tests that would differentiate between 'niet' and other adverbs.

dan-zeman commented 8 months ago

I am puzzled, however, by the fact that the UD documentation for AUX and PART overlap in reference to tense and mood marking

This was true in UD v1, but then we said that in v2 AUX would include non-verbal auxiliaries, so now these should be AUX. Apparently this sentence (which had been there since October 2014) escaped attention when the guidelines were being updated for v2. Fixed now, thanks for reporting.

amir-zeldes commented 8 months ago

what indicates that the negator (in the sample you point to here) is a negative auxiliary verb. I googled Coptic, and it's agglutinative? So is there an affix or anything indicating that?

Yes, Coptic is, broadly, an agglutinative language, but of course what UD analysis we assign depends on what we consider to be tokens, etc. This particular auxiliary can also be used as a main predicate to indicate negative existence ("there isn't X"), so it's somewhat intuitive to treat it as a type of verb-like element, although the whole category of verbs is fairly complex in Coptic.

But I think getting into the details of Coptic is not necessary here. As a more general guideline, I would say that one good criterion could be that there is only a single predication happening. UD is 'lexico-centric', so content verbs are expected to map onto clauses more or less 1:1, and the example above, which translates as "a prophet is not scorned except in his own city", is probably safe to view as a single predicate. Of course we could also try to build a very literal rendition, something like "does-not-exist a prophet is scorned...", with two predications; it is ultimately a question of guidelines. But the fact that this is the regular way of negating a present tense clause with an indefinite subject in Coptic strongly argues for the analysis in which only the lexical predicate is a verb, and the "not-exist" element is just a negative auxiliary.

Another test is backwards reference, i.e. whether you could observe or imagine a pronoun like "that" referring back once to the lexical predicate and once to the AUX candidate, with distinct meaning. Here that seems impossible.

Stormur commented 7 months ago

not is a negative PART. I think there is no doubt about that; it is also explicitly stated in the guidelines. The use of ADV happens only out of inertia, coming from grammars which put more or less every non-NOUN/ADJ/VERB element into a catch-all class, which just happens to be called "adverb", because, well, there is more or less always a predicate around.

Also Polarity=Neg is important, e.g. to cover truly non-PART negation strategies (where negation is not realised by a single specialised word), but I see that it is often neglected. I also feel that advmod:neg should be enforced more, because it actually isn't the same as advmod.

Should PART be the right tag for those words in Thai even though they are verbs or modifiers? Should I tag them PART because they negate another word, even though they are verbs or modifiers?

They might be AUX/VERB/..., but consider Polarity=Neg.
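As a rough illustration of why Polarity=Neg is useful independently of the UPOS choice, the following Python sketch finds negators by feature rather than by tag. Both sentences are hand-written assumptions (an English particle and a Finnish-style negative auxiliary), not lines copied from any treebank, and `parse_feats` and `negators` are hypothetical helper names.

```python
def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'A=1|B=2' into a dict ('_' means empty)."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def negators(conllu_text):
    """Return (form, upos, deprel) for every token carrying Polarity=Neg."""
    found = []
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip malformed lines, multiword ranges and empty nodes
        if parse_feats(cols[5]).get("Polarity") == "Neg":
            found.append((cols[1], cols[3], cols[7]))
    return found


# Hand-written examples (assumptions for illustration only).
EXAMPLES = "\n".join([
    "# text = I do not know",
    "1\tI\tI\tPRON\t_\t_\t4\tnsubj\t_\t_",
    "2\tdo\tdo\tAUX\t_\t_\t4\taux\t_\t_",
    "3\tnot\tnot\tPART\t_\tPolarity=Neg\t4\tadvmod\t_\t_",
    "4\tknow\tknow\tVERB\t_\t_\t0\troot\t_\t_",
    "",
    "# text = En tiedä",
    "1\tEn\tei\tAUX\t_\tNumber=Sing|Person=1|Polarity=Neg|VerbForm=Fin\t2\taux\t_\t_",
    "2\ttiedä\ttietää\tVERB\t_\tConnegative=Yes|VerbForm=Fin\t0\troot\t_\t_",
])

print(negators(EXAMPLES))
# [('not', 'PART', 'advmod'), ('En', 'AUX', 'aux')]
```

A query of this kind retrieves negation whether a treebank tags the negator as PART, ADV, AUX or VERB, which is one practical argument for annotating the feature consistently.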