UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

How to differentiate DET for quantifiers and DET for demonstrative determiners for isolating languages like Thai #1048

Closed leky40 closed 2 months ago

leky40 commented 3 months ago

I was wondering whether there is a syntactic relation that differentiates a quantifier tagged DET from a demonstrative determiner tagged DET in isolating languages without agreement or grammatical markers, such as Thai. I am trying to annotate the following construction: head noun + quantifier + noun (used as a classifier) + demonstrative determiner.

Please see the treebank sample attached here.

Following the UD framework, in the treebank I tagged both the quantifier (meaning "several") and the demonstrative determiner (meaning "this") as DET. When I annotate their syntactic relations, both have to be "det".

But is there any way to distinguish the syntactic relations of these two words?

I checked the UD relations and found two subtypes: det:numgov and det:nummod. After reading their definitions, I am not sure they fit Thai; only Czech examples are given.

Or should this Thai demonstrative determiner not be tagged DET?

Thai is an isolating, tonal language without grammatical markers.

Are there any suggestions for my questions? And does my annotation seem plausible?

samples for classifier doc txt(2)

ftyers commented 3 months ago

There generally wouldn't be a different syntactic relation, but you could use a language-specific subtype, det:quant, alongside plain det. The other thing you could do is use PronType=Dem for the demonstrative sense, or propose PronType=Qnt as a lexico-morphological feature.
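To make the two options concrete, here is a minimal Python sketch. The CoNLL-U fragment is hypothetical: the forms are English glosses rather than real Thai data, the heads and the clf attachment are assumptions made only for illustration, and det:quant and PronType=Qnt are the labels proposed above, not necessarily ones a validator would accept today.

```python
# Hypothetical CoNLL-U fragment: forms are English glosses, not real Thai data.
# Heads and the clf attachment are illustrative assumptions.
SAMPLE = "\n".join([
    "1\tbook\t_\tNOUN\t_\t_\t0\troot\t_\t_",
    "2\tseveral\t_\tDET\t_\tPronType=Qnt\t3\tdet:quant\t_\t_",
    "3\tCLF\t_\tNOUN\t_\t_\t1\tclf\t_\t_",
    "4\tthis\t_\tDET\t_\tPronType=Dem\t3\tdet\t_\t_",
])


def parse_feats(feats):
    """Turn a FEATS string such as 'A=1|B=2' into a dict ('_' means empty)."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def parse_conllu(text):
    """Parse 10-column CoNLL-U token lines into dicts, skipping comments."""
    tokens = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        tokens.append({
            "form": cols[1],
            "upos": cols[3],
            "feats": parse_feats(cols[5]),
            "deprel": cols[7],
        })
    return tokens


def determiner_kind(token):
    """Classify a DET by relation subtype or by PronType, whichever is present."""
    if token["deprel"] == "det:quant" or token["feats"].get("PronType") == "Qnt":
        return "quantifier"
    if token["feats"].get("PronType") == "Dem":
        return "demonstrative"
    return "other"


kinds = {
    t["form"]: determiner_kind(t)
    for t in parse_conllu(SAMPLE)
    if t["upos"] == "DET"
}
print(kinds)
```

In practice you would pick one of the two markings rather than both; the sketch carries both only to show that either one is enough to tell the two determiners apart.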

leky40 commented 3 months ago

Ok thank you

Stormur commented 3 months ago

The syntactic relation is and has to stay the same. Some possible distinguishing traits like the position in the phrase are already represented in the tree structure.

Since these are lexical/semantic differences, the way to mark them is through PronType, as already suggested by ftyers, and I would also point to NumType when dealing with quantities. The point is that if a determiner can answer a quantity question the way a numeral can (How many? Several/Three), this feature makes sense. What then sets such a determiner apart from a numeral is that it has a PronType and/or lacks a specific value (NumValue).

I am skeptical about putting semantic subtypes referring only to single elements in the dependency relations, as opposed to other subtypes referring to whole constructions and clauses (e.g. pass, cmp, even numgov...), so personally I would avoid that. Features like PronType make these elements already retrievable.
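As a rough sketch of how such elements stay retrievable from features alone, the Python snippet below keeps the relation as plain det and separates the words by PronType and NumType. The tokens are hypothetical English glosses, and the feature values (in particular PronType=Ind for "several") are assumptions for illustration, not taken from an existing Thai treebank.

```python
# Hypothetical tokens as (gloss, UPOS, DEPREL, FEATS). The feature values are
# assumptions chosen for illustration, not taken from an existing treebank.
TOKENS = [
    ("several", "DET", "det", "NumType=Card|PronType=Ind"),
    ("this", "DET", "det", "PronType=Dem"),
    ("three", "NUM", "nummod", "NumType=Card"),
]


def feats_to_dict(feats):
    """Turn a FEATS string such as 'A=1|B=2' into a dict ('_' means empty)."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def classify(upos, feats):
    """Separate words by features only; the dependency relation is not consulted."""
    has_quantity = "NumType" in feats   # can answer "How many?"
    has_prontype = "PronType" in feats  # pronominal/determiner-like word
    if upos == "DET" and has_quantity and has_prontype:
        return "quantifying determiner"
    if upos == "DET" and feats.get("PronType") == "Dem":
        return "demonstrative determiner"
    if has_quantity and not has_prontype:
        return "numeral"
    return "other"


result = {
    form: classify(upos, feats_to_dict(feats))
    for form, upos, _deprel, feats in TOKENS
}
print(result)
```

Both determiners keep the same det relation here; only the FEATS column tells them apart.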

leky40 commented 3 months ago

@Stormur thank you