UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

POS for English infinitive 'to' #91

Closed. manning closed this issue 10 years ago.

manning commented 10 years ago

While working out USD, @jnivre got us to change the dependency of infinitive 'to' from our traditional aux to mark:

One of the things I find problematic in the existing Stanford dependencies is the choice to annotate the infinitive marker “to” as “aux”. I am well aware of the tradition of doing this for English (starting, if I am not mistaken, with the GPSG analysis of verb phrases), but for most other languages this is just weird. The infinitive marker may historically be a preposition (as in English and French) or a subordinating conjunction (as in Swedish “att”), but regardless of the origin it seems to be a grammaticalized function that is well attested cross-linguistically and that is much more similar to subordinating conjunctions than to auxiliary verbs. I would much prefer to use the “mark” relation for this.

But what of its POS? Does this mean that 'to' should be SCONJ? Or do we say that it is ADP for English-like languages and SCONJ in Swedish?

@sebschu is helping me do an English tag converter....

dan-zeman commented 10 years ago

But what of its POS? Does this mean that 'to' should be SCONJ? Or do we say that it is ADP for English-like languages and SCONJ in Swedish?

This sounds reasonable to me. Similar to saying that adpositions used as verbal particles are still ADP.

yoavg commented 10 years ago

I'm good with ADP, but why not PART? This may have a chance of being more consistent across languages.

(Replying by email to Dan Zeman's comment of Tuesday, October 7, 2014.)

manning commented 10 years ago

Any more comments from various people like @mcdm and @slavpetrov?

On reflection, I think that @yoavg might be right here.

Thinking mainly of English,

dan-zeman commented 10 years ago

PART sounds even more reasonable to me :-) Strange that I didn't think about it yesterday.

jnivre commented 10 years ago

Yeah, perhaps PART is the best solution, although I have argued previously that it should be ADP in some languages and SCONJ in others. In languages like French and Spanish, the words used to perform this function are definitely prepositions ("de" and perhaps "à" in French), and I think this is true of English as well (at least historically). But in Swedish it is equally clear that we have recruited a subordinating conjunction ("att", corresponding to English "that", French "que", etc.) for this purpose. In analogy with our analysis of verb particles, I therefore thought these should be treated as ADP in Romance languages but as SCONJ in Swedish. Treating them as PART in all (these) languages will of course increase the parallelism, but the question is whether this is a case of pushing the parallelism too far. The fact that they perform the same function in all languages is captured by the fact that they all have the syntactic relation "mark", and it is not clear that we should enforce this on the POS tag level as well.

mcdm commented 10 years ago

Yes, I also agree that PART is the best solution for English. @jnivre: I don't think people are suggesting that we use the same POS across languages. But if they are, I agree with you, Joakim, that it would be wrong to tag as PART the words performing the same function in other languages if such words are clearly prepositions or subordinating conjunctions. The "mark" dependency will indeed capture what we want.
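
To make the point concrete, here is a small Python sketch of the idea that the relation carries the cross-lingual parallelism while the POS tag stays language-specific. The table is illustrative only: the English tag is the one favoured in this thread, and the Swedish and French tags are the per-language analysis described by @jnivre above, not a settled guideline. The names `MARKERS`, `relations` and `tags` are made up for this example.

```python
# Illustrative rows: (language, infinitive marker, UPOS, relation to the verb).
# Assumption: Swedish and French tags follow the proposal made in this thread.
MARKERS = [
    ("English", "to", "PART", "mark"),
    ("Swedish", "att", "SCONJ", "mark"),
    ("French", "de", "ADP", "mark"),
]


def relations(rows):
    """Collect the distinct dependency relations used for the markers."""
    return {deprel for _, _, _, deprel in rows}


def tags(rows):
    """Collect the distinct POS tags used for the markers."""
    return {upos for _, _, upos, _ in rows}


print(relations(MARKERS))  # one shared relation across languages
print(tags(MARKERS))       # several language-specific tags
```

Under these assumptions a query on the relation finds every infinitive marker, whatever its tag in a given language.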

manning commented 10 years ago

Okay, PART for English, and other languages can do different things. (I won't comment on Swedish, but I think the French case is somewhat different in that there is an infinitive verb form, and "de" is much more a preposition.)
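
For the English tag converter mentioned at the top of the thread, the decision could be sketched roughly as follows. This is a hypothetical helper, not the actual converter: Penn Treebank tagging uses the single tag TO for both infinitival and prepositional "to", so the sketch assumes the split can be guessed from the tags that follow. The function name `upos_for_to` and the adverb-skipping heuristic are assumptions made for illustration, and the heuristic will miss cases such as a stranded "to" with an elided verb.

```python
# Hypothetical helper, not the converter discussed in this thread.
# Penn Treebank uses TO for both infinitival and prepositional "to",
# so the choice between PART and ADP has to come from context.
VERB_BASE = "VB"
ADVERBS = {"RB", "RBR", "RBS"}


def upos_for_to(ptb_tags, i):
    """Guess the UPOS for the token at index i, whose PTB tag is TO."""
    j = i + 1
    # Skip adverbs so that split infinitives ("to boldly go") are covered.
    while j < len(ptb_tags) and ptb_tags[j] in ADVERBS:
        j += 1
    # A base-form verb after "to" signals the infinitive marker.
    if j < len(ptb_tags) and ptb_tags[j] == VERB_BASE:
        return "PART"
    # Otherwise fall back to the prepositional reading.
    return "ADP"


# "I want to go" versus "I went to Paris"
print(upos_for_to(["PRP", "VBP", "TO", "VB"], 2))
print(upos_for_to(["PRP", "VBD", "TO", "NNP"], 2))
```

A converter with access to the dependency tree could instead key on the relation (PART when "to" attaches as mark to a verb), which would be more robust than looking at neighbouring tags.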