POS for English infinitive 'to' (UniversalDependencies/docs issue #91)

manning commented 10 years ago

While working out USD, @jnivre got us to change the dependency of infinitive 'to' from our traditional aux to mark:

infinitive marker:

One of the things I find problematic in the existing Stanford dependencies is the choice to annotate the infinitive marker “to” as “aux”. I am well aware of the tradition of doing this for English (starting, if I am not mistaken, with the GPSG analysis of verb phrases), but for most other languages this is just weird. The infinitive marker may historically be a preposition (as in English and French) or a subordinating conjunction (as in Swedish “att”), but regardless of the origin it seems to be a grammaticalized function that is well attested cross-linguistically and that is much more similar to subordinating conjunctions than to auxiliary verbs. I would much prefer to use the “mark” relation for this.

But what of its POS? Does this mean that 'to' should be SCONJ? Or do we say that it is ADP for English-like languages and SCONJ in Swedish?

@sebschu is helping me build an English tag converter...

dan-zeman commented 10 years ago

But what of its POS? Does this mean that 'to' should be SCONJ? Or do we say that it is ADP for English-like languages and SCONJ in Swedish?

This sounds reasonable to me. Similar to saying that adpositions used as verbal particles are still ADP.

yoavg commented 10 years ago

I'm good with ADP, but why not PART? This may have a better chance of being consistent across languages.

(Sent by email in reply to Dan Zeman's comment of Tuesday, October 7, 2014.)

manning commented 10 years ago

Any more comments from various people like @mcdm and @slavpetrov?

On reflection, I think that @yoavg might be right here.

Thinking mainly of English,

ADP seems kind of weird/unfortunate. I've always felt that it was a mistake that the Penn Treebank originally used the same tag TO for both the infinitive marker and the preposition 'to', and indeed, they've changed things to stop doing that in recent treebanks. If we treated infinitive 'to' as ADP, we'd be reproducing their mistake. Also, we use ADP for the particle in English phrasal verbs, such as 'make up', and it would seem a little funny to use ADP for these two very different things in verb groups.
AUX seems to make no sense, since that would only be consistent with calling the dependency aux, which we've given up (and I think on balance that was the right decision).
SCONJ seems weird, because it doesn't seem to have anything to do with subordinating conjunctions in behavior (even though they are both (usually) related to prepositions in English).
So that leaves PART as looking pretty good to me. It wouldn't be cross-linguistically unusual, since it would parallel things like the Chinese perfective aspect particle 'le', which is PART, I assume.

dan-zeman commented 10 years ago

PART sounds even more reasonable to me :-) Strange that I didn't think about it yesterday.

jnivre commented 10 years ago

Yeah, perhaps PART is the best solution, although I have argued previously that it should be ADP in some languages and SCONJ in others. In languages like French and Spanish, the words used to perform this function are definitely prepositions ("de" and perhaps "à" in French), and I think this is true of English as well (at least historically). But in Swedish it is equally clear that we have recruited a subordinating conjunction ("att", corresponding to English "that", French "que", etc.) for this purpose. In analogy with our analysis of verb particles, I therefore thought these should be treated as ADP in Romance languages but as SCONJ in Swedish. Treating them as PART in all (these) languages will of course increase the parallelism, but the question is whether this is a case of pushing the parallelism too far. The fact that they perform the same function in all languages is captured by their all having the syntactic relation "mark", and it is not clear that we should enforce this at the POS tag level as well.

mcdm commented 10 years ago

Yes, I also agree that PART is the best solution for English. @jnivre: I don't think people are suggesting using the same POS across languages. But if they are, I agree with you, Joakim, that it would be wrong to tag as PART the words performing the same function in other languages if such words are clearly prepositions or subordinating conjunctions. The "mark" dependency will indeed capture what we want.

manning commented 10 years ago

Okay, PART for English, and other languages can do different things. (I won't comment on Swedish, but I think the French case is somewhat different in that there is an infinitive verb form, and "de" is much more a preposition.)
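A minimal sketch of how this decision might look as one rule in an English tag converter like the one mentioned at the start of the thread. It is not taken from that converter: the function name `upos_for_ptb_to` and the example sentence are hypothetical, and it assumes the UD relation of each TO token is already known ("mark" for the infinitive marker, "case" for the preposition).

```python
# Hypothetical converter rule (not the actual converter discussed above):
# the Penn Treebank tags infinitival and prepositional "to" alike as TO,
# so this sketch uses the UD dependency relation to tell them apart.

def upos_for_ptb_to(deprel: str) -> str:
    """Return the UPOS tag for a Penn Treebank TO token, given its UD relation."""
    if deprel == "mark":   # infinitive marker, e.g. "want to go"
        return "PART"
    if deprel == "case":   # preposition, e.g. "go to Paris"
        return "ADP"
    # Other uses (e.g. 'to' inside multiword expressions) are out of scope here.
    raise ValueError(f"unexpected relation for TO: {deprel!r}")


# Hypothetical example, "I want to go to Paris": (form, PTB tag, UD relation)
SENTENCE = [
    ("I", "PRP", "nsubj"),
    ("want", "VBP", "root"),
    ("to", "TO", "mark"),
    ("go", "VB", "xcomp"),
    ("to", "TO", "case"),
    ("Paris", "NNP", "obl"),
]

if __name__ == "__main__":
    for form, xpos, deprel in SENTENCE:
        if xpos == "TO":
            print(form, deprel, upos_for_ptb_to(deprel))
```

Running it prints `to mark PART` and then `to case ADP`. Keying on the relation rather than the word form reflects the point above that a single TO tag conflates the two uses; a real converter would need to cover further cases, which this sketch simply rejects.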
