UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Relative clause markers #223

Closed liljao closed 8 years ago

liljao commented 9 years ago

The guidelines state that relative clause introducing words like English "that" should be tagged as SCONJ. I have two questions:

  1. I noticed that in the English data these are tagged as determiners. Is this an inconsistency in the guidelines or possibly in the English data?
  2. What are the consequences of choosing an SCONJ analysis for a relativizer? Would this element have an argument relation to the head of the relative clause or would it simply be a mark? In Norwegian there are good reasons for choosing to treat the relativiser as a SCONJ (it does not inflect, like who-whom or der-die-das and only occurs initially in a relative clause) so I would be curious to hear your thoughts on the preferred analysis.
dan-zeman commented 9 years ago

I would say that there is ambiguity between two (three?) different "that"s in English:

Just my two cents and intuition; I did not study the English documentation nor data now.

dan-zeman commented 9 years ago

PS I do not think that inflection (like who-whom) is a necessary condition for a word to be a pronoun and to act as an argument of a verb.

liljao commented 9 years ago

I see that the universal guidelines for SCONJ contain a reference to Loos et al. (2003), who distinguish between relativizers (like English 'that') and relative pronouns (like English 'who'). I assume that a SCONJ analysis then entails a mark relation for the relativizer; please correct me if I am wrong.

@dan-zeman I agree that inflection is not a necessary condition for pronominal status, however when distributional evidence also points in that direction ('som' does not occur in other nominal contexts, it only introduces relative clauses) I do not see a good reason to treat it as a pronoun.

dan-zeman commented 9 years ago

Sorry, I am afraid I would need some knowledge of Norwegian to be able to comment on som. However, you may want to look at the Swedish and Danish data, and to discuss this with @jnivre, @bplank and others. As these languages are much closer to Norwegian, I suppose that you will want to have the same solution in all three.

liljao commented 9 years ago

Yes, that is probably a good idea :-)

However, I would still be curious about the universal guidelines for relativizers vs. relative pronouns. Should the choice of SCONJ as opposed to PRON/DET have syntactic consequences as well (mark vs nsubj/dobj etc.)?

dan-zeman commented 9 years ago

Yes, if it is SCONJ then it should be mark and not nsubj/dobj/nmod.
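A minimal sketch of that constraint as a consistency check, assuming input in the 10-column CoNLL-U layout (UPOS in column 4, DEPREL in column 8). The function name `sconj_deprel_mismatches` and the sample sentence are illustrative only; the check applies the rule above literally and is not an official validator.

```python
def sconj_deprel_mismatches(conllu_text):
    """Return (id, form, deprel) for every SCONJ token not attached as mark."""
    mismatches = []
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and malformed rows
        tok_id, form, upos, deprel = cols[0], cols[1], cols[3], cols[7]
        if upos == "SCONJ" and deprel != "mark":
            mismatches.append((tok_id, form, deprel))
    return mismatches


ROWS = [
    # ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
    ("1", "the", "the", "DET", "_", "_", "2", "det", "_", "_"),
    ("2", "book", "book", "NOUN", "_", "_", "0", "root", "_", "_"),
    ("3", "that", "that", "SCONJ", "_", "_", "5", "dobj", "_", "_"),
    ("4", "she", "she", "PRON", "_", "_", "5", "nsubj", "_", "_"),
    ("5", "read", "read", "VERB", "_", "_", "2", "acl:relcl", "_", "_"),
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)

print(sconj_deprel_mismatches(SAMPLE))  # [('3', 'that', 'dobj')]
```

Token 3 is flagged because it combines the SCONJ tag with an argument relation; either retagging it as PRON or attaching it as mark would clear the report.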

jnivre commented 9 years ago

Sorry for not being up to date, but are we talking about universal guidelines or language-specific English guidelines here? I would argue for treating "som" as a relative pronoun in the Scandinavian languages, regardless of the theoretical linguistic discussion, because we lose too much information otherwise when we don't use empty elements. I would argue the same for English "that", which is why I want to know who I should harass about this. :)

daghaug commented 9 years ago

I was just about to write that you'd have to work really hard to find a definition of pronouns that would extend naturally to "som" in Scandinavian, but I agree with Joakim that preserving information on the grammatical relation being relativized on trumps the desire for a coherent definition.

Dag


liljao commented 9 years ago

The universal guidelines state that the SCONJ category should encompass relativizers (as opposed to relative pronouns), with the example of English that, so I guess both?

jnivre commented 9 years ago

Thanks, Lilja. I think we need to work on this for version 2. :) I think the important point is that we want to preserve the information about the relation being relativized. Whether that is compatible with tagging the word as SCONJ is something that we can discuss, but for the time being I would use PRON instead (as we do in UD_Swedish). In a way, this is what UD_English does as well, despite what the guidelines say. It is just that the Penn Treebank insists on calling these pronouns DET instead of PRON.
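To make the information-loss point concrete, here is a minimal sketch under stated assumptions: tokens are plain `(id, form, upos, head, deprel)` tuples, the relative word is found by a small illustrative form list, and `relativized_relation` is a hypothetical helper, not part of any UD tool. With the PRON analysis the relativized relation can be read off the tree; with the SCONJ/mark analysis it cannot.

```python
# Illustrative, non-exhaustive list of words that can introduce a relative clause.
RELATIVIZERS = {"that", "which", "who", "whom", "som"}


def relativized_relation(tokens, relcl_head_id):
    """Return the deprel of the relative word attached to the clause head.

    Returns None when the relative word is not tagged PRON (for example
    SCONJ attached as mark), because the tree then does not say which
    relation was relativized.
    """
    for tok_id, form, upos, head, deprel in tokens:
        if head == relcl_head_id and form.lower() in RELATIVIZERS:
            return deprel if upos == "PRON" else None
    return None


# "the book that she read", relative clause headed by token 5 ("read").
PRON_ANALYSIS = [
    (1, "the", "DET", 2, "det"),
    (2, "book", "NOUN", 0, "root"),
    (3, "that", "PRON", 5, "dobj"),
    (4, "she", "PRON", 5, "nsubj"),
    (5, "read", "VERB", 2, "acl:relcl"),
]
# Same sentence, but with the relativizer treated as a subordinator.
SCONJ_ANALYSIS = [
    tok if tok[0] != 3 else (3, "that", "SCONJ", 5, "mark")
    for tok in PRON_ANALYSIS
]

print(relativized_relation(PRON_ANALYSIS, 5))   # dobj
print(relativized_relation(SCONJ_ANALYSIS, 5))  # None
```

The form list stands in for whatever a treebank actually uses to identify relative words (a feature such as `PronType=Rel` would be the more robust choice where it is annotated).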

manning commented 8 years ago

Sorry - I'm even less able to keep up with things than @jnivre .... I think we can work on this even for version 1.3, since it's more at the level of "clarification" than fundamentally changing the system!

TL;DR: I'll try to improve the documentation; the current English POS tags are buggy, but the dependencies are not.

Linguistically, I think it is clear that sometimes relative clauses have at their left edge a relative pronoun (or, more generally, a relative phrase) and sometimes they have a complementizer. Assigning a category in practice is sometimes difficult because of grammaticalization and change, but there are arguments for one analysis or another. So, linguistically, I think we should say that an annotator can use SCONJ (which is mark) or PRON (which is nsubj/dobj, etc. as appropriate) and people should decide what is right for their language.

FWIW: This is something that is discussed and debated quite a bit linguistically, including recently. Richie Kayne is famous both for arguing early on that French que should be treated as a complementizer and for recently arguing that relative-clause-initial putative complementizers and relative pronouns should be treated the same. See, e.g., this article: http://ling.auf.net/lingbuzz/001539 .

@jnivre offers the practical alternative that we should always use PRON so that the role that is relativized on can be tracked. This would be an alternative or addition to the idea of using an extra dependency in the enhanced representation to track what is relativized on. I can see the appeal of this as a practical move. I guess we should discuss this further and decide one way or the other. Basically I'm in agreement with @daghaug: Linguistically wrong, but may be the practical thing to do.

Now, specifically for English, and mentions of it in the universal guidelines. There is an argument from Middle English that that should be analyzed as a complementizer while who and which are relative pronouns, made, for example, by Cynthia Allen (following Jespersen's lead). At that stage of English there are good arguments for this, including that you can get both:

Every word which that she of hire herde (Troilus and Criseyde, II.899)

A mirror…in which that ye me see youre face a morewe. (Troilus and Criseyde, II.404)

I think the situation here is similar to the reasons for arguing that som in Norwegian is SCONJ (though I'm not an expert).

Now one could try to maintain that this is still the situation in modern English, but I think there is no positive evidence for this analysis, and since that is also used as a determiner and pronoun in modern English, it seems the economical analysis to call it a PRON. So I will delete from the docs the claim that it is an SCONJ - while leaving the possibility of SCONJ relative clause complementizers until this is decided.

Then for the actual English data: While we have hand-annotated the dependencies, our POS tags are an automatic conversion of LDC Penn Treebank POS tags. In that tag set, all uses of that as a relative introducer are tagged WDT, just as all uses of that as a pronoun or determiner are tagged DT. We made our converter better for version 1.2, so instead of all things tagged DT being rendered DET, we now use syntactic context to decide between DET and PRON. But I guess @sebschu missed the case of also mapping the WDT that cases to PRON. Indeed, it seems here that all cases of WDT that should be mapped to PRON, since while you can get DET cases with which, that isn't possible with that. So, the DET is wrong, don't copy that! The dependencies are correct, however (really treating it as PRON).
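The mapping described here could be sketched roughly as follows. This is not the actual converter: the function name `upos_for_dt_tag` is hypothetical, and using the token's own dependency label (`det` or not) as the "syntactic context" is an assumption made for illustration.

```python
def upos_for_dt_tag(form, xpos, deprel):
    """Choose DET or PRON for a token with Penn Treebank tag DT or WDT.

    Assumption: a token attached with the det relation is a determiner,
    and any other attachment means it stands in for a nominal.
    """
    if xpos not in {"DT", "WDT"}:
        raise ValueError(f"expected DT or WDT, got {xpos!r}")
    # Relative "that" never modifies a following noun, so it is always PRON.
    if xpos == "WDT" and form.lower() == "that":
        return "PRON"
    return "DET" if deprel == "det" else "PRON"


print(upos_for_dt_tag("that", "WDT", "nsubj"))  # PRON ("the dog that barked")
print(upos_for_dt_tag("which", "WDT", "det"))   # DET  ("which book")
print(upos_for_dt_tag("that", "DT", "det"))     # DET  ("that book")
print(upos_for_dt_tag("that", "DT", "dobj"))    # PRON ("I saw that")
```

Special-casing WDT that before looking at the dependency label reflects the observation above that which can still be a determiner while relative that cannot.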

jnivre commented 8 years ago

I agree with @manning. For languages like Swedish and English, treating that/att as a relative pronoun is an extremely practical solution as long as we haven't worked out detailed guidelines for the enhanced representation, and it is not obviously wrong from a linguistic point of view either despite alternative proposals. At the same time, we still want to allow other languages to treat the words introducing relative clauses as pure complementisers, when there is no evidence of relative pronouns in the language at all. As far as I understand, this is what is currently being done for Persian, for example. Finally, this will definitely be one of the issues that we need to look into when developing v2 of the guidelines, which will hopefully also include guidelines for the enhanced version.