UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Are we comfortable with the guidelines for modification of function words? #991

Open: nschneid opened this issue 11 months ago

nschneid commented 11 months ago

I'm not sure I ever internalized this policy, which says that certain classes of function words allow only negation as a modifier, while other classes of function words can take a broader range of modifiers. Based on the examples, it seems that "just before" should be analyzed as advmod(before, just) if "before" is an SCONJ, but with the advmod and case attachments as sisters if it is an ADP. Is this really a good line to draw, and are treebanks actually adhering to it? EWT appears to avoid the function-word-modifying analysis for all but a couple of tokens.
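One way to check the adherence question empirically is to count what attaches to case and mark tokens in a treebank. The following is a minimal Python sketch, not part of any UD tooling: the function names `read_sentences` and `function_word_dependents` are hypothetical, the toy `SAMPLE` annotations are made up for illustration (one modifier on a case ADP, one on a mark SCONJ), and the hand-rolled reader only looks at the basic tree of the CoNLL-U columns.

```python
from collections import Counter

# Toy CoNLL-U sample; annotations are hypothetical, chosen only to exercise
# the counter (one modifier on a case ADP, one on a mark SCONJ).
SAMPLE = """\
# text = He left just before the concert
1\tHe\the\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tleft\tleave\tVERB\t_\t_\t0\troot\t_\t_
3\tjust\tjust\tADV\t_\t_\t4\tadvmod\t_\t_
4\tbefore\tbefore\tADP\t_\t_\t6\tcase\t_\t_
5\tthe\tthe\tDET\t_\t_\t6\tdet\t_\t_
6\tconcert\tconcert\tNOUN\t_\t_\t2\tobl\t_\t_

# text = He left two hours after they arrived
1\tHe\the\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tleft\tleave\tVERB\t_\t_\t0\troot\t_\t_
3\ttwo\ttwo\tNUM\t_\t_\t4\tnummod\t_\t_
4\thours\thour\tNOUN\t_\t_\t5\tobl:npmod\t_\t_
5\tafter\tafter\tSCONJ\t_\t_\t7\tmark\t_\t_
6\tthey\tthey\tPRON\t_\t_\t7\tnsubj\t_\t_
7\tarrived\tarrive\tVERB\t_\t_\t2\tadvcl\t_\t_
"""

FUNCTION_RELS = {"case", "mark"}


def read_sentences(text):
    """Yield sentences as lists of token dicts (basic tree only).

    Comment lines, multiword-token ranges (e.g. 3-4) and empty nodes
    (e.g. 5.1) are skipped; enhanced dependencies are ignored.
    """
    sent = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if sent:
                yield sent
                sent = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        sent.append({"id": int(cols[0]), "form": cols[1], "upos": cols[3],
                     "head": int(cols[6]), "deprel": cols[7]})
    if sent:
        yield sent


def function_word_dependents(sentences):
    """Count (function-word deprel, function-word UPOS, dependent deprel) triples."""
    counts = Counter()
    for sent in sentences:
        by_id = {tok["id"]: tok for tok in sent}
        for tok in sent:
            head = by_id.get(tok["head"])  # None for the root (head 0)
            if head is not None and head["deprel"].split(":")[0] in FUNCTION_RELS:
                counts[(head["deprel"], head["upos"], tok["deprel"])] += 1
    return counts


if __name__ == "__main__":
    for key, n in sorted(function_word_dependents(read_sentences(SAMPLE)).items()):
        print(*key, n)
```

To survey a real treebank, pass the text of its .conllu file instead of `SAMPLE`; splitting the counts by UPOS as well as by deprel makes it easy to see whether modifiers cluster on ADP, SCONJ, or PART.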

Another question is whether the claim that negation can modify any function word should make that attachment the default analysis for "not every" and similar expressions. In UniversalDependencies/UD_English-EWT#452, which concerned "not only", @amir-zeldes and I had concluded that the shallower structure was safer.

dan-zeman commented 11 months ago

There was a long and dynamic discussion about this when UD v1 was being drafted (end of September 2014). Only nine years have passed and we have it back :-) In those ancient times we still used e-mail to discuss the guidelines. I'm not sure it would help to copy all the e-mails here, but I find at least this contribution from Stanford (@ngiordani) interesting:


Below is the conclusion Chris and I arrived at today, after discussing what has come up in this thread, as well as additional English Web Treebank (EWT) data that we've talked about within our group.

Here are two crucial examples that I think represent the class of constructions we're focused on:

right on time
two hours after the concert

Historically, in the EWT annotation, we made "right" and "hours" dependents of the respective prepositions. However, as Joakim pointed out elsewhere, in dependency syntax there is always an ambiguity between head modification and phrase modification, and the annotation we produced is ambiguous in that respect. While I agree that there's an intuition that "right" modifies "on", it seems perfectly plausible to say that it modifies "on time"; note that it's also possible to say

right then
two hours later

to extend the argument that Joakim has made before. (It's interesting that this works with "right", which can't modify other time adverbs.) I also could not come up with a single diagnostic that would distinguish modifying the preposition in these cases from modifying the prepositional phrase. (If anyone has an idea, please share!) So there doesn't seem to be linguistic evidence (in English at least) for this P-attachment analysis. Additionally, allowing prepositions to take dependents hurts the parallel we're trying to draw with case markers. And finally, this is going to create a problem (in fact, it already creates a problem) for the collapsed representation, which a lot of people use. In that representation, any modifiers of a preposition will have to be moved to depend on its complement anyway. For these reasons, both Chris and I feel that case-typed prepositions should not have adverbial modifiers, and modifiers such as "right" and "two hours" in the examples above should attach to the nominal head, representing phrase-level modification. This is consistent with attachment decisions in the rest of the scheme.
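The point about the collapsed representation can be made concrete: if a modifier is attached to a case marker, a converter has to move it to the marker's head before the marker can be folded into the relation label. Below is a minimal Python sketch of just that reattachment step, assuming a simplified token-dict format; `make_sentence` and `promote_dependents` are hypothetical names, and this is not how any particular converter (Stanford's or otherwise) is actually implemented. By default it only touches case, mirroring the proposal to keep mark modifiers where they are.

```python
def make_sentence(rows):
    """Build token dicts from (id, form, upos, head, deprel) tuples."""
    keys = ("id", "form", "upos", "head", "deprel")
    return [dict(zip(keys, row)) for row in rows]


def promote_dependents(sent, rels=("case",)):
    """Reattach dependents of function words whose base deprel is in `rels`
    to that function word's own head (for case, the nominal it marks).

    Heads and deprels are read from a snapshot of the input tree, so the
    result does not depend on token order. The sentence is modified in place.
    """
    original = {tok["id"]: (tok["head"], tok["deprel"].split(":")[0]) for tok in sent}
    for tok in sent:
        fw = original.get(tok["head"])  # None when the head is the root (0)
        if fw is not None and fw[1] in rels:
            tok["head"] = fw[0]
    return sent


on_time = make_sentence([
    (1, "We", "PRON", 2, "nsubj"),
    (2, "arrived", "VERB", 0, "root"),
    (3, "right", "ADV", 4, "advmod"),
    (4, "on", "ADP", 5, "case"),
    (5, "time", "NOUN", 2, "obl"),
])
after_left = make_sentence([
    (1, "two", "NUM", 2, "nummod"),
    (2, "hours", "NOUN", 3, "obl:npmod"),
    (3, "after", "SCONJ", 5, "mark"),
    (4, "they", "PRON", 5, "nsubj"),
    (5, "left", "VERB", 0, "root"),
])

promote_dependents(on_time)     # "right" now depends on "time" (id 5)
promote_dependents(after_left)  # default rels=("case",): "hours" stays on "after"
```

Passing `rels=("case", "mark")` would instead give the fully lexicalist treatment, with "two hours" attached to "left".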

HOWEVER, we think there's a class of examples that should be treated differently. Consider:

two hours after they left

In cases like this, we're worried about usability; attaching the adverb to the verbal head would be an analysis that's very difficult to interpret. Again, it's difficult to argue P^0-attachment vs. PP-attachment. But keeping the parallel to case markers isn't an issue here, because in English we would annotate this "after" as mark, not case (since it takes a verbal complement). So basically we'd like to allow mark to have adverbial dependents, but not case.

A unified treatment would of course be desirable, but at the end of the day, it might not even be possible. It's very difficult to propose an analysis in which English prepositions, which can take verbal complements, also share properties with case markers from other languages. We think this solution is a good compromise.

Natalia

nschneid commented 11 months ago

two hours after they left

In cases like this, we're worried about usability; attaching the adverb to the verbal head would be an analysis that's very difficult to interpret.

obl:npmod:outer? *ducks*

Here are the modified SCONJ cases.

Honestly, it seems like trying to have it both ways: if marks are dependents just like cases, it's odd to say the former can take modifiers but the latter can't because it is a "pure" function word (which the guidelines admit cannot be defined universally in terms of UD categories).

In English, it is hard to draw a sharp distinction between ADP and SCONJ, but we do so in UD based on the function of case vs. mark, which is based on the category of the head. But parallels like "two hours after NP" / "two hours after VP" show how similar they are. It seems like this guideline is imposing yet another awkward structural distinction (and one that is too rare for most annotators to learn as a special case).

I wonder if there is a universal claim to be made, which is that "true" case markers (such as clitics attaching as case) don't allow these types of modifiers, while many adpositions (which language-internally are often regarded as heads) and subordinators do? If so, would it be better to distinguish these based on UPOS? I.e. a PART attaching as case or mark would not allow the modifier while an ADP or SCONJ would. There is also, e.g., VERB as case ("Especially given the current situation, ..."). However, we may want to treat focusing modifiers as a separate category that always attaches to the head of the phrase ("Only a hero can save the day", "Only after a year of training will you be ready", "Only by training every day will you learn the skills", "Only training every day will prepare you").
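The two candidate criteria in this thread (the 2014 deprel-based split, where mark may take modifiers but case may not, and the UPOS-based split suggested above, where a PART attaching as case or mark may not) can be compared mechanically on the same trees. The following is a minimal Python sketch with hypothetical names (`violations`, `may_take_modifier`) and toy sentences. Two simplifying assumptions are labeled in the code: negation is detected by lemma alone, and a small set of structural relations (fixed, conj, cc, punct) is exempted on the assumption that the guidelines tolerate them on function words; check that list against the current UD guidelines before relying on it.

```python
NEGATION_LEMMAS = {"not"}                         # assumption: negation found by lemma only
EXEMPT_RELS = {"fixed", "conj", "cc", "punct"}    # assumption: tolerated on function words


def make_sentence(rows):
    """Build token dicts from (id, form, upos, head, deprel) tuples.
    Toy assumption: the lemma is just the lowercased form."""
    keys = ("id", "form", "upos", "head", "deprel")
    sent = [dict(zip(keys, row)) for row in rows]
    for tok in sent:
        tok["lemma"] = tok["form"].lower()
    return sent


def may_take_modifier(fw, policy):
    """Decide whether function word `fw` may have non-negation modifiers."""
    if policy == "deprel":   # 2014 compromise: mark may, case may not
        return fw["deprel"].split(":")[0] == "mark"
    if policy == "upos":     # proposal: PART may not; ADP, SCONJ, etc. may
        return fw["upos"] != "PART"
    raise ValueError(f"unknown policy: {policy!r}")


def violations(sent, policy):
    """List (dependent form, function-word form) pairs that `policy` disallows."""
    by_id = {tok["id"]: tok for tok in sent}
    found = []
    for tok in sent:
        fw = by_id.get(tok["head"])
        if fw is None or fw["deprel"].split(":")[0] not in {"case", "mark"}:
            continue
        if tok["lemma"] in NEGATION_LEMMAS or tok["deprel"].split(":")[0] in EXEMPT_RELS:
            continue  # negation is allowed on any function word
        if not may_take_modifier(fw, policy):
            found.append((tok["form"], fw["form"]))
    return found


just_before = make_sentence([
    (1, "just", "ADV", 2, "advmod"),
    (2, "before", "ADP", 3, "case"),
    (3, "noon", "NOUN", 0, "root"),
])
just_to = make_sentence([
    (1, "just", "ADV", 2, "advmod"),
    (2, "to", "PART", 4, "mark"),
    (3, "be", "AUX", 4, "cop"),
    (4, "safe", "ADJ", 0, "root"),
])
not_before = make_sentence([
    (1, "not", "PART", 2, "advmod"),
    (2, "before", "ADP", 3, "case"),
    (3, "noon", "NOUN", 0, "root"),
])

for name, sent in [("just before", just_before), ("just to", just_to),
                   ("not before", not_before)]:
    print(name, violations(sent, "deprel"), violations(sent, "upos"))
```

The toy cases show where the criteria diverge: the deprel-based rule flags "just" on the case ADP "before", while the UPOS-based rule flags "just" on the infinitival PART "to"; negation passes under both.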

jnivre commented 11 months ago

The strong similarities between ADP and SCONJ may be peculiar to English -- they do not hold for Swedish, for example, which is otherwise similar to English in many respects -- but I nevertheless agree that it is awkward to treat "case" and "mark" differently in this respect. If we are going to change this, I would definitely prefer to ban dependents of "mark". Trying to draw a distinction between two types of adpositions is likely to open a big can of worms.

amir-zeldes commented 11 months ago

I don't know that I would need to see dependents of mark banned cross-linguistically; there are many languages out there, and I don't think we have thought it through. But for English, I don't see any substantial difference between "two hours after the concert" and "two hours after they left". I would attach "two hours" to the lexical head, not to "after", in both cases, following UD's general lexicocentric framework. Looking at GUM, it looks like this is already the case.

sylvainkahane commented 11 months ago

I won't discuss the analysis in UD. My concern is the conversion UD => SUD. In the surface-syntactic analysis of these examples, "after" has two dependents: see the analysis in a native SUD French treebank (I am not sure that it is a good idea to treat the phrase before 'after' as a modifier, but it is our current analysis) and the SUD conversion of a UD native English treebank. The question also concerns queries on UD: how can we easily retrieve these interesting constructions in English and see whether they exist in other languages?