UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

'mark/case' not expected to have children #618

Open LarsAhrenberg opened 5 years ago

LarsAhrenberg commented 5 years ago

The updated validation script, validate.py, objects when a word with a 'mark' relation has a dependent. But some temporal and causative subordinators in the Scandinavian languages and English do take dependents, as in:

(långt, strax, alldeles, just) innan hon flyttade till England...
(long, shortly, just) before she moved to England...

The analysis I want is:

mark(before, moved)
advmod(long, before)

A possible solution is to treat expressions such as 'long before' as fixed. The alternative, attaching the initial adverb to the head of the subordinate clause, is worse, as it doesn't show that the adverb and the subordinator form a phrase. Shall I use the analysis I prefer and let my treebank fail validation, change it to something I regard as wrong, or could this particular restriction be lifted?
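For readers who want to see the configuration in question concretely, here is a minimal sketch of the kind of check being discussed. It is not the code in validate.py; the CoNLL-U rows are a hypothetical annotation of "long before she moved to England" following the preferred analysis above, and the function names are made up for illustration.

```python
# Hypothetical CoNLL-U rows for "long before she moved to England".
# Column order: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC.
# "moved" is treated as root only because the fragment is shown in isolation.
ROWS = [
    "1\tlong\tlong\tADV\t_\t_\t2\tadvmod\t_\t_",
    "2\tbefore\tbefore\tSCONJ\t_\t_\t4\tmark\t_\t_",
    "3\tshe\tshe\tPRON\t_\t_\t4\tnsubj\t_\t_",
    "4\tmoved\tmove\tVERB\t_\t_\t0\troot\t_\t_",
    "5\tto\tto\tADP\t_\t_\t6\tcase\t_\t_",
    "6\tEngland\tEngland\tPROPN\t_\t_\t4\tobl\t_\t_",
]


def parse(rows):
    """Return {id: {"form", "head", "deprel"}} for plain word lines."""
    tokens = {}
    for row in rows:
        if not row or row.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = row.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        tokens[int(cols[0])] = {
            "form": cols[1],
            "head": int(cols[6]),
            "deprel": cols[7].split(":")[0],  # drop any relation subtype
        }
    return tokens


def mark_nodes_with_children(tokens):
    """Map each word attached as 'mark' to the forms of its dependents."""
    found = {}
    for tok in tokens.values():
        parent = tokens.get(tok["head"])  # None when the head is the root (0)
        if parent is not None and parent["deprel"] == "mark":
            found.setdefault(parent["form"], []).append(tok["form"])
    return found


print(mark_nodes_with_children(parse(ROWS)))  # {'before': ['long']}
```

A strict "no children under mark" rule would report exactly this one pair, which is the situation the question is about.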

dan-zeman commented 5 years ago

The position taken in the UD guidelines is that the adverb modifies the entire phrase, not just the preposition/conjunction. It is described in the overview of Syntax (see especially the example right before midnight; it uses case instead of mark because it is just a nominal, but otherwise it seems to be quite the same thing as the one you describe).

dan-zeman commented 5 years ago

EDIT> On second glance, I just realized that the same section I referenced admits that light adverbials can modify subordinating conjunctions. I overlooked this when writing the validation rules; I always thought that case and mark are parallel in behavior and in the rules that affect them. So, should we lift the restriction on mark in the validator? Or should we create language-specific lists of light adverbials that will be allowed, and ban the rest?

amir-zeldes commented 5 years ago

To be honest I would prefer to allow mark to have modifiers, for the same reason @LarsAhrenberg mentions: they form a phrase, and the dependency structure allows you to explicitly recover that phrase. This is different from CP modifying adverbs that can be placed elsewhere in the clause:

Before they arrived entirely, we already knew.
* Before they arrived long, we already knew.

coltekin commented 5 years ago

As noted on the issue UniversalDependencies/tools#37 this applies to case as well. This is not to say that they are the heads (of the whole phrase/construction), but they are sometimes modified by other words.

My original request was due to some Turkish examples which, to me, are clear cases of modification of adpositions, but I also do not find the discussion of the English example (right before midnight) that I was referred to earlier convincing.

In this particular example, what right modifies may be ambiguous, and it may even be the whole phrase, but consider slightly before midnight. I think this is clearly [[slightly before] midnight], and, by the test used in that discussion, one cannot replace before midnight with then (??slightly then). There is further evidence from coordination: [[[slightly before] or [right after]] midnight].

This also prevents us from distinguishing between examples like (1) slightly after adjusting the clock and (2) after slightly adjusting the clock.

LarsAhrenberg commented 5 years ago

I would support @coltekin in loosening the restriction on case as well as mark. I don't think it would be sufficient to have a list of light adverbs. Words such as before and after may also be modified by temporal noun phrases such as one week or two hours. However, the possible dependency relations are few and may be listed. I guess advmod and nmod would suffice for Swedish.

jnivre commented 5 years ago

I think a restricted list of relations (rather than lexical items) might be the best solution. This list should definitely involve "advmod" and "conj" (which both occur in the guidelines). If we also want to allow nominal modifiers in this position, I think we should use "obl", rather than "nmod", although it seems like a bit of a stretch.

liljao commented 5 years ago

We have exactly the same phenomenon in the Norwegian treebanks and I support the above suggestions to loosen the restrictions on both "mark" and "case". A "fixed" analysis for these examples is really not well founded given the lexical diversity of the possible modifiers (e.g. a week before, seconds before, etc.).

Lilja

--

Lilja Øvrelid, PhD Language Technology Group (LTG), Dept. of Informatics University of Oslo

dan-zeman commented 5 years ago

[[[slightly before] or [right after]] midnight]

Note that this could be analyzed as ellipsis for slightly before midnight or right after midnight. Then we would get (by promotion, assuming that before is the orphaned dependent that gets promoted):

advmod(before, slightly)
conj(before, midnight)
cc(midnight, or)
advmod(midnight, right)
case(midnight, after)

(Before would not be banned from having dependents in this case. Since it has been promoted, its own relation to its parent is no longer case; it is probably obl.)

dan-zeman commented 5 years ago

In https://github.com/UniversalDependencies/tools/commit/2f346b5b15fe35db4a31fc33b3d591cd37dd5e1c the validator was modified to accept advmod and obl (not nmod, which is reserved for modifiers of nominals) under mark.

The restrictions on case were not changed because that would involve changing the guidelines, and other people have already adapted their annotations to the current guidelines.

coltekin commented 5 years ago

I agree that it can be analyzed as ellipsis, but I feel strongly that this is wrong for Turkish. My (non-native) feeling is also that in the English examples above, the correct analysis is modification of the case marker (e.g., before).

The above ellipsis analysis will also make negation ambiguous for examples like anytime after but not before midnight, making it unclear what is negated (I know UD is not concerned with semantics, but here the right choice of syntactic analysis may help semantics too).

And before dismissing the option based on one specific argument/example in the documentation, I'd be happier to see more linguistic evidence/discussion that this is ellipsis rather than modification of case.

Apart from the question of which analysis is "correct", the problem I have with the current option is that we lose information once we adopt it. It is easy to attach the dependents of a case marker to its head later. However, it is not possible to automatically recover the modifiers of case markers if a later version decides that the correct analysis was modification of the case marker.
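The one-way nature of this conversion can be sketched in a few lines. The trees below are hypothetical annotations of the two clock examples from earlier in the thread, and the helper names are invented for illustration; the function naively moves every dependent of a case or mark node, which a real conversion would probably not do for relations such as conj.

```python
# Each token is (id, form, head, deprel); head 0 is the artificial root.
# (1) "slightly after adjusting the clock": the adverb modifies "after".
MODIFIED_MARKER = [
    (1, "slightly", 2, "advmod"),
    (2, "after", 3, "mark"),
    (3, "adjusting", 0, "root"),
    (4, "the", 5, "det"),
    (5, "clock", 3, "obj"),
]
# (2) "after slightly adjusting the clock": the adverb modifies "adjusting".
MODIFIED_VERB = [
    (1, "after", 3, "mark"),
    (2, "slightly", 3, "advmod"),
    (3, "adjusting", 0, "root"),
    (4, "the", 5, "det"),
    (5, "clock", 3, "obj"),
]


def reattach_to_content_head(tokens, function_rels=("case", "mark")):
    """Move dependents of case/mark nodes up to the nearest other ancestor."""
    by_id = {tok[0]: tok for tok in tokens}
    result = []
    for tid, form, head, deprel in tokens:
        # Climb past case/mark parents until a content word or the root.
        while head != 0 and by_id[head][3] in function_rels:
            head = by_id[head][2]
        result.append((tid, form, head, deprel))
    return result


def edges(tokens):
    """Word-order-independent view: {(dependent, head form, relation)}."""
    form_of = {tid: form for tid, form, _, _ in tokens}
    form_of[0] = "ROOT"
    return {(form, form_of[head], deprel) for _, form, head, deprel in tokens}


before = edges(MODIFIED_MARKER) == edges(MODIFIED_VERB)
after = (edges(reattach_to_content_head(MODIFIED_MARKER))
         == edges(reattach_to_content_head(MODIFIED_VERB)))
print(before, after)  # False True
```

Once the adverb has been moved to the verb, the two analyses have the same set of dependencies and differ only in word order, so there is nothing left in the tree from which to restore the original attachment.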

dan-zeman commented 5 years ago

After some more discussion, I modified the validator to (temporarily) ignore the error if a case node has an advmod or obl child. This will at least somewhat reduce the burden before the upcoming release. The issue should be revisited when the next version of the guidelines is discussed. Therefore, I also modified the milestone to "later" (and renamed the issue to include case).

jnivre commented 5 years ago

Thanks, Dan. All things considered, I think this was a wise decision for the upcoming release. But I agree the issue needs to be discussed more thoroughly for future versions.

msklvsk commented 5 years ago

What about particles attached to cc and mark as discourse?

uk: Якщо ж
en: If <emphasizing particle ж>

Ж doesn’t seem to modify the upper clause. It cannot be moved to a different position in a sentence.

jpiitula commented 5 years ago

I had two instances of coordinated adpositions (ADP) in a Finnish treebank that the validator rejected because of this rule.

fi: alta ja päältä viidenkymmenen markan
en: below and above fifty marks

fi: ennen ja jälkeen [- -] lähtöä
en: before and after [- -] departure

dan-zeman commented 5 years ago

@msklvsk : discourse is supposed to be "attached to the head of the most relevant nearby clause". If it clearly modifies a subordinating conjunction then the relation probably should not be discourse. I think I would use advmod because this relation has been used with emphasizing elements, although I admit that it is not what I imagine under "adverbial modifier" in the first place. The current version of the validator will accept it.

dan-zeman commented 5 years ago

@jpiitula : Coordination of adpositions is allowed. Was the structure like this?

conj(alta, päältä)
cc(päältä, ja)
case(markan, alta)
nummod(markan, viidenkymmenen)

Such a tree should pass validation. A node that is attached as case cannot have children unless those children are attached via certain relations or have some other exceptional properties. And one of the permitted relations is conj.
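As a rough illustration of that rule, the sketch below checks the tree above against an allowlist of child relations. It is not the validator's actual code: the allowlist contains only the relations named in this thread (advmod, obl, conj), the real list may well be longer, and the function and variable names are hypothetical.

```python
# Assumption: only the relations mentioned in this thread are allowed here.
ALLOWED_CHILD_RELS = {"advmod", "obl", "conj"}

# (id, form, head, deprel) for "alta ja päältä viidenkymmenen markan",
# following the tree above; "markan" is root only because the phrase is
# shown in isolation.
FINNISH = [
    (1, "alta", 5, "case"),
    (2, "ja", 3, "cc"),
    (3, "päältä", 1, "conj"),
    (4, "viidenkymmenen", 5, "nummod"),
    (5, "markan", 0, "root"),
]


def violations(tokens, restricted=("case", "mark"), allowed=ALLOWED_CHILD_RELS):
    """List (parent form, child form, child relation) for disallowed children."""
    deprel_of = {tid: deprel for tid, _, _, deprel in tokens}
    form_of = {tid: form for tid, form, _, _ in tokens}
    problems = []
    for _, form, head, deprel in tokens:
        if head == 0:
            continue  # the root has no parent to check
        if deprel_of[head] in restricted and deprel not in allowed:
            problems.append((form_of[head], form, deprel))
    return problems


print(violations(FINNISH))  # []

# Hypothetical misattachment for contrast: the numeral hung on the adposition.
BROKEN = [tok if tok[0] != 4 else (4, "viidenkymmenen", 1, "nummod")
          for tok in FINNISH]
print(violations(BROKEN))  # [('alta', 'viidenkymmenen', 'nummod')]
```

Note that "ja" is attached to "päältä", which is itself a conj rather than a case node, so it is never examined by the check.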