bansp closed this issue 4 years ago
The reason I raised this issue was that I got an explicit error when importing this dataset into INCEpTION and saw odd parses in the viewers provided by UDPipe and Tündra (while the visualization in conllu-viewer made me suspect that I wasn't getting the entire picture).
I have now skimmed through (Schuster & Manning, 2016) and some docs, and understand that this is an enhanced representation of elision under conjunction. The new question that this raises for me comes from the absence of similar analyses for the parallel languages that I have had a look at, namely German, French, Italian, and Polish.
In Polish, in particular, the corresponding sentence is
# sent_id = s25
# text = Najpierw zaczęła płakać jedna z jezydek, potem jej przyjaciółka.
# orig_file_sentence = n01012003#25
# conversion_status = complete
1 Najpierw najpierw ADV adv _ 2 advmod 2:advmod _
2 zaczęła zacząć VERB praet:sg:f:perf Aspect=Perf|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act 0 root 0:root _
3 płakać płakać VERB inf:imperf Aspect=Imp|VerbForm=Inf|Voice=Act 2 xcomp 2:xcomp _
4 jedna jeden ADJ adj:sg:nom:f:pos Case=Nom|Degree=Pos|Gender=Fem|Number=Sing 2 nsubj 2:nsubj _
5 z z ADP prep:gen:nwok AdpType=Prep|Variant=Short 6 case 6:case Case=Gen
6 jezydek jezydka NOUN subst:pl:gen:f Case=Gen|Gender=Fem|Number=Plur 4 obl 4:obl SpaceAfter=No
7 , , PUNCT interp PunctType=Comm 10 punct 10:punct _
8 potem potem ADV adv _ 10 advmod 10:advmod _
9 jej on PRON ppron3:sg:gen:f:ter:akc:npraep Case=Gen|Gender=Fem|Number=Sing|Person=3|PrepCase=Npr|PronType=Prs|Variant=Long 10 nmod 10:nmod _
10 przyjaciółka przyjaciółka NOUN subst:sg:nom:f Case=Nom|Gender=Fem|Number=Sing 2 conj 0:root|2:conj SpaceAfter=No
11 . . PUNCT interp PunctType=Peri 2 punct 2:punct _
where zaczęła (płakać) could be analysed as elided as well, but isn't (and I'm ignoring the appearance of the second root here, which signals to me that I know even less about this kind of dependency than I thought I did).
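For anyone who wants to see where that second root comes from, here is a minimal Python sketch that reads the DEPS column (the ninth CoNLL-U column) of two rows from the sentence above and lists every token that has an enhanced edge from head 0. The function names `enhanced_edges` and `enhanced_roots` are made up for this illustration, and the rows are assumed to be tab-separated as in a real CoNLL-U file.

```python
# Rows for tokens 2 and 10, copied from the Polish sentence above, with the
# ten CoNLL-U columns separated by tabs (an assumption about the source file).
ROWS = [
    "2\tzaczęła\tzacząć\tVERB\tpraet:sg:f:perf\t"
    "Aspect=Perf|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act\t"
    "0\troot\t0:root\t_",
    "10\tprzyjaciółka\tprzyjaciółka\tNOUN\tsubst:sg:nom:f\t"
    "Case=Nom|Gender=Fem|Number=Sing\t2\tconj\t0:root|2:conj\tSpaceAfter=No",
]


def enhanced_edges(row):
    """Return (token_id, [(head, relation), ...]) read from the DEPS column."""
    cols = row.split("\t")
    token_id, deps = cols[0], cols[8]
    if deps == "_":
        return token_id, []
    # DEPS is a |-separated list of head:relation pairs.
    edges = [tuple(item.split(":", 1)) for item in deps.split("|")]
    return token_id, edges


def enhanced_roots(rows):
    """IDs of tokens whose enhanced graph contains an edge from head 0."""
    roots = []
    for row in rows:
        token_id, edges = enhanced_edges(row)
        if any(head == "0" for head, _ in edges):
            roots.append(token_id)
    return roots


print(enhanced_roots(ROWS))  # ['2', '10']
```

Token 10 carries both `0:root` and `2:conj` in DEPS, which is why a viewer that draws the enhanced graph shows two roots while the basic tree (HEAD/DEPREL) has only one.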
My question/worry is: at least when set against the analyses in de, fr, it, and pl, doesn't the en representation using the dot-based elision mechanism wrongly suggest that there is something language-particular about this kind of ellipsis? Or, more poetically: how parallel can one realistically expect the PUD datasets to be, given (as far as I understand) different analyses for phenomena that should in theory receive parallel treatment?
Thanks in advance and best wishes :-)
The introduction of null nodes in the enhanced dependencies is triggered by the relation "orphan" in the basic dependencies. The "orphan" relation in turn should be used when another relation would be misleading because a word is attached to a promoted head which is really a co-dependent (see https://universaldependencies.org/u/overview/specific-syntax.html#ellipsis).
The difference between the English and Polish annotation is that the English annotators judged the attachment of the adverb "then" to "one" with the relation "advmod" to be misleading and therefore used the "orphan" relation instead (which leads to the introduction of the null node in the enhanced dependencies). By contrast, the Polish annotators used the "advmod" relation, and therefore there is no null node.
For what it is worth, I checked the Swedish PUD treebank (which we annotated in Uppsala) and it follows the English analysis (with "orphan" and a null node). Possibly the same should have been done in Polish. In all fairness, however, it should be pointed out that the guidelines are not precise enough here. It is quite clear that the "orphan" relation should be used when a core argument is attached to another core argument (as in "I like coffee and you tea"), but it is not clear whether this extends to adverbial modifiers. Hence, the guidelines need to be made more specific.
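To make the link between "orphan" and the null node concrete, here is a small Python sketch over a hand-written annotation of "I like coffee and you tea". The annotation is only an illustration of the mechanism described above; it is not taken from any treebank, the unused columns are left as `_`, and the helper names `orphan_tokens` and `empty_nodes` are hypothetical.

```python
# Hand-written, illustrative CoNLL-U rows (NOT from a treebank).
# Basic tree: "you" is promoted to head of the second conjunct and "tea"
# attaches to it as orphan. Enhanced graph: the null node 5.1 stands in for
# the elided verb, and "you" and "tea" become its nsubj and obj.
SENTENCE = """\
1\tI\t_\t_\t_\t_\t2\tnsubj\t2:nsubj\t_
2\tlike\t_\t_\t_\t_\t0\troot\t0:root\t_
3\tcoffee\t_\t_\t_\t_\t2\tobj\t2:obj\t_
4\tand\t_\t_\t_\t_\t5\tcc\t5.1:cc\t_
5\tyou\t_\t_\t_\t_\t2\tconj\t5.1:nsubj\t_
5.1\tlike\t_\t_\t_\t_\t_\t_\t2:conj\t_
6\ttea\t_\t_\t_\t_\t5\torphan\t5.1:obj\t_
"""


def rows(text):
    """Split CoNLL-U text into lists of columns, skipping comments and blanks."""
    return [line.split("\t") for line in text.splitlines()
            if line and not line.startswith("#")]


def orphan_tokens(text):
    """IDs of tokens whose basic DEPREL (column 8) is 'orphan'."""
    return [cols[0] for cols in rows(text) if cols[7] == "orphan"]


def empty_nodes(text):
    """IDs of null (empty) nodes, written with a decimal point, e.g. 5.1."""
    return [cols[0] for cols in rows(text) if "." in cols[0]]


print(orphan_tokens(SENTENCE))  # ['6']
print(empty_nodes(SENTENCE))    # ['5.1']
```

If the second conjunct had been annotated without "orphan", as in the Polish sentence, there would be nothing to trigger the 5.1 row, and both functions would return empty lists.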
Thanks a lot for the reply, Joakim. Diving deeper into the UD literature now... :-) And closing this issue. Best wishes, Piotr
Hi, and please excuse me if this is posted in the wrong place. (I did read the contributing guidelines but found no suggestion of a better target.)
I have stumbled on a parse error in the following sentence in the current form of the UD_English-PUD treebank:
The 7 vs. 7.1 seems to be the offending bit.
Since I am totally lost when it comes to unravelling the conjunction magic of UD and dependencies in general, I am unable to suggest a correction but hope that someone else will be able to fix that :-) Thanks in advance!
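As a side note on the 7 vs. 7.1 IDs: the CoNLL-U format allows three kinds of ID, namely plain integers for words, ranges such as 3-4 for multiword tokens, and decimals such as 7.1 for empty nodes. A tool that expects only integers will fail on a 7.1 row; whether that is what happened in the import described here is an assumption on my part. A minimal Python sketch of the distinction, with `classify_id` being a made-up helper name:

```python
import re


def classify_id(token_id):
    """Classify a CoNLL-U ID string as 'word', 'multiword' or 'empty'."""
    if re.fullmatch(r"[1-9][0-9]*", token_id):
        return "word"          # ordinary syntactic word, e.g. 7
    if re.fullmatch(r"[1-9][0-9]*-[1-9][0-9]*", token_id):
        return "multiword"     # multiword token range, e.g. 3-4
    if re.fullmatch(r"[0-9]+\.[1-9][0-9]*", token_id):
        return "empty"         # empty (null) node, e.g. 7.1
    raise ValueError(f"not a valid CoNLL-U ID: {token_id!r}")


for token_id in ["7", "7.1", "3-4"]:
    print(token_id, classify_id(token_id))
```

Calling `int("7.1")` raises a `ValueError` in Python, which is the kind of failure an integer-only reader would run into on such a row.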