kilian-gebhardt closed this issue 5 years ago
Thanks for the report. The problem is that the secondary edge refers to a node that does not dominate any terminals. I have to decide how to handle such cases. The obvious workaround is to ignore such secondary edges, because such empty nodes are filtered out as well (line 620). Additionally, in this sentence, the annotation looks off to me.
Here's the tree. Basically, two adjectives modifying two nouns, in all combinations ...
Due to my limited knowledge of Dutch, it is hard for me to judge this annotation. If I understand correctly, there are "Jewish wounded ones", "Jewish killed ones", "Arabic wounded ones", and "Arabic killed ones". I have not yet looked at how discodop decides on the primary annotation. Maybe an annotation similar to this one from TiGer is more suitable, but it is perhaps hard to produce automatically during conversion.
I think it is reasonable (and preferable to throwing an exception) to drop empty categories and secondary edges referring to them (and maybe output a warning during the conversion).
I went for the pragmatic solution of ignoring these problematic secondary edges, and printing a warning. I checked and there are 6 sentences in Lassy Small with this issue.
The assumption that every node must dominate one or more terminals is baked into all of the code (I think this is also assumed by the Negra export format). Therefore it would be rather difficult to preserve these annotations exactly.
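For illustration, here is a minimal sketch of the pragmatic approach described above: drop secondary edges that touch a node dominating no terminals, and emit a warning for each one. This is not discodop's actual code. The data layout (a `children` dict, a `terminals` set, and `(source, label, target)` triples) and both function names are assumptions made up for this example.

```python
import warnings


def dominated_terminals(node, children, terminals):
    """Return the set of terminals dominated by `node` (iterative DFS).

    `children` maps a node id to a list of child ids (hypothetical layout);
    `terminals` is the set of ids that are actual tokens.
    """
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current in terminals:
            found.add(current)
        else:
            # A node with no entry, or an empty list, contributes nothing.
            stack.extend(children.get(current, []))
    return found


def filter_secondary_edges(secedges, children, terminals):
    """Keep only secondary edges whose endpoints each dominate a terminal.

    Edges touching an empty node are dropped with a warning instead of
    raising an exception.
    """
    kept = []
    for source, label, target in secedges:
        empty = [n for n in (source, target)
                 if not dominated_terminals(n, children, terminals)]
        if empty:
            warnings.warn(
                'ignoring secondary edge %s -%s-> %s: node(s) %s dominate '
                'no terminals' % (source, label, target, ', '.join(empty)))
        else:
            kept.append((source, label, target))
    return kept
```

Checking both endpoints is a conservative choice for the sketch; in the case reported here it is the node the secondary edge refers to that is empty.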
Thanks.
I cannot directly find such a restriction in the NeGra export format documentation, which apparently also does not state that the structures need to be rooted. Technically it is possible to have phrasal nodes in the export format that do not have child nodes. However, I agree that supporting this in discodop is not worth the effort – no one makes use of secondary edges as far as I know.
See Skut et al. (1997), https://arxiv.org/pdf/cmp-lg/9702004.pdf: "The following features of our formalism are then of particular importance: [...] complete absence of empty categories". Perhaps one could argue that the restriction is not part of the Negra export format as such, but only a design goal of the Negra annotation scheme.
When converting the attached tree from the Lassy corpus (or drawing with secondary edges), I obtain the following error:
WR-P-E-I-0000015007.p.1.s.114.xml.txt