UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Ellipsis in UD #1044

Open ClaudiaCorbe opened 1 month ago

ClaudiaCorbe commented 1 month ago

Ellipsis in UD:

Hi, I'm working on the Italian_Old treebank, which so far consists of part of the Divine Comedy, a poetic text in Old Italian. While annotating, I ran into several problems with the annotation of ellipsis.

As you already know, in UD there are two possibilities for annotating elliptical structures:

  1. orphan deprel
  2. promotion

However, UD annotation (leaving aside Enhanced Dependencies, which so far are available for far fewer treebanks than the basic annotation) makes it difficult to retrieve and analyze ellipses. On the one hand, the orphan relation signals the presence of an ellipsis, but it obscures the dependency relations of the sentence (see example 1 below). On the other hand, promotion does not explicitly signal the ellipsis, so the information that an ellipsis is present is lost (see example 2).

Example 1: Ed elli a me (Inferno, III v. 76) Gloss = And he to me

Example 2: e la lingua (...) si fende, e la forcuta ne l'altro si richiude (Inferno, XXV, vv. 133-135) Gloss = And the tongue (...) REFL cleave.3sing, and the forked.femsing in the other REFL close.3sing

[Image: Screenshot 2024-07-14 at 14.26.20]

I suggest the possibility of:

I will provide the same example given before with the suggested modification:

[Image: Screenshot 2024-07-14 at 14.52.02]

[Image: Screenshot 2024-07-14 at 14.58.28]

For the first example of ellipsis, it has also been suggested to me that a me ("to me") be selected as the head, resulting in the following structure:

[Image: Screenshot 2024-07-14 at 14.57.11]

To deal with cases where we already have a subtype (e.g., nsubj:pass), we could adopt the @ symbol, as used in SUD, resulting in nsubj:pass@ellipsis.
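To make the proposed label shape concrete, here is a minimal Python sketch of how a tool could split such a label into its parts. The `@` status marker is only the proposal under discussion in this issue, not established UD practice, and the names `DeprelLabel` and `parse_deprel` are hypothetical, invented for this illustration.

```python
from typing import NamedTuple, Optional


class DeprelLabel(NamedTuple):
    """Parts of a label such as 'nsubj:pass@ellipsis' (hypothetical format)."""
    relation: str            # universal relation, e.g. 'nsubj'
    subtype: Optional[str]   # existing subtype after ':', e.g. 'pass'
    status: Optional[str]    # proposed status after '@', e.g. 'ellipsis'


def parse_deprel(label: str) -> DeprelLabel:
    # Split off the proposed '@status' suffix first, so that an existing
    # ':subtype' is left untouched.
    base, _, status = label.partition("@")
    relation, _, subtype = base.partition(":")
    return DeprelLabel(relation, subtype or None, status or None)


print(parse_deprel("nsubj:pass@ellipsis"))
# DeprelLabel(relation='nsubj', subtype='pass', status='ellipsis')
```

Because the status is split off before the subtype, labels without a subtype (`nsubj@ellipsis`) and plain labels (`obl`) go through the same function.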

nschneid commented 1 month ago

Thanks for bringing this up—I agree the current treatment of ellipsis is not fully satisfying!

Speaking just to what we do in English:

We are reluctant to introduce many new subtypes as we feel that ~50 deprels is what our annotators will be able to handle.

In EWT, I have started adding Promoted=Yes to MISC where I notice non-orphan cases of ellipsis. This will help us understand why an ADJ is attaching as nsubj, for example (and reassure us that it's not an error).

Regarding the orphan cases, in English we have enhanced graphs, so the underspecification of orphan is not an issue. If you wanted to hint at the inferred deprel without introducing an enhanced graph or adding a bunch of subtypes, you might experiment with a new MISC attribute for that, e.g. EllipsisDeprel=obl. This could be a stepping stone toward adding the enhanced graph in the future.
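As a rough illustration of how the two MISC attributes mentioned above could be queried, the sketch below scans CoNLL-U text for `Promoted=Yes` and for orphan tokens that carry an `EllipsisDeprel` hint. The two sample sentences are invented toy data, not taken from EWT or any other treebank, and the function names `parse_misc` and `ellipsis_hints` are hypothetical.

```python
def parse_misc(misc):
    """Turn a MISC column such as 'A=1|B=2' into a dict ('_' means empty)."""
    if misc == "_":
        return {}
    pairs = (item.partition("=") for item in misc.split("|"))
    return {key: value for key, _, value in pairs}


def ellipsis_hints(conllu):
    """Return (form, deprel, hint) for tokens flagged as elliptical."""
    hits = []
    for line in conllu.splitlines():
        cols = line.split("\t")
        # Skip comments, blank lines, multiword ranges and empty nodes.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        form, deprel, misc = cols[1], cols[7], parse_misc(cols[9])
        if misc.get("Promoted") == "Yes":
            hits.append((form, deprel, "promoted"))
        elif deprel == "orphan":
            # The hint is None when no EllipsisDeprel attribute is present.
            hits.append((form, deprel, misc.get("EllipsisDeprel")))
    return hits


# Invented toy sentences: "I prefer the red" / "and John to Paris".
ROWS_1 = [
    ["1", "I", "I", "PRON", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "prefer", "prefer", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "the", "the", "DET", "_", "_", "4", "det", "_", "_"],
    ["4", "red", "red", "ADJ", "_", "_", "2", "obj", "_", "Promoted=Yes"],
]
ROWS_2 = [
    ["1", "and", "and", "CCONJ", "_", "_", "2", "cc", "_", "_"],
    ["2", "John", "John", "PROPN", "_", "_", "0", "root", "_", "_"],
    ["3", "to", "to", "ADP", "_", "_", "4", "case", "_", "_"],
    ["4", "Paris", "Paris", "PROPN", "_", "_", "2", "orphan", "_", "EllipsisDeprel=obl"],
]
SAMPLE = "\n\n".join(
    "\n".join("\t".join(row) for row in rows) for rows in (ROWS_1, ROWS_2)
)

print(ellipsis_hints(SAMPLE))
# [('red', 'obj', 'promoted'), ('Paris', 'orphan', 'obl')]
```

Keeping the information in MISC means the deprel column stays unchanged, so existing tools that only read the basic tree are not affected.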

Stormur commented 1 month ago

I would like to note, though, that subtypes are not new relations, especially when they are simple references to main types.

Here we are talking more about the status of a relation, i.e. whether or not it appears in an elliptical construction. I am totally in favour of introducing "relation statuses", which remind me of feature layers and whose exact notation (@, [], ...) we could discuss. In fact, I am convinced this is strongly needed. It might also be possible to consider regular "subtype extensions" for other underdefined relations, e.g. dislocated.

This is different from enhanced annotation, because it does not involve reconstructing the non-elliptical version of the construction (if that is possible at all!), which might be beyond the goals of many treebanks. It just signals challenging cases, and this alone is extremely beneficial for queries and data extraction. Also, not everybody is willing to query enhanced graphs.

> In EWT, I have started adding Promoted=Yes to MISC where I notice non-orphan cases of ellipsis. This will help us understand why an ADJ is attaching as nsubj, for example (and reassure us that it's not an error).

One could argue that, if this happens, it is always an ellipsis. This is one of the main points of the OP.
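For comparison, this is roughly what retrieval looks like when no explicit marker is available: a sketch that lists tokens whose UPOS is ADJ but whose relation is a nominal core argument. Whether every such hit really is an ellipsis is exactly the open question here, so the result should be read as a list of candidates only. The function name `candidate_promotions`, the default relation set, and the toy sentence are assumptions made for this illustration.

```python
NOMINAL_SLOTS = frozenset({"nsubj", "obj", "iobj"})


def candidate_promotions(conllu, upos="ADJ", deprels=NOMINAL_SLOTS):
    """Return (id, form, deprel) for tokens of `upos` filling a nominal slot."""
    hits = []
    for line in conllu.splitlines():
        cols = line.split("\t")
        # Skip comments, blank lines, multiword ranges and empty nodes.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        # Compare on the universal relation, ignoring any ':subtype'.
        relation = cols[7].split(":")[0]
        if cols[3] == upos and relation in deprels:
            hits.append((cols[0], cols[1], cols[7]))
    return hits


# Invented toy sentence: "the forked closes".
ROWS = [
    ["1", "the", "the", "DET", "_", "_", "2", "det", "_", "_"],
    ["2", "forked", "forked", "ADJ", "_", "_", "3", "nsubj", "_", "_"],
    ["3", "closes", "close", "VERB", "_", "_", "0", "root", "_", "_"],
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)

print(candidate_promotions(SAMPLE))
# [('2', 'forked', 'nsubj')]
```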