Open jnivre opened 8 years ago
The corresponding Norwegian analysis of clefts is:
root(ROOT, er) expl(er, det) nsubj(er,X) acl:relcl(X,VERB)
So, mostly like the Swedish analysis except for the relation of X. The analysis is described in the documentation with an example here.
Thanks! Does the relation of X reflect its relation in the underlying unclefted sentence, so that it would be dobj in something like:
det var Pelle som jag såg
The original analysis in NDT has X as a PSUBJ (a "potential subject") and there is a corresponding POBJ relation which I guess would be used for these types of Xs, but I have not been able to dig up any example. Seeing it now, however, I think there is some loss of information in the conversion of the PSUBJ to nsubj, so a dislocated analysis might be a good choice to distinguish these from regular subjects. Does the Swedish analysis distinguish dislocated objects from subjects in any way?
A small aside: the original analysis actually distinguishes focus clefts from presentational clefts, and it is only in the latter case that the X is a PSUBJ. For the focus clefts (e.g. Det er et ubeskrivelig syn som møter ham) the X is a SPRED (subject predicative) and hence gives rise to a regular copula analysis. The consequence is that only the presentational clefts have the analysis outlined above. Is there a similar distinction between different types of clefts in other UD treebanks?
Thanks, Lilja. I considered using language-specific subtypes "dislocated:nsubj" and "dislocated:dobj" but in the end decided against it, because I don't think this is what subtypes are for. Possibly, what we should do is just have "dislocated" in the basic dependencies but add an "nsubj" relation (from the VERB) in the enhanced dependencies. This is yet another issue where guidelines for basic dependencies are dependent on (future) decisions about the enhanced dependencies. I therefore think that v2 of the guidelines needs to have at least a rudimentary version of the enhanced dependencies too.
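To make the basic-plus-enhanced idea concrete, here is a minimal Python sketch. It assumes a four-token schematic cleft (det, copula, X, VERB) with the Swedish-style basic tree and adds one extra enhanced edge from the VERB to X. The function name `enhanced_deps`, the token ids, and the choice of `nsubj` for the extra edge are illustrative assumptions, not part of any UD tool; the output strings only mimic the `head:relation` pairs joined by `|` used in the CoNLL-U DEPS column.

```python
# Schematic cleft "det COP X ... VERB" with the basic analysis
# expl(COP, det) dislocated(COP, X) acl:relcl(X, VERB).
# Token ids and forms are placeholders, not real treebank data.
BASIC = [
    {"id": 1, "form": "det", "head": 2, "deprel": "expl"},
    {"id": 2, "form": "COP", "head": 0, "deprel": "root"},
    {"id": 3, "form": "X", "head": 2, "deprel": "dislocated"},
    {"id": 4, "form": "VERB", "head": 3, "deprel": "acl:relcl"},
]

def enhanced_deps(tokens, extra_edges):
    """Return {token id: DEPS-style string} from basic edges plus extra edges.

    extra_edges is a list of (head id, relation, dependent id) triples.
    """
    edges = {t["id"]: [(t["head"], t["deprel"])] for t in tokens}
    for head, rel, dep in extra_edges:
        edges[dep].append((head, rel))
    # Sort each token's incoming edges by head id, then relation.
    return {i: "|".join(f"{h}:{r}" for h, r in sorted(e)) for i, e in edges.items()}

# Assumed extra edge: X is the underlying subject of VERB.
deps = enhanced_deps(BASIC, [(4, "nsubj", 3)])
for tok in BASIC:
    print(tok["form"], deps[tok["id"]])
```

For an object cleft such as "det var Pelle som jag såg", the extra edge would carry the object relation instead of `nsubj`, while the basic tree stays the same.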
I am not aware of any treebank that draws a distinction between presentational and focus clefts. I am not even sure that I am able to draw the distinction myself. :)
Any more thoughts on this?
I will try to conform the Norwegian data to the analysis adopted for Swedish, i.e. changing nsubj to dislocated.
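A relabeling of this kind could be sketched roughly as follows. This is a heuristic illustration only, not the actual Norwegian conversion script: it assumes a cleft can be recognised as an nsubj whose head also governs an expl and which itself governs an acl:relcl. The function name `relabel_cleft_subjects` and the token dictionaries are hypothetical, and "som" is left out of the toy sentence because its relation is not specified in this thread.

```python
def relabel_cleft_subjects(tokens):
    """Return a copy of tokens with cleft nsubj relabeled as dislocated.

    tokens: list of dicts with 'id', 'head' and 'deprel' keys.
    Heuristic (an assumption): the nsubj's head has an expl dependent
    and the nsubj itself heads an acl:relcl.
    """
    expl_heads = {t["head"] for t in tokens if t["deprel"] == "expl"}
    relcl_heads = {t["head"] for t in tokens if t["deprel"] == "acl:relcl"}
    out = []
    for t in tokens:
        t = dict(t)  # do not mutate the input
        if (t["deprel"] == "nsubj"
                and t["head"] in expl_heads
                and t["id"] in relcl_heads):
            t["deprel"] = "dislocated"
        out.append(t)
    return out

# Schematic "det er X ... VERB" in the current Norwegian analysis.
cleft = [
    {"id": 1, "form": "det", "head": 2, "deprel": "expl"},
    {"id": 2, "form": "er", "head": 0, "deprel": "root"},
    {"id": 3, "form": "X", "head": 2, "deprel": "nsubj"},
    {"id": 4, "form": "VERB", "head": 3, "deprel": "acl:relcl"},
]
converted = relabel_cleft_subjects(cleft)
print([t["deprel"] for t in converted])
```

Ordinary subjects are left alone because their head has no expl dependent. Real data would need manual checking, since an expletive construction with a relativised subject that is not a cleft would also match this pattern.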
I will close this issue and open a new issue to fix Danish for the next release.
I am inclined to say that this is a bug in UD_Danish to be fixed for version 2. Any other ideas? I would be happy to assign this issue to someone from the UD_Danish team, but I don't know who.
@jnivre I can take a look at it if you wish?
A bit late to the discussion... but you're welcome to look at how we handle clefts in Irish.
https://universaldependencies.org/ga/dep/csubj-cleft.html
We don't use acl:relcl because the clause is not relativising the fronted element. In Irish we can front nouns, prepositional phrases, adverbial phrases, adjectives and verbal nouns.
The trees in our Irish examples need some improvements - I'll get around to it!
The recently implemented policy in English is advcl:relcl: https://universaldependencies.org/en/dep/acl-relcl.html#clefts
I can also point to the minidocumentation I wrote for Latin: https://universaldependencies.org/la/dep/csubj-cleft.html (we are sharing the relation with Irish).
The structure of all these clefts is really the same. I think it is also crucial to treat the copula as the functional element it is, and not as the root, which makes the structure intractable.
related to #3
There seems to be an inconsistency in the treatment of clefts in Scandinavian treebanks.
det är X som/der ... VERB
Da: root(ROOT, X) nsubj(X, det) cop(X, är) acl:relcl(det, VERB)
Sv: root(ROOT, är) expl(är, det) dislocated(är, X) acl:relcl(X, VERB)
No: ???
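The inconsistency is easier to see when the analyses are written as sets of (relation, head, dependent) triples and compared directly. The Python sketch below does that for the Danish and Swedish analyses above, plus the Norwegian analysis reported in the replies. The placeholder `COP` for the copula form (är/er) and the helper name `diff` are assumptions made for illustration.

```python
# Cleft analyses as quoted in this thread, as (relation, head, dependent).
# "COP" stands in for the copula form; "X" is the clefted element.
ANALYSES = {
    "Da": {("root", "ROOT", "X"), ("nsubj", "X", "det"),
           ("cop", "X", "COP"), ("acl:relcl", "det", "VERB")},
    "Sv": {("root", "ROOT", "COP"), ("expl", "COP", "det"),
           ("dislocated", "COP", "X"), ("acl:relcl", "X", "VERB")},
    "No": {("root", "ROOT", "COP"), ("expl", "COP", "det"),
           ("nsubj", "COP", "X"), ("acl:relcl", "X", "VERB")},
}

def diff(a, b):
    """Return the edges found only in analysis a and only in analysis b."""
    only_a = sorted(ANALYSES[a] - ANALYSES[b])
    only_b = sorted(ANALYSES[b] - ANALYSES[a])
    return only_a, only_b

for pair in [("Sv", "No"), ("Da", "Sv")]:
    only_a, only_b = diff(*pair)
    print(pair, "only in first:", only_a, "only in second:", only_b)
```

Swedish and Norwegian differ in a single edge (the relation of X), whereas the Danish analysis shares no edge with the Swedish one.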