UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Clefting in UDv2 #646

Closed by peresolb 4 years ago

peresolb commented 5 years ago

Hi, I am trying to figure out how to annotate cleft constructions in Norwegian UDv2. (I am working on a gold-standard UD corpus for Norwegian.) I haven't found much about clefting in the documentation (but please point me in the right direction if I have missed something). There are some discussions of clefting in UDv1, like this one and this one, but I can't find much on clefting in UDv2. My issue concerns sentences like these:

(1) Det var Peter som løp hjem, "It was Peter who ran home"
(2) Det var i går (at) jeg så ham, "It was yesterday (that) I saw him"
(3) Det var da jeg ruslet i Oslo at jeg så ham, "It was when I walked in Oslo that I saw him"

The current analysis of (1) in the Norwegian UD treebank is as follows:

root(ROOT, Peter)
expl(Peter, it)
cop(Peter, was)
acl:cleft(Peter, ran)

I like this analysis for several reasons. "It" is treated as an expletive, which I think it is. The cleft clause is analysed as a clausal modifier of the noun (acl), which is consistent with its having relative-clause syntax, with the focused nominal as the relativized element. Furthermore, the cleft subtype on the deprel sets these cases apart from other acl dependents.
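To make the analysis of (1) concrete, here is a minimal sketch of it as a simplified CoNLL-U fragment, read back into the rel(head, dependent) notation used above. Only the ID, FORM, UPOS, HEAD and DEPREL columns are filled in. The tags and attachments for "som" (SCONJ/mark, following its description as a complementizer), "hjem" and the final period are assumptions, and `to_conllu` and `relations` are hypothetical helpers, not part of any UD tool.

```python
# Sketch only: sentence (1) as a simplified CoNLL-U fragment.
# Assumed (not from the treebank): UPOS tags and the arcs for "som", "hjem", ".".
ROWS = [
    # (ID, FORM, UPOS, HEAD, DEPREL)         English gloss
    (1, "Det", "PRON", 3, "expl"),          # it
    (2, "var", "AUX", 3, "cop"),            # was
    (3, "Peter", "PROPN", 0, "root"),       # Peter
    (4, "som", "SCONJ", 5, "mark"),         # who/that (assumed treatment)
    (5, "løp", "VERB", 3, "acl:cleft"),     # ran
    (6, "hjem", "ADV", 5, "advmod"),        # home
    (7, ".", "PUNCT", 3, "punct"),
]


def to_conllu(rows, text):
    """Render rows as 10-column CoNLL-U lines; unused columns are '_'."""
    lines = [f"# text = {text}"]
    for tok_id, form, upos, head, deprel in rows:
        cols = [str(tok_id), form, "_", upos, "_", "_", str(head), deprel, "_", "_"]
        lines.append("\t".join(cols))
    return "\n".join(lines) + "\n"


def relations(conllu):
    """Parse CoNLL-U text into 'deprel(head, dependent)' strings, in token order."""
    forms = {0: "ROOT"}  # HEAD 0 denotes the artificial root
    arcs = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        tok_id = int(cols[0])
        forms[tok_id] = cols[1]
        arcs.append((cols[7], int(cols[6]), tok_id))
    return [f"{rel}({forms[head]}, {forms[dep]})" for rel, head, dep in arcs]


for arc in relations(to_conllu(ROWS, "Det var Peter som løp hjem.")):
    print(arc)
```

Among the printed arcs is acl:cleft(Peter, løp), the Norwegian-form counterpart of acl:cleft(Peter, ran) above.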

The same analysis is used in Swedish, but not in Danish or English, AFAICT.

Does anyone have thoughts on examples like (2) and (3)? It looks like Swedish uses acl:cleft, at least in cases like (2) (e.g. in sv-ud-train-1574). I am a bit skeptical about that since you then get an acl dependent on a non-nominal word. Also, at least in Norwegian (and English?), the cleft clause is clearly not a relative clause, as the complementizer at, "that", is used instead of the relativizing complementizer som. Furthermore, the acl:cleft analysis does not generalize well to examples like (3), as you would get an acl dependent on a verb, acl:cleft(walked, saw).

I am considering proposing a new subtyped relation, advcl:cleft, for Norwegian UD:

(2) advcl:cleft(yesterday, saw)
(3) advcl:cleft(walked, saw)

advcl:cleft is used in French, but not in the way I suggest here. I would be interested to hear whether anyone has opinions on this extension of the Norwegian/Swedish analysis, or knows of alternative solutions.
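The objection to acl:cleft for (3) can be made concrete with a small check. The sketch below encodes (3) twice, once with acl:cleft(ruslet, så) (the gloss's acl:cleft(walked, saw)) and once with the proposed advcl:cleft(ruslet, så), and flags any acl arc whose head has a non-nominal UPOS. Only the cleft arc comes from the discussion; the remaining arcs, the UPOS tags and the set of "nominal" tags are assumptions made to give a complete tree, and `non_nominal_acl` is a hypothetical helper, not the official UD validator.

```python
# Sketch only: sentence (3) under two competing analyses of the cleft clause.
NOMINAL_UPOS = {"NOUN", "PROPN", "PRON"}  # assumed definition of "nominal"


def sentence_3(cleft_rel):
    """'Det var da jeg ruslet i Oslo at jeg så ham.' with a chosen cleft deprel."""
    return [
        # (ID, FORM, UPOS, HEAD, DEPREL)        English gloss
        (1, "Det", "PRON", 5, "expl"),          # it
        (2, "var", "AUX", 5, "cop"),            # was
        (3, "da", "SCONJ", 5, "mark"),          # when
        (4, "jeg", "PRON", 5, "nsubj"),         # I
        (5, "ruslet", "VERB", 0, "root"),       # walked
        (6, "i", "ADP", 7, "case"),             # in
        (7, "Oslo", "PROPN", 5, "obl"),         # Oslo
        (8, "at", "SCONJ", 10, "mark"),         # that
        (9, "jeg", "PRON", 10, "nsubj"),        # I
        (10, "så", "VERB", 5, cleft_rel),       # saw (the cleft clause head)
        (11, "ham", "PRON", 10, "obj"),         # him
        (12, ".", "PUNCT", 5, "punct"),
    ]


def non_nominal_acl(rows):
    """Return (head form, head UPOS, dependent form) for acl/acl:* arcs on non-nominal heads."""
    by_id = {tok_id: (form, upos) for tok_id, form, upos, _, _ in rows}
    flagged = []
    for _, form, _, head, deprel in rows:
        # Compare the universal part of the label, so acl:cleft counts as acl.
        if deprel.split(":")[0] == "acl" and head != 0:
            head_form, head_upos = by_id[head]
            if head_upos not in NOMINAL_UPOS:
                flagged.append((head_form, head_upos, form))
    return flagged


print(non_nominal_acl(sentence_3("acl:cleft")))    # [('ruslet', 'VERB', 'så')]
print(non_nominal_acl(sentence_3("advcl:cleft")))  # []
```

Under these assumptions, the acl:cleft analysis yields an adnominal relation on a verb, while advcl:cleft raises no such mismatch.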

amir-zeldes commented 5 years ago

Just to add some facts for English, since there were some question marks above:

I think 'that' is a normal relative pronoun in English, so I wouldn't say the clause isn't a relative clause. That said, I can see the case for expl+cop, since the adverbial cases (it was yesterday that X) are really not relative semantically (but syntactically do look pretty much the same).

The one thing I'm not convinced about is that this should be advcl: if we think 'it' is expl, doesn't that suggest that the extraposed clause is a subject clause? I would prefer either:

nsubj + cop + acl, or
expl + cop + csubj

The second option is more or less analogous to the EWT 'spin' example.

peresolb commented 5 years ago

Thanks a lot, @amir-zeldes!

I agree that expl, cop, csubj works well for the spin example from EWT. However, I am not sure that it is the same construction. It seems to me to be what we in the Norwegian Dependency Treebank project called a clause-anticipating construction: instead of having a clausal subject, you put an expletive in the subject position and move the logical subject to the right end of the sentence. In other words, the spin example is equivalent to "what spin the press will take is always interesting". I do think such examples get an expl, cop, csubj analysis in UD-no. In my examples, "Peter", "yesterday" and "when I walked in Oslo" are focused, but I don't think that is the case with "interesting" in the spin example.

I have some issues with the expl, cop, csubj analysis in Norwegian (it possibly works better in English). It doesn't work for examples like (1), as you would get a relative clause as csubj. In Norwegian, som løp hjem ("who ran home") is unambiguously a relative clause, as it is introduced by the relativizing complementizer som, and relative clauses cannot function independently as subject clauses in other environments. I am not convinced that it works for (2) and (3) either. In my judgement, the subordinate clause cannot replace the pronoun in subject position, unlike in clause-anticipating constructions like the spin example.

The nsubj, cop, acl analysis would work for (1). However, we lose the information that the pronoun is non-referring. I don't think an expletive analysis necessarily implies that the subordinate clause is a subject. Bouma et al. (2018) discuss expletives in UD. They distinguish between cases where the expletive forms a chain with some other element in the sentence that is normally associated with the expletive's position (like the spin example, or existential sentences like "there is a cat on the mat"), and cases like impersonal constructions ("it rains"), where there is no chain element. The paper explicitly (p. 24) leaves clefts out of the discussion, however, and it is not immediately clear to me how to fit them into its typology of expletive constructions.

EDIT: Full disclosure: I am one of the co-authors of Bouma et al. (2018)

nschneid commented 5 years ago

Regarding English, on the expl page we see:

Is EWT distinguishing copular complements that are adjectives vs. nouns? Or is the Exocets example an error?

My native speaker intuition is that "It is John that/who should decide" is closer to the "It is clear" example than to a normal relative clause. (There are ambiguous examples: e.g. "It is the painting that was stolen" could be a cleft if contrasting something else that wasn't stolen, or a normal referential "it" if "that was stolen" helps us know which painting is being referred to.)

Some more EWT cleft examples:

I suspect this is simply inconsistent and needs to be documented more clearly in the guidelines.

amir-zeldes commented 5 years ago

Agreed, and thanks for the ambiguous example! I think it nicely reveals a trade-off in the choice. If we go with nsubj+acl, we can't tell clefts apart from regular clauses, but we make life easier for our parsers. If we go with expl + csubj, we have a difference, but parsers can get the structure wrong, leading to downstream errors.

I'm OK with either one, but I'd be against expl + acl: I feel like that's neither here nor there, and for English it suggests that we have a subjectless sentence. I think that's not the case; in this example we have too many subjects, not too few.

amir-zeldes commented 5 years ago

And BTW, regarding what @peresolb pointed out above: I am definitely not saying that other languages need to have an nsubj. My comment above about subjects was just about the English examples.

dan-zeman commented 5 years ago

Related older issue: UniversalDependencies/UD_Danish-DDT#11.

perrier54 commented 5 years ago

In the dev version of UD_French-GSD (https://github.com/UniversalDependencies/UD_French-GSD/tree/dev), we annotate cleft clauses as suggested by @peresolb. See the following examples:

You can verify the annotation of cleft clauses in the UD_French-GSD corpus with GREW: http://match.grew.fr/?corpus=UD_French-GSD@dev&custom=5d9c8db8e2f3b&clustering=e