UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Nominalisation of a finite VP in Russian and Erzya #509

Closed: rueter closed this issue 6 years ago.

rueter commented 6 years ago

Это сова кричала /eto sova kričala/, lit. 'That owl screeched', i.e. ‘That was an owl screeching.’ Context: (0) you first hear the screech; (1) someone asks: "What was that?"

nsubj(nsubj(кричала, сова), это) nsubj(nsubj('screeched', 'owl'), 'that')

What should the dependency relation be?

Or would it be better to do this: ccomp(это, nsubj(кричала, сова)) ccomp('that', nsubj('screeched', 'owl'))

‘that was an owl screeching.’

jnivre commented 6 years ago

Assuming that "that" is referential, it looks to me like a case of nonverbal predication, where the predicate is a clausal structure. So, in principle, I think it should be:

nsubj(screeching, owl) nsubj(screeching, that)

However, this is the kind of structure that is dispreferred because the predicate inside the clausal structure is assigned two subjects. For English, we therefore switch to treating the copula as the root, with the clausal structure as a ccomp. However, that option is not open here because there is no copula. :(
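
A minimal Python sketch of the problem jnivre describes, assuming the analysis is written as (relation, head, dependent) triples with the English glosses as word labels (a hypothetical representation for illustration, not a UD tool): it lists every head that receives more than one nsubj dependent.

```python
from collections import Counter

# jnivre's "in principle" analysis, as (relation, head, dependent) triples.
ARCS = [
    ("nsubj", "screeching", "owl"),
    ("nsubj", "screeching", "that"),
]


def heads_with_multiple_subjects(arcs):
    """Return, sorted, the heads with more than one nsubj dependent (subtypes like nsubj:pass included)."""
    counts = Counter(head for rel, head, _ in arcs if rel.split(":")[0] == "nsubj")
    return sorted(head for head, n in counts.items() if n > 1)


print(heads_with_multiple_subjects(ARCS))  # ['screeching']
```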

dan-zeman commented 6 years ago

There are somewhat similar examples in Czech. We attach the Czech "to" (“that”) as a discourse dependent of the predicate of the clause.

http://hdl.handle.net/11346/PMLTQ-SW6Y

sylvainkahane commented 6 years ago

It seems that this is a predicative construction where, in surface syntax, the copula has two complements, an "obj" and an "xcomp". "Screeching" cannot be the head because it can be deleted, at least in English: "that was an owl". Following UD choices, I think I would do this:

nsubj(owl,that) xcomp(owl,screeching)
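
To see Sylvain's proposal as a complete tree, here is a minimal sketch that renders it as CoNLL-U lines. Only nsubj(owl, that) and xcomp(owl, screeching) come from the comment; cop(owl, was) and det(owl, an) are our own assumptions, added so that every word has a head, and the helper `to_conllu` is hypothetical.

```python
# Sylvain's analysis of "that was an owl screeching"; cop and det arcs are assumptions.
WORDS = ["that", "was", "an", "owl", "screeching"]
ARCS = {  # dependent form -> (relation, head form)
    "that": ("nsubj", "owl"),
    "was": ("cop", "owl"),      # assumption, not in the comment
    "an": ("det", "owl"),       # assumption, not in the comment
    "screeching": ("xcomp", "owl"),
}


def to_conllu(words, arcs, root):
    """Return CoNLL-U lines with ID, FORM, HEAD and DEPREL filled and all other columns '_'.

    Assumes word forms are unique within the sentence, so forms can stand in for IDs.
    """
    index = {w: i for i, w in enumerate(words, start=1)}
    lines = []
    for i, w in enumerate(words, start=1):
        rel, head = ("root", 0) if w == root else (arcs[w][0], index[arcs[w][1]])
        lines.append("\t".join([str(i), w, "_", "_", "_", "_", str(head), rel, "_", "_"]))
    return "\n".join(lines)


print(to_conllu(WORDS, ARCS, "owl"))
```

Replacing "xcomp" with "acl" gives the variant jnivre suggests later in the thread.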

dan-zeman commented 6 years ago

I would not necessarily expand it as "it was an owl who screeched" or "it was the screeching of an owl". I don't like the amount of fantasy that is needed to postulate this structure from the words that were actually uttered. Это is a demonstrative PRON/DET, and it is quite possible that sentences like this evolved from one of the "it was"-like hypotheses here, but synchronically, I don't think that the demonstrative is part of the clause structure; it is rather a discourse connector that refers to something previously experienced by the persons involved in the dialogue.

I would also argue that the demonstrative is optional and one could say just сова кричала in the same context.

jnivre commented 6 years ago

If the facts are as in English, I support Sylvain's analysis, except possibly that it should be "acl" instead of "xcomp" if it is really an optional modifier. (But this is an area of the guidelines that needs more investigation.)

dan-zeman commented 6 years ago

@sylvainkahane you can delete either eto or kričala or both; if you delete kričala however, you are reducing the meaning. You are essentially saying "this is an owl". It could be a reference to a sound that you and the addressee just heard, but it could also be a reference to a dead animal that you see lying on the path.

In any case, sova kričala is a finite past-tense clause meaning "the/an owl screeched".

amir-zeldes commented 6 years ago

If this is like Polish and Hebrew, which have a functionally similar construction, then I agree with @dan-zeman. I think one way to check would be with negation. In Polish you can say "to nie...", which in this case would NOT mean that it's not an owl. It would mean: "It's not that an owl was screeching".

The discourse versus nsubj distinction I find trickier: in many colloquial situations the demonstrative is very weak ('to nie chcę', "(it's that) I don't wanna"). But from a formal point of view, I don't think there's a situation that rules out a nominal-sentence reading: [A] is [B], "[It] is [(that) I want...]". So maybe nsubj is safer, since it's always a possibility and we don't have to interpret what kind of demonstrative it is.

ftyers commented 6 years ago

With negation in Russian it would be like "это не сова кричала", and you could add ", а барсук [кричал]" to get the meaning something like "It wasn't an owl that screeched, but a badger [that screeched]".

dan-zeman commented 6 years ago

To continue the projections to neighboring languages :), here is a Czech negated counterpart:

To nekřičela sova, ale jezevec. “It wasn't an owl that screeched, but a badger [that screeched].”

Interestingly, negation in Czech is done using the bound morpheme ne-, so it is obvious that we negate screeched and not a hypothetical copula (which would have to be overt in Czech anyway). Attaching the demonstrative as (a second) nsubj would be extremely odd here; if discourse is not plausible, then perhaps an expl?

Just my 2 cents. I'm not saying that Russian and Czech must behave identically here, but I'm leaning towards finding a common solution unless it turns out there is a fundamental difference.

olesar commented 6 years ago

'That' is labeled as expl in this particular construction in Russian. We discussed this case with the developers of the Russian SynTagRus treebank, so this is a motivated decision. I think that understanding expl as an argument double (second subject, second object, etc.) is OK here. Cf. also the second part of the expl documentation: http://universaldependencies.org/u/dep/expl.html
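
A minimal sketch of what the expl analysis looks like in CoNLL-U, together with a small reader that prints the arcs back in the notation used in this thread. The sentence is hand-written for illustration (not copied from SynTagRus); only LEMMA, UPOS, HEAD and DEPREL are filled in, and the function name `arcs` is our own.

```python
# Hypothetical CoNLL-U for "Это сова кричала." with это attached as expl.
SENTENCE = """\
1\tЭто\tэто\tPRON\t_\t_\t3\texpl\t_\t_
2\tсова\tсова\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tкричала\tкричать\tVERB\t_\t_\t0\troot\t_\t_
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""


def arcs(conllu):
    """Yield (deprel, head form, dependent form) for each basic dependency.

    Comment lines, multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1") are skipped.
    """
    rows = [line.split("\t") for line in conllu.splitlines()
            if line and not line.startswith("#")]
    rows = [row for row in rows if row[0].isdigit()]
    forms = {row[0]: row[1] for row in rows}
    forms["0"] = "ROOT"
    for row in rows:
        yield row[7], forms[row[6]], row[1]


for rel, head, dep in arcs(SENTENCE):
    print(f"{rel}({head}, {dep})")
```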

dan-zeman commented 6 years ago

There are some doubts that this is clitic doubling, though. The owl is feminine in Russian (as well as in Czech) and the gender is cross-referenced by the verb form. The demonstrative is neuter. If its function were to “double” the owl, I would expect it to take the feminine form, too. But it is always neuter in this construction, regardless of the gender of the subject.
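
Dan's agreement argument can be phrased as a simple check. This is a minimal sketch over hand-written Gender values in the UD feature style (a hypothetical dictionary, not treebank data): the finite verb matches the feminine subject, while the demonstrative does not, which is unexpected for a doubled argument.

```python
# Hand-made Gender values for "Это сова кричала" (illustration only, not read from a treebank).
TOKENS = {
    "это": {"upos": "PRON", "Gender": "Neut"},
    "сова": {"upos": "NOUN", "Gender": "Fem"},
    "кричала": {"upos": "VERB", "Gender": "Fem"},
}


def agreement_report(subject, others, tokens):
    """Map each word in `others` to True if its Gender matches the subject's, else False."""
    gender = tokens[subject]["Gender"]
    return {word: tokens[word]["Gender"] == gender for word in others}


print(agreement_report("сова", ["кричала", "это"], TOKENS))
# {'кричала': True, 'это': False}
```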

sylvainkahane commented 6 years ago

It is possible that the demonstrative does not refer to the owl but to the general situation. That's what we have in French, where we often have the demonstrative instead of a personal pronoun in impersonal constructions (c'est sûr qu'il dort 'it is certain that he is sleeping', lit. 'that is …') as well as in the kind of construction we are discussing (c'est une chouette qui crie 'that's an owl that is screeching').

rueter commented 6 years ago

Thank you, this has been helpful. I have given this some further thought:

Главная проблема — это не коррупция.

'The main problem, it's not corruption.' Here it would seem merited to consider an expl reading.

But what would be the relation between 'problem' and 'that/it'?

dislocated(это, проблема)? In Russian, there is no gender correspondence between the two (это is neuter, проблема is feminine), so this does not seem likely.

In Erzya there is no gender to worry about, so it is possible.

Reword:

Главная проблема -- это сова кричала.

Главная проблема -- это то, что сова кричала.

In the latter rendition:

?nsubj(это, то)

?csubj(это, то)

More food for thought. It would be good to get some more encyclopedic-type explanations, i.e., X -- это Y.

Jack Ruete

(Sent by email in reply to sylvainkahane's comment of Thu, Nov 23, 2017 at 11:04 AM.)

dan-zeman commented 6 years ago

Главная проблема — это не коррупция.
Glavnaja problema — èto ne korrupcija.
Main problem — this not corruption.

amod(проблема, Главная)
nsubj(проблема, коррупция)
advmod(проблема, не)
cop(проблема, это)
punct(проблема, —)
punct(проблема, .)
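
For readers who think in CoNLL-U terms, a minimal sketch (our own helper, not part of any UD tool) that turns Dan's relation list into ID/FORM/HEAD/DEPREL rows and checks that every word has at most one head and that the sentence has exactly one root. It assumes word forms are unique within the sentence.

```python
WORDS = ["Главная", "проблема", "—", "это", "не", "коррупция", "."]
# Dan's analysis, written as (relation, head, dependent).
ARCS = [
    ("amod", "проблема", "Главная"),
    ("nsubj", "проблема", "коррупция"),
    ("advmod", "проблема", "не"),
    ("cop", "проблема", "это"),
    ("punct", "проблема", "—"),
    ("punct", "проблема", "."),
]


def to_head_columns(words, arcs):
    """Return (ID, FORM, HEAD, DEPREL) rows; the single word without an incoming arc becomes root."""
    index = {w: i for i, w in enumerate(words, start=1)}
    incoming = {}
    for rel, head, dep in arcs:
        if dep in incoming:
            raise ValueError(f"{dep!r} has more than one head")
        incoming[dep] = (index[head], rel)
    roots = [w for w in words if w not in incoming]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {roots}")
    return [(i, w, *incoming.get(w, (0, "root"))) for i, w in enumerate(words, start=1)]


for row in to_head_columns(WORDS, ARCS):
    print(*row, sep="\t")
```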

rueter commented 6 years ago

And then, of course, in the past tense: Главная проблема -- это не коррупция было.

cop(проблема, было)
?cop(проблема, это)

So maybe:

amod(проблема, Главная)
dislocated(коррупция, проблема)
advmod(коррупция, не)
expl(коррупция, это)
cop(коррупция, было)

(Sent by email in reply to Dan Zeman's comment of 23 Nov 2017, 19:42.)

dan-zeman commented 6 years ago

Sorry, Jack, this sentence is beyond my competence in Russian. Especially the neuter copula было sounds odd to me. Anyone else?

rueter commented 6 years ago

Sorry, Dan, my mistake. It should be: -- это была не коррупция.

(Sent by email in reply to Dan Zeman's comment of 23 Nov 2017, 22:19.)

dan-zeman commented 6 years ago

Ah, I see.

это была не коррупция
èto byla ne korrupcija
this was not corruption

So I think this one is easy and prototypical:

nsubj(коррупция, это)
cop(коррупция, была)
advmod(коррупция, не)

However, if we reintroduce the “problem”, we have a problem :-)

Главная проблема – это не была коррупция
Glavnaja problema – èto ne byla korrupcija
Main problem – this not was corruption
“The main problem is that this was not corruption” OR “The main problem was not corruption [but something else]”

(I believe both translations are possible here; please correct me if I'm wrong.)

The first reading is an equational non-verbal predication where one side of the predication is glavnaja problema and the other side is a clause. As @jnivre says above, the current UD guidelines attempt to avoid double subjects and treat such cases as an exception where the copula is the head. However, the guidelines show this on an example from a language (English) where overt copulas are a given. In Russian we don't always have the copula, but we have free word order, i.e., more freedom in deciding which side of the equation is the predicate and which one is the subject. Here I would argue that problema is actually better analyzed as the predicate (because if you think of paraphrasing with the instrumental, glavnoj problemoj bylo, čto èto ne korrupcija seems quite plausible). However, if Erzya or another language has a fixed order like English and missing copulas like Russian, we are screwed :-)

csubj(проблема, коррупция)
amod(проблема, Главная)
nsubj(коррупция, это)
cop(коррупция, была)
advmod(коррупция, не)
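
A minimal sketch, assuming the same (relation, head, dependent) notation: grouping subject dependents by head shows that in this reading no predicate ends up with two subjects, which is the situation the guidelines try to avoid. The dash is left out because its attachment was not discussed, and the helper name is our own.

```python
from collections import defaultdict

# Dan's first reading of "Главная проблема – это не была коррупция".
READING_1 = [
    ("csubj", "проблема", "коррупция"),
    ("amod", "проблема", "Главная"),
    ("nsubj", "коррупция", "это"),
    ("cop", "коррупция", "была"),
    ("advmod", "коррупция", "не"),
]


def subjects_by_head(arcs):
    """Group subject dependents (nsubj, csubj and their subtypes) under their heads."""
    result = defaultdict(list)
    for rel, head, dep in arcs:
        if rel.split(":")[0] in ("nsubj", "csubj"):
            result[head].append((rel, dep))
    return dict(result)


print(subjects_by_head(READING_1))
# {'проблема': [('csubj', 'коррупция')], 'коррупция': [('nsubj', 'это')]}
```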

In the second reading we have just one clause and we again don't know what to do with the demonstrative èto. If we agree that it can be analyzed as a pronominal copula in present tense, then we could stick to that analysis here and have two words that together act as the copula: èto and byla. Or, alternatively, we could attach èto as an expletive.

nsubj(проблема, коррупция)
amod(проблема, Главная)
cop/expl?(проблема, это)
cop(проблема, была)
advmod(проблема, не)
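
The two options for это can be compared side by side. A minimal sketch (the helper `dependents` is our own) listing the cop dependents of проблема in each variant: treating это as cop gives проблема two copulas, while expl leaves only была.

```python
# Dan's second reading without the disputed arc for это.
BASE = [
    ("nsubj", "проблема", "коррупция"),
    ("amod", "проблема", "Главная"),
    ("cop", "проблема", "была"),
    ("advmod", "проблема", "не"),
]
# One variant per candidate label for это.
VARIANTS = {label: BASE + [(label, "проблема", "это")] for label in ("cop", "expl")}


def dependents(arcs, head, rel):
    """Return the dependents of `head` attached with relation `rel`, in list order."""
    return [dep for r, h, dep in arcs if h == head and r == rel]


for label, arcs in VARIANTS.items():
    print(label, dependents(arcs, "проблема", "cop"))
# cop ['была', 'это']
# expl ['была']
```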

sylvainkahane commented 6 years ago

Problem: words like "problem" can also occur in another construction:

Problem: The door cannot open.
Question: Who is in charge?
New idea: Doing that before.
First remark: Do that before.

Such constructions are also observed in spoken language.

Semantically, they can be analyzed as an equational sentence: 'my first remark is "do that before" '. But I don't think it makes sense to consider remark as a subject.

I'm not sure how to encode them. When we have encountered them, we have analyzed the first member as a dislocated phrase:

dislocated(do,remark)

In French, there is a kind of continuum between such constructions and pseudo-clefts, with productions that could be translated as:

a thing that I would like to say to you, do that before
What I would like to say to you is to do that before

dan-zeman commented 6 years ago

Could we use parataxis for the examples with colons? Or even list?

bulbulistan commented 6 years ago

In the upcoming Maltese UD treebank, I use list for whatever precedes the colon. In parliamentary debates, for example, these are often names of PMs.


(Sent by email in reply to Dan Zeman's comment of Friday, November 24, 2017, 11:56.)

jnivre commented 6 years ago

We have used parataxis for similar constructions in the Swedish treebank.

ermanh commented 6 years ago

We also recommended parataxis for similar constructions in our Chinese guidelines.
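
A minimal sketch of how the choice of label also decides which half of "First remark: Do that before." heads the tree. The dislocated arc follows Sylvain's dislocated(do, remark); for parataxis and list we assume the left-hand half is the head, which the thread does not state explicitly. Only the link between the two halves is shown.

```python
# (relation, head, dependent) linking the two halves of "First remark: Do that before."
OPTIONS = {
    "dislocated": ("dislocated", "Do", "remark"),  # Sylvain's analysis
    "parataxis": ("parataxis", "remark", "Do"),    # assumption: left half is the head
    "list": ("list", "remark", "Do"),              # assumption: left half is the head
}


def sentence_root(arc):
    """With only two halves linked by one arc, the head of that arc is the root."""
    _, head, _ = arc
    return head


for name, arc in OPTIONS.items():
    rel, head, dep = arc
    print(f"{name}: {rel}({head}, {dep}), root = {sentence_root(arc)}")
```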