UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

expletive with raising as csubj vs ccomp #120

Open amir-zeldes opened 3 years ago

amir-zeldes commented 3 years ago

There are two analyses in EWT for raising verbs with expletive "it seems":

I think the second one with csubj is better (this is what we have in GUM), but either way I think we should choose one analysis.

nschneid commented 3 years ago

I think csubj is correct per example 5 on https://universaldependencies.org/u/dep/expl.html

nschneid commented 3 years ago

I think it's fair to say that any predicate with an expl dependent and a complement must also have some sort of subject.

Here are 27 violations in EWT, some of which aren't actually expl: http://match.grew.fr/?corpus=UD_English-EWT@2.7&custom=5fedf4f6bdfb3
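
The rule stated above can be approximated outside Grew as well. The following is only a minimal sketch, not the query behind the link: it assumes that "complement" means `ccomp` or `xcomp` and that "some sort of subject" means `nsubj` or `csubj` (subtypes included), and it runs on a hand-made toy sentence rather than on EWT data. The helper names are hypothetical.

```python
from collections import defaultdict

# Assumptions of this sketch, not taken from the thread's query:
SUBJECT_RELS = {"nsubj", "csubj"}     # what counts as "some sort of subject"
COMPLEMENT_RELS = {"ccomp", "xcomp"}  # what counts as "a complement"


def read_sentences(conllu_text):
    """Yield one list of (id, form, head, deprel) tuples per sentence."""
    sentence = []
    for line in conllu_text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        # Keep plain word lines only; skip multiword ranges (1-2) and empty nodes (1.1).
        if len(cols) == 10 and cols[0].isdigit() and cols[6].isdigit():
            sentence.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))
    if sentence:
        yield sentence


def find_violations(conllu_text):
    """Return (head id, head form) for predicates with expl + complement but no subject."""
    violations = []
    for sentence in read_sentences(conllu_text):
        forms = {tok_id: form for tok_id, form, _, _ in sentence}
        rels_by_head = defaultdict(set)
        for _, _, head, deprel in sentence:
            rels_by_head[head].add(deprel.split(":")[0])  # ignore subtypes such as :pass
        for head, rels in sorted(rels_by_head.items()):
            if head == 0:
                continue
            if "expl" in rels and rels & COMPLEMENT_RELS and not rels & SUBJECT_RELS:
                violations.append((head, forms[head]))
    return violations


def toy_sentence(rows):
    """Build CoNLL-U text from (id, form, head, deprel) rows, blanking the other columns."""
    return "\n".join(
        "\t".join([str(i), form, "_", "_", "_", "_", str(head), rel, "_", "_"])
        for i, form, head, rel in rows
    )


# Hand-made toy sentence "It seems that he left" (not an EWT sentence).
ccomp_version = toy_sentence([
    (1, "It", 2, "expl"), (2, "seems", 0, "root"), (3, "that", 5, "mark"),
    (4, "he", 5, "nsubj"), (5, "left", 2, "ccomp"),
])
csubj_version = ccomp_version.replace("ccomp", "csubj")

print(find_violations(ccomp_version))  # "seems" has expl + ccomp and no subject
print(find_violations(csubj_version))  # nothing flagged
```

Because `xcomp` is in the complement set, this sketch would also flag "There needs to be a recount" under an expl + xcomp analysis, so whether xcomps are an exception is a choice the set definitions make explicit.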

The most interesting is "When it came time to pay the bill". "Came time" is idiomatic, probably related to "became". What would the structure be for "It became rainy"—expl + xcomp?

"it seemed to take the hotel staff quite a while to quiet them down": should be csubj(seemed, take)?

"There needs to be a recount"—"There is a recount" would be expl(is, There), nsubj(is, recount) (example 7 here). If the main verb is "needs", I guess the infinitival clause should be csubj?: expl(needs, There), csubj(needs, be), nsubj(be, recount)? Or maybe xcomps are an exception to the rule. Example 4 here shows an infinitival copular xcomp.

Also, here's a There+BE+NP existential sentence where the BE verb should be the head: http://match.grew.fr/?corpus=UD_English-EWT@2.7&custom=5fedf761a5c27

amir-zeldes commented 3 years ago

Agreed, I think all of these should have non-expl subjects, so these should be expl+csubj based on the following paraphrases:

For the recount, note that EWT attaches lexical subjects of raising verbs as nsubj of the raising verb, not of the embedded predicate (unlike what SD used to do), so I think that suggests no csubj here, but rather nsubj:

This is by analogy to lexical subject cases like "Lori seems to think" here:

# sent_id = email-enronsent04_01-0006
...
14  Lori    Lori    PROPN   NNP Number=Sing 15  nsubj   15:nsubj|17:nsubj:xsubj _
15  seems   seem    VERB    VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   0   root    0:root  _
16  to  to  PART    TO  _   17  mark    17:mark _
17  think   think   VERB    VB  VerbForm=Inf    15  xcomp   15:xcomp    _

(so Lori is not taken to be the syntactic subject of "think", even though semantically that is implied by the raising analysis, and there is no csubj)
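
That statement holds for the basic tree (HEAD and DEPREL columns). The enhanced DEPS column in the same excerpt does record the link to "think". A small sketch that reads the four quoted rows, with the columns split out by hand since the excerpt lost its tabs; the function names are illustrative only:

```python
# The four rows quoted above, with the ten CoNLL-U columns split out by hand.
ROWS = [
    ["14", "Lori", "Lori", "PROPN", "NNP", "Number=Sing",
     "15", "nsubj", "15:nsubj|17:nsubj:xsubj", "_"],
    ["15", "seems", "seem", "VERB", "VBZ",
     "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin",
     "0", "root", "0:root", "_"],
    ["16", "to", "to", "PART", "TO", "_", "17", "mark", "17:mark", "_"],
    ["17", "think", "think", "VERB", "VB", "VerbForm=Inf",
     "15", "xcomp", "15:xcomp", "_"],
]


def basic_edge(row):
    """(head, relation) from the HEAD and DEPREL columns."""
    return (row[6], row[7])


def enhanced_edges(row):
    """All (head, relation) pairs listed in the DEPS column."""
    if row[8] == "_":
        return []
    edges = []
    for item in row[8].split("|"):
        # Split on the first colon only: the relation itself may contain ":".
        head, rel = item.split(":", 1)
        edges.append((head, rel))
    return edges


def enhanced_only(row):
    """Enhanced edges that have no counterpart in the basic tree."""
    return [edge for edge in enhanced_edges(row) if edge != basic_edge(row)]


for row in ROWS:
    print(row[1], basic_edge(row), enhanced_only(row))
```

Only the "Lori" row has an enhanced-only edge, `nsubj:xsubj` with head 17 ("think"), which is where the semantic subject relation is kept.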

nschneid commented 3 years ago

Reflecting on the xcomp analysis, I think the idea is that a control or raising verb can "steal" the expletive subject from the embedded predicate. So the rule about having no complement without a non-expletive subject then applies to the embedded predicate. I think all of these are incorrect: http://match.grew.fr/?corpus=UD_English-EWT@2.7&custom=5fedfb8215fde

nschneid commented 3 years ago

@amir-zeldes For "There needs to be a recount" I'm thinking: expl(needs, there), xcomp(needs, be), nsubj(be, recount). The raising verb can steal an embedded surface subject, i.e. "there", but not the deep subject "recount", I don't think.
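
To make the competing proposals for this sentence easier to compare, here is a rough sketch that writes each one as a set of edges and prints where they differ. The first two sets come directly from the comments in this thread. The third is a reconstruction from prose (nsubj on the raising verb, by analogy to "Lori seems to think"), since that tree is not shown here, so treat it as an assumption. All names are hypothetical.

```python
# Token ids for "There needs to be a recount".
TOKENS = {1: "There", 2: "needs", 3: "to", 4: "be", 5: "a", 6: "recount"}

# Each edge is (relation, head id, dependent id).
# Option floated earlier in the thread: infinitival clause as csubj.
CSUBJ_OPTION = {("expl", 2, 1), ("csubj", 2, 4), ("nsubj", 4, 6)}
# Option proposed in the comment above: xcomp, "recount" stays with "be".
XCOMP_OPTION = {("expl", 2, 1), ("xcomp", 2, 4), ("nsubj", 4, 6)}
# ASSUMPTION: reconstructed from prose, not from a tree shown in the thread.
MATRIX_NSUBJ_OPTION = {("expl", 2, 1), ("xcomp", 2, 4), ("nsubj", 2, 6)}


def render(edges):
    """Format edges as relation(head, dependent), sorted for stable output."""
    return sorted(f"{rel}({TOKENS[head]}, {TOKENS[dep]})" for rel, head, dep in edges)


def differences(a, b):
    """Edges found only in a, and edges found only in b."""
    return render(a - b), render(b - a)


print(differences(CSUBJ_OPTION, XCOMP_OPTION))
print(differences(XCOMP_OPTION, MATRIX_NSUBJ_OPTION))
```

Under these encodings all three options agree on expl(needs, There); the disagreement is confined to the label on "be" and to which verb heads "recount".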

nschneid commented 3 years ago

I.e. "deep" structure of

needs [there_expl/arg0 TO-BE a recount_nsubj/arg1]_xcomp/arg1

cannot be stable because the matrix verb lacks a subject, so it's realized as

there_expl/arg0 needs [TO-BE a recount_nsubj/arg1]_xcomp/arg1

(where subjects/complements are numbered in order, arg0 being the thing that appears before the verb)

amir-zeldes commented 3 years ago

> Reflecting on the xcomp analysis, I think the idea is that a control or raising verb can "steal" the expletive subject from the embedded predicate. So the rule about having no complement without a non-expletive subject then applies to the embedded predicate. I think all of these are incorrect: http://match.grew.fr/?corpus=UD_English-EWT@2.7&custom=5fedfb8215fde

I agree, those look wrong to me too:

amir-zeldes commented 3 years ago

> @amir-zeldes For "There needs to be a recount" I'm thinking: expl(needs, there), xcomp(needs, be), nsubj(be, recount). The raising verb can steal an embedded surface subject, i.e. "there", but not the deep subject "recount", I don't think.

This is basically the old SD analysis, which I take it was overturned for some reason. I don't feel very strongly about it, but it would kind of annoy me to revert all of these, because when moving from SD to UD we actually went to some trouble to convert all of these analyses into the "Lori seems to think" form above (which in SD used to have "think" as the head of "Lori", IIRC). I think you could either prioritize semantics and say "I want the tree to reflect the semantic argument structure", so that what "needs" is "for a recount to be", or say that UD represents surface morphosyntax, and on the surface, agreement suggests that "recount" is the subject, incl. both word order and the 3rd person 's', compare:

Since UD doesn't have a 'deep' layer or anything like transformations, you can only do justice to one of these ideas in the basic tree (semantic argument structure or surface morphosyntax). Edeps could let you express both of course.

Maybe @mcdm or @sebschu can chime in for some context on how this changed from SD to UD?

nschneid commented 3 years ago

This kind of thing is why I'm not a syntactician. :)

With agreement as a criterion I guess the structure would differ between

which is awfully subtle.

My guess was that with an xcomp, only one shared argument would be attached to the matrix verb, thus

Agreed that a clarification would be helpful!

nschneid commented 2 years ago

Encountering this again in #302—"there seems to be a problem", "there is going to be...", etc.

nschneid commented 6 months ago

OK here's what I think is going on, drawing on some of the above examples (but "math courses" instead of "recounts"). In some of the examples, a gap in the embedded clause is made explicit:

Non-expletive

There-expletive/existential

It-expletive

Infinitival extraposed clause

Finite extraposed clause

With an adjective xcomp instead of "to be the case":

Comparative-like cases

Even if these sentences are not read with comparative semantics, arguably the drawing-a-conclusion construction recruits the comparative clause strategy. I wonder if advcl is appropriate for the like/as clause (but then does "seem" have any core arguments?).

amir-zeldes commented 6 months ago

I like csubj even for the "like" cases - as you say, it saturates the argument of "seem", and otherwise we split it into two kinds of "seem" just based on the presence of "like". I think a marked clause can assume the subject position similarly to a PP, e.g. "at home is perfect". If we like "home" as a subject in that case (assuming a PP to NP conversion via unary derivation), then a "like" clause as a subject should also be possible.

nschneid commented 6 months ago

Yes PPs can be subjects. I dunno if that implies contentful-subordinator-marked clauses can be as well though.

  • ?Because he arrived late is a bad reason to punish him.
  • ?If you are sick is the only good reason for missing class.

I can imagine people saying these in conversation, but do we want to treat them as grammatical, or as structural gear-shifting mid-sentence?

rueter commented 6 months ago

> Yes PPs can be subjects. I dunno if that implies contentful-subordinator-marked clauses can be as well though.
>
>   • ?Because he arrived late is a bad reason to punish him.
>   • ?If you are sick is the only good reason for missing class.
>
> I can imagine people saying these in conversation, but do we want to treat them as grammatical, or as structural gear-shifting mid-sentence?

The first of the two examples sounds like a mention, i.e., "'Because he arrived late' is a bad reason to punish him." The second example, however, is hard to really imagine.

amir-zeldes commented 6 months ago

> I can imagine people saying these in conversation, but do we want to treat them as grammatical, or as structural gear-shifting mid-sentence?

I think it's grammatical to do that, at least to the extent that I wouldn't feel that some sort of reparandum analysis is warranted there. If we think the "because" example should receive a proper analysis, then that's already reason enough to allow it.