UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

"do", "have" tagged as VERB with no object #403

Closed nschneid closed 10 months ago

nschneid commented 1 year ago

Many of these are embedded in an object relative clause. The enhanced edge for the object (E:obj) is missing (cf. #392).
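
One way to list the affected tokens is a small scan over the CoNLL-U file. The following is only a sketch: it assumes standard 10-column CoNLL-U input, and the names `parse_conllu`, `objectless_do_have`, and `SAMPLE` are made up for illustration and come from neither the corpus tooling nor this thread. It looks only at basic dependencies, so it says nothing about the missing E:obj edges.

```python
def parse_conllu(text):
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        current.append({
            "id": int(cols[0]), "form": cols[1], "lemma": cols[2],
            "upos": cols[3], "xpos": cols[4],
            "head": int(cols[6]) if cols[6].isdigit() else None,
            "deprel": cols[7], "deps": cols[8],
        })
    if current:
        sentences.append(current)
    return sentences


def objectless_do_have(sentence):
    """Return "do"/"have" tokens tagged VERB that have no basic obj dependent."""
    heads_with_obj = {t["head"] for t in sentence if t["deprel"] == "obj"}
    return [t for t in sentence
            if t["upos"] == "VERB"
            and t["lemma"] in {"do", "have"}
            and t["id"] not in heads_with_obj]


def row(*cols):
    """Join columns with tabs, as CoNLL-U requires."""
    return "\t".join(cols)


# Invented two-sentence sample, not taken from EWT.
SAMPLE = "\n".join([
    "# text = I do.",
    row("1", "I", "I", "PRON", "PRP", "_", "2", "nsubj", "_", "_"),
    row("2", "do", "do", "VERB", "VBP", "_", "0", "root", "_", "_"),
    row("3", ".", ".", "PUNCT", ".", "_", "2", "punct", "_", "_"),
    "",
    "# text = You do yoga.",
    row("1", "You", "you", "PRON", "PRP", "_", "2", "nsubj", "_", "_"),
    row("2", "do", "do", "VERB", "VBP", "_", "0", "root", "_", "_"),
    row("3", "yoga", "yoga", "NOUN", "NN", "_", "2", "obj", "_", "_"),
    row("4", ".", ".", "PUNCT", ".", "_", "2", "punct", "_", "_"),
])

for sent in parse_conllu(SAMPLE):
    for t in objectless_do_have(sent):
        print(t["id"], t["form"], t["deprel"])
```

On the sample, only the "do" of "I do." is reported; the "do" of "You do yoga." has an obj dependent and is skipped.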

Many of the rest are instances of elliptical stranding, and should be tagged AUX.

amir-zeldes commented 1 year ago

Not sure if it makes sense for these to be AUX if they are not functioning as AUX to anything overt... I mean, if you have ellipsis and just answer "I do!" to some question, isn't that the main verb at that point?

AngledLuffa commented 1 year ago

I think it makes sense for it to be tagged as if it's referring to the object that isn't there. Does anyone agree with Nate? I do! (do agree)

amir-zeldes commented 1 year ago

OK, I'm curious to see what others think. I guess I'm a little uncomfortable tagging things as AUX when they are not auxiliaries to anything actually present, because AUX is a relational type of category. While I understand the 'object that isn't there' idea, UD is largely surface-oriented and has pretty solid ideas about promotion (we also give elliptical relatives headed by an 'auxiliary' the deprel acl:relcl, because once there is no lexical predicate, the auxiliary is effectively the main predicate).

jnivre commented 1 year ago

In clear cases of VP ellipsis, the auxiliary has to be promoted with respect to its deprel but should in my opinion retain the AUX tag, even in cases where the controlling context for the ellipsis is inter-sentential. For example:

You do understand, don't you?

(A: Do you understand?) B: I do.

The behavior of these is clearly different from main verb uses of "do" and "have", which themselves can take "do" and "have" as auxiliaries. For example (with VERB instances capitalised):

You DO yoga, don't you?

(A: Do you DO yoga?) B: I do.

This is

nschneid commented 1 year ago

Agree with @jnivre, these bear properties of auxiliaries as distinct from lexical verbs.

Other properties include subject-aux inversion and negation, as in this tag question:

UPOS tags tend to be lexically-oriented rather than determined by the syntactic construction (except where there is ambiguity). In particular, we wouldn't want to have to say that every modal aux is polysemous between AUX and VERB.

Thus it should remain AUX even when promoted to predicate of the clause due to ellipsis.

amir-zeldes commented 1 year ago

OK, this seems to be the consensus, so I will modify it in the corpora I maintain

nschneid commented 1 year ago

Refined criteria:

nschneid commented 1 year ago

DepEdit script:

; VERB->AUX for stranded "do"
; (may need to run twice because of bleeding with the 2nd-to-last rule)
; with object -> lexical "do"
lemma=/do/&upos=/VERB/;func=/obj|[xc]comp|.*:pass/  #1>#2   #1:storage=lex_do
; do well, how are you doing
lemma=/do/&upos=/VERB/;lemma=/how|likewise|so|good|fine|well|great/&func=/advmod/   #1>#2   #1:storage=lex_do
; hard to do, have little to do with
lemma=/do/&upos=/VERB/;lemma=/to/&upos=/PART/&func=/mark/   #1>#2   #1:storage=lex_do
; do as you will (wilt)
lemma=/do/&upos=/VERB/;lemma=/will/&func=/advcl/    #1>#2   #1:storage=lex_do
; that will do
lemma=/do/&upos=/VERB/;lemma=/will/&func=/aux/  #1>#2   #1:storage=lex_do
; monkey see, monkey do
lemma=/do/&upos=/VERB/;lemma=/monkey/&func=/nsubj/  #1>#2   #1:storage=lex_do
; do or die
lemma=/do/&upos=/VERB/;lemma=/die/&func=/conj/  #1>#2   #1:storage=lex_do
; what it has done and is still doing
lemma=/do/&upos=/VERB/&func=/conj/;lemma=/do/&upos=/VERB/   #2>#1   #1:storage=lex_do
; exclude: things to do, things we do (zero relative), have it done
lemma=/do/&upos=/VERB/&xpos!=/VBN/&storage!=/lex_do/&func!=/xcomp|acl|.*:relcl/ none    #1:upos=AUX

; VERB->AUX for stranded "have"
; with object -> lexical "have"
lemma=/have/&upos=/VERB/;func=/obj|[xc]comp|.*:pass/    #1>#2   #1:storage=lex_have
; fun to have
lemma=/have/&upos=/VERB/;lemma=/to/&upos=/PART/&func=/mark/ #1>#2   #1:storage=lex_have
; comparative clause
lemma=/have/&upos=/VERB/;lemma=/than/&func=/mark/;lemma=/have/&upos=/VERB/  #1>#2;#3.*#1    #1:storage=lex_have
; as much as you have
lemma=/have/&upos=/VERB/&func=/advcl/;lemma=/much/  #2>#1   #1:storage=lex_have
; have and will have
lemma=/have/&upos=/VERB/&func=/conj/;lemma=/have/&upos=/VERB/   #2>#1   #1:storage=lex_have
; is influenced by and has Jamaican references
lemma=/have/&upos=/VERB/&func=/conj/;lemma=/influence/;func=/obl/   #2>#1;#2>#3 #1:storage=lex_have
; which one had: (before list)
lemma=/have/&upos=/VERB/;upos=/PUNCT/&form=/:/  #1.#2   #1:storage=lex_have
lemma=/have/&upos=/VERB/&func=/reparandum/;upos=/VERB/  #2>#1   #1:storage=lex_have
; exclude: things we have (zero relative)
lemma=/have/&upos=/VERB/&storage!=/lex_have/&func!=/xcomp|acl|.*:relcl/&num!=/.*\..*/   none    #1:upos=AUX
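
For readers who do not use DepEdit, the mark-then-retag pattern of the "do" rules can be sketched in Python. This is an illustrative approximation, not the script itself: it covers only three of the marking rules (complement dependent, lexical advmod, "to" marker) plus the final retagging rule, it assumes each pattern has to match the whole attribute value, and the names `tok`, `retag_stranded_do`, and `LEXICAL_ADVMOD` are hypothetical.

```python
import re

# Adverbs from the "do well / how are you doing" rule above.
LEXICAL_ADVMOD = {"how", "likewise", "so", "good", "fine", "well", "great"}


def tok(id_, lemma, upos, xpos, head, deprel):
    """Build a minimal token dict (hypothetical representation)."""
    return {"id": id_, "lemma": lemma, "upos": upos,
            "xpos": xpos, "head": head, "deprel": deprel}


def retag_stranded_do(sentence):
    """Retag stranded "do" as AUX in place; return the ids that changed."""
    children = {}
    for t in sentence:
        children.setdefault(t["head"], []).append(t)

    # Pass 1: flag lexical "do" (plays the role of storage=lex_do).
    lexical = set()
    for t in sentence:
        if t["lemma"] != "do" or t["upos"] != "VERB":
            continue
        for c in children.get(t["id"], []):
            has_complement = re.fullmatch(r"obj|[xc]comp|.*:pass", c["deprel"])
            lexical_advmod = (c["deprel"] == "advmod"
                              and c["lemma"] in LEXICAL_ADVMOD)
            to_marker = (c["lemma"] == "to" and c["upos"] == "PART"
                         and c["deprel"] == "mark")
            if has_complement or lexical_advmod or to_marker:
                lexical.add(t["id"])
                break

    # Pass 2: everything unflagged becomes AUX, except participles (VBN)
    # and "do" heading an xcomp, acl, or relative clause.
    changed = []
    for t in sentence:
        if (t["lemma"] == "do" and t["upos"] == "VERB"
                and t["xpos"] != "VBN"
                and t["id"] not in lexical
                and not re.fullmatch(r"xcomp|acl|.*:relcl", t["deprel"])):
            t["upos"] = "AUX"
            changed.append(t["id"])
    return changed


# Invented examples: "I do" (stranded) and "you do yoga" (lexical).
stranded = [tok(1, "I", "PRON", "PRP", 2, "nsubj"),
            tok(2, "do", "VERB", "VBP", 0, "root")]
lexical_use = [tok(1, "you", "PRON", "PRP", 2, "nsubj"),
               tok(2, "do", "VERB", "VBP", 0, "root"),
               tok(3, "yoga", "NOUN", "NN", 2, "obj")]

print(retag_stranded_do(stranded))     # ids retagged in "I do"
print(retag_stranded_do(lexical_use))  # ids retagged in "you do yoga"
```

Keeping the flagging pass separate from the retagging pass mirrors how the script uses the storage attribute: the positive rules can stay small and independent, and the final rule only has to test for the absence of a flag.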

nschneid commented 1 year ago

@amir-zeldes Does the above look good to you?

amir-zeldes commented 1 year ago

Probably, but will need to find a moment to check these in more detail. It might be a few days, sorry!

amir-zeldes commented 1 year ago

This is a bit trickier, some of these could go either way, and some are not so lexicalized. Consider "do as you will" - is it really limited to "will"? What about "do as you're told"? If we extend it to any advcl, I think you'll quickly get ambiguous cases:

For some cases I'm not even sure what's correct, for example:

Things like "what it has done and is still doing" could be VERB, but imagine you have "which":

I think as advcl:relcl it's maybe AUX, but as acl:relcl maybe VERB. We could try to list all of these cases, but it would get awfully complicated very fast... I'm wondering if we should adopt a simpler heuristic definition (along the lines of "VERB if it has obj, x or y, otherwise always AUX") at the expense of a very nuanced approach which is harder to maintain. Or really just scan edge cases and manually annotate those, but TBH I don't love playing with UPOS while also investing so much effort in XPOS.

nschneid commented 1 year ago

I developed the rules by looking pretty meticulously through EWT. I don't know if I checked them systematically against GUM, so in principle there could be ambiguity, yeah. I don't think "do" in "do as you will" is an auxiliary though, so if UD makes the distinction we should try to implement it....

Things like "what it has done and is still doing" could be VERB, but imagine you have "which":

  • ... which it has done and is still doing

Perhaps that's a tricky case (because sentence anaphora are weird), but I don't see it in either corpus so in practice it's not likely to be a big source of errors.

I would definitely say VERB for "what it has done" (acl:relcl) because that should correspond to an E:obj(done,what) relation.
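
Whether a given verb has such an enhanced object can be checked from the DEPS column. Below is a minimal sketch, assuming the usual `head:deprel|head:deprel` format of that column; the function names and the annotation values in the example are invented for illustration and are not copied from EWT or GUM.

```python
def parse_deps(deps):
    """Parse a CoNLL-U DEPS value such as '4:obj|1:acl:relcl' into pairs."""
    if deps in ("_", ""):
        return []
    pairs = []
    for item in deps.split("|"):
        # Split on the first colon only: the relation may contain colons.
        head, _, rel = item.partition(":")
        pairs.append((head, rel))
    return pairs


def has_enhanced_obj(sentence, verb_id):
    """True if some token has an enhanced obj edge headed by verb_id."""
    target = str(verb_id)
    return any(head == target and rel.split(":")[0] == "obj"
               for t in sentence
               for head, rel in parse_deps(t["deps"]))


# Invented fragment for "what it has done"; the DEPS values are hypothetical.
fragment = [
    {"id": 1, "form": "what", "deps": "0:root|4:obj"},
    {"id": 2, "form": "it", "deps": "4:nsubj"},
    {"id": 3, "form": "has", "deps": "4:aux"},
    {"id": 4, "form": "done", "deps": "1:acl:relcl"},
]

print(has_enhanced_obj(fragment, 4))
```

Heads are compared as strings because enhanced graphs may use decimal ids such as 8.1 for empty nodes.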

"Will do." as a response—I think it's short for "Will do it" or "Will do that" or "Will do what you suggest", so VERB.

nschneid commented 10 months ago

@amir-zeldes Thoughts on incorporating the above script into GUM?

amir-zeldes commented 10 months ago

Yes, this is now folded into the build bot here - I tried to compress it a bit to keep the rules from exploding too much:

https://github.com/amir-zeldes/gum/blob/339dd86501aabfe8b3b9b5df00d42bbda25092e8/_build/utils/upos.ini#L125