UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Lemmas of English personal pronouns #517

Open nschneid opened 6 years ago

nschneid commented 6 years ago

It is not obvious how pronouns should be lemmatized (cf. #276 for Slavic). The UD_English corpus does the following:

Nominative (PRP):

I -> I
you -> you
he -> he
she -> she
it -> it
we -> we
they -> they

Accusative (PRP):

me -> I
you -> you
him -> he
her -> she
it -> it
us -> we
them -> they

Dependent possessive (PRP$):

my -> my (!)
your -> you
his -> he
her -> she
its -> its (!)
our -> we
your -> you
their -> they

The pattern here is that they are normalized to nominative case, except for "my" and "its", which should probably be "I" and "it", respectively.
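
To make the two exceptions concrete, here is a minimal sketch of an audit over the dependent possessive list above. The (form, lemma) pairs are copied from that list; the `EXPECTED` table and the `deviations` helper are hypothetical names, and the nominative targets are an assumption about what full normalization would look like, not an agreed guideline.

```python
# (form, lemma) pairs as listed above for dependent possessives (PRP$).
OBSERVED = [
    ("my", "my"), ("your", "you"), ("his", "he"), ("her", "she"),
    ("its", "its"), ("our", "we"), ("their", "they"),
]

# Assumption: the nominative lemma each form would get under full normalization.
EXPECTED = {
    "my": "I", "your": "you", "his": "he", "her": "she",
    "its": "it", "our": "we", "their": "they",
}

def deviations(observed, expected):
    """Return (form, observed_lemma, expected_lemma) for every mismatch."""
    return [
        (form, lemma, expected[form])
        for form, lemma in observed
        if form in expected and expected[form] != lemma
    ]

print(deviations(OBSERVED, EXPECTED))
```

Run on the list above, this flags only "my" and "its".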

Independent possessive (PRP, no morphological features): mine, yours, ours, theirs, etc.: no normalization

Reflexive (PRP): myself, yourself, ourselves, yourselves, themselves, etc.: no normalization

WH animate: who, whom, whoever, whomever: no normalization

I am not sure why whom, whomever, the independent possessives, and the reflexives aren't normalized to nominative as well.

There is one token where the ’s in "Let’s" has been lemmatized as "us" (it should presumably be "we" for consistency with the nominative pattern above).

That said, the simplest policy may be to use the lemma field only for spelling normalization (#513) and not perform case normalization at all. If the end user wants to map pronouns to nominative case, that is not hard to implement as postprocessing once spelling is consistent.
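
As a rough illustration of how small that postprocessing step could be, here is a sketch that assumes the lemma column already holds a spelling-normalized form. The `NOMINATIVE` table and the `to_nominative` function are hypothetical, not part of any UD tool, and the table covers only the accusative and dependent possessive forms listed in this issue.

```python
# Hypothetical table: spelling-normalized pronoun form -> nominative form.
# Assumption: independent possessives, reflexives and WH forms are left alone.
NOMINATIVE = {
    "me": "I", "my": "I",
    "him": "he", "his": "he",
    "her": "she",
    "its": "it",
    "us": "we", "our": "we",
    "them": "they", "their": "they",
    "your": "you",
}

def to_nominative(lemma, upos):
    """Map a pronoun lemma to nominative case; return other lemmas unchanged."""
    if upos != "PRON":
        return lemma
    return NOMINATIVE.get(lemma, lemma)

print(to_nominative("us", "PRON"))    # pronoun form found in the table
print(to_nominative("mine", "PRON"))  # not in the table, returned as-is
```

Keeping this outside the treebank means the lemma field stays purely orthographic, and users who want case normalization opt in.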

Thoughts?

nschneid commented 3 months ago

OK how about these guidelines: https://universaldependencies.org/en/pos/ADV.html

Implemented in EWT! (modulo some existing PronType=Int annotations that should be PronType=Rel)

AngledLuffa commented 3 months ago

So we should update none in PUD to be PRON with PronType=Neg?

(among other changes)

AngledLuffa commented 3 months ago

anything to be done for however? that was left out of the EWT updates

anyway?

any_ADV, there_PRON left blank?

nschneid commented 3 months ago

> anything to be done for however? that was left out of the EWT updates
>
> anyway?

These are both mainly discourse connectives, so I'm not sure they need a PronType.

> any_ADV, there_PRON left blank?

there_PRON: for expletive "there" I'm not sure if any of the PronType values would be a good fit. This is documented at https://universaldependencies.org/en/pos/PRON.html#expletive-there

any_ADV: "any" is normally DET. I see "any/ADV longer/ADV" and similar; not sure this is actually correct. Also "it doesn't hurt any/ADV" (= at all). Could these be DET attaching as advmod? Feels related to "some/DET 540,000 men". Curious to hear @amir-zeldes's take when he's back from vacation.

AngledLuffa commented 3 months ago

> however
>
> mainly discourse connectives

Agreed that the discourse versions are fine w/o. They are not always discourse, though, especially however:

# sent_id = email-enronsent24_01-0036
# text = My goal, however optimistic, is to execute the risk policy by the end of today.
4       however however ADV     RB      _       5       advmod  5:advmod        _
5       optimistic      optimistic      ADJ     JJ      Degree=Pos      2       amod    2:amod  SpaceAfter=No

# sent_id = email-enronsent24_01-0093
# text = My goal, however optimistic, is to execute the risk policy by the end of today.

# sent_id = reviews-332105-0004
# text = I will reccommend his services however/whenever possible!
6       however however ADV     WRB     PronType=Int    3       advmod  3:advmod|9:advmod       SpaceAfter=No
7       /       /       SYM     SYM     _       8       cc      8:cc    SpaceAfter=No
8       whenever        whenever        ADV     WRB     PronType=Rel    6       conj    3:advmod|6:conj|9:advmod        _

(those are the only ones I saw for however)

nschneid commented 3 months ago

Technically you're right, the "however optimistic" ones should be PronType=Int. I suppose these are just uses of "however" that modify a non-predicate ADJ or ADV.

"however/whenever possible": as "however" is the first item in coordination I suppose it should be the head of the free relative

AngledLuffa commented 3 months ago

> Technically you're right

(insert satisfied seal meme here)

nschneid commented 3 months ago

Aha, apparently "however" receives a different xpos: RB for the discourse connective use and WRB for the interrogative or relative use! (This is documented in the PTB tagging guidelines.) So we can require PronType conditional on that.
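
A minimal sketch of what such a conditional check might look like, assuming standard tab-separated 10-column CoNLL-U token lines. The function name `check_however` and the message text are hypothetical, and this is not how any existing validator implements it. It only enforces the WRB side of the rule; RB tokens are accepted with or without PronType.

```python
def check_however(line):
    """Return an error string if a WRB 'however' token lacks PronType, else None."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError("expected 10 tab-separated CoNLL-U columns")
    lemma, xpos, feats = cols[2], cols[4], cols[5]
    if lemma != "however" or xpos != "WRB":
        return None  # discourse-connective use (RB) or a different word
    has_prontype = any(f.startswith("PronType=") for f in feats.split("|"))
    if not has_prontype:
        return "token %s: WRB 'however' requires PronType" % cols[0]
    return None

# Token modeled on the reviews-332105-0004 example, with the feature removed.
bad = "\t".join(["6", "however", "however", "ADV", "WRB", "_",
                 "3", "advmod", "3:advmod|9:advmod", "SpaceAfter=No"])
print(check_however(bad))
```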