UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

Tense on modal auxes #450

Closed by nschneid 2 weeks ago

nschneid commented 8 months ago

I notice that tense features are missing on xpos=MD tokens in both GUM and EWT. As I understand it, they're usually categorized thusly:

Present   Past
can       could
may       might
shall     should
will      would
must
ought

Are Tense=Pres and Tense=Past not used because that would suggest the nonstandard lemmatizations could->can, might->may, etc.? Could we add the features without changing the lemmas?

(This is for the morphological category on the word. For clause-level features, it would make sense to mark clauses with will as Fut.)
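To make the proposal concrete, here is a minimal sketch of adding the feature to xpos=MD tokens in a CoNLL-U file while leaving the LEMMA column alone. The mapping simply mirrors the table above; the names `MODAL_TENSE` and `add_modal_tense` are hypothetical, and this is not an existing script in either treebank. Forms outside the table (for example contracted modals) are left untouched.

```python
# Hypothetical mapping copied from the table above; not an official UD resource.
MODAL_TENSE = {
    "can": "Pres", "may": "Pres", "shall": "Pres", "will": "Pres",
    "must": "Pres", "ought": "Pres",
    "could": "Past", "might": "Past", "should": "Past", "would": "Past",
}


def add_modal_tense(line: str) -> str:
    """Add a Tense feature to one CoNLL-U token line if it is an xpos=MD modal.

    Comment lines, blank lines, and non-modal tokens are returned unchanged.
    The LEMMA column (index 2) is never modified.
    """
    line = line.rstrip("\n")
    cols = line.split("\t")
    # CoNLL-U token lines have exactly 10 tab-separated columns;
    # XPOS is column index 4, FORM is index 1, FEATS is index 5.
    if len(cols) != 10 or cols[4] != "MD":
        return line
    tense = MODAL_TENSE.get(cols[1].lower())
    if tense is None:
        return line  # form not covered by the table: leave as is
    feats = {}
    if cols[5] != "_":
        feats = dict(f.split("=", 1) for f in cols[5].split("|"))
    feats.setdefault("Tense", tense)  # do not overwrite an existing Tense
    # CoNLL-U expects features sorted alphabetically by name.
    cols[5] = "|".join(
        f"{k}={v}" for k, v in sorted(feats.items(), key=lambda kv: kv[0].lower())
    )
    return "\t".join(cols)
```

Applied to a token line such as `could / could / AUX / MD / VerbForm=Fin`, this would produce `Tense=Past|VerbForm=Fin` in the FEATS column and keep the lemma as `could`.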

amir-zeldes commented 8 months ago

I'm a bit ambivalent about this. Historically speaking, 'could' & co. in English are modal, and not past tense forms. Some Germanic languages, such as German, retain both the modal and the past tense form of these verbs, for example:

In English, we do not have all three forms, and the historically modal form takes over the past function as well. In hypothetical uses, it therefore seems a bit questionable to label "could" as "Past", at least for sentences like:

Meanwhile proper past use of "could" is rarer, for example habitual:

But I'm afraid we can't really disambiguate that automatically, so I would hesitate to just put Past on all uses of "could", even before getting into "will".
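Since the hypothetical and true-past readings cannot be told apart from the form alone, one cautious option would be to list the affected tokens for manual review rather than tagging them. The sketch below assumes plain CoNLL-U input; the function name `modals_needing_review` and the choice of which forms count as ambiguous are illustrative assumptions, not anything that exists in the treebank tooling.

```python
# Forms from the "Past" column of the table; treating exactly these four
# as ambiguous is an assumption made for illustration.
AMBIGUOUS_MODALS = {"could", "might", "should", "would"}


def modals_needing_review(conllu_text: str) -> list[tuple[int, str]]:
    """Return (line_number, lowercased form) for each ambiguous xpos=MD token.

    Line numbers are 1-based positions in the input text, so an annotator
    can jump to the token and decide by hand whether Tense=Past applies.
    """
    hits = []
    for lineno, line in enumerate(conllu_text.splitlines(), start=1):
        cols = line.split("\t")
        # Token lines have 10 columns; FORM is index 1, XPOS is index 4.
        if len(cols) == 10 and cols[4] == "MD" and cols[1].lower() in AMBIGUOUS_MODALS:
            hits.append((lineno, cols[1].lower()))
    return hits
```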

nschneid commented 2 weeks ago

Closing for lack of interest