UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

Promoted=Yes #370

Open · nschneid opened this issue 1 year ago

nschneid commented 1 year ago

I am experimenting with a MISC feature Promoted=Yes to indicate that an item (typically an AUX) is in an atypical function due to promotion. This way we can filter these tokens out of queries that look for errors.
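As an illustration of how such a MISC flag could be used, here is a minimal Python sketch, assuming plain CoNLL-U input (10 tab-separated columns, MISC last). The error query it implements (AUX tokens not attached as aux or cop), the names parse_misc and suspicious_aux, and the sample sentences are all hypothetical.

```python
def parse_misc(misc: str) -> dict:
    """Parse a CoNLL-U MISC value ("_" or "Key=Value|Key=Value") into a dict."""
    if misc == "_":
        return {}
    feats = {}
    for item in misc.split("|"):
        key, _, value = item.partition("=")
        feats[key] = value
    return feats


def suspicious_aux(conllu_text: str) -> list:
    """Return (form, deprel) for AUX tokens attached as something other than
    aux or cop, skipping tokens flagged Promoted=Yes."""
    hits = []
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank lines separate sentences; '#' lines are comments
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        form, upos, deprel, misc = cols[1], cols[3], cols[7], cols[9]
        if upos == "AUX" and deprel.split(":")[0] not in ("aux", "cop"):
            if parse_misc(misc).get("Promoted") != "Yes":
                hits.append((form, deprel))
    return hits


# Hypothetical sample: an elliptical "can" marked as promoted, and an
# unmarked AUX root that an error query should still report.
ROWS = [
    "# text = She can.",
    "1 She she PRON PRP _ 2 nsubj _ _",
    "2 can can AUX MD _ 0 root _ Promoted=Yes|SpaceAfter=No",
    "3 . . PUNCT . _ 2 punct _ _",
    "",
    "# text = She has cats",
    "1 She she PRON PRP _ 2 nsubj _ _",
    "2 has have AUX VBZ _ 0 root _ _",
    "3 cats cat NOUN NNS _ 2 obj _ _",
]
# No field contains spaces here, so token rows are re-joined with tabs.
SAMPLE = "\n".join(r if r.startswith("#") else "\t".join(r.split()) for r in ROWS)

print(suspicious_aux(SAMPLE))  # [('has', 'root')]
```

Only the unflagged AUX root is reported; the promoted "can" is filtered out by its MISC value.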

amir-zeldes commented 1 year ago

Sounds interesting, but are you able to identify this automatically? I can imagine it's easy for things like stranded prepositions or for parents of orphan dependents, but other cases might be tricky.

nschneid commented 1 year ago

If the trees were correct, then a lot of promoted function words could be found automatically. But right now many of the apparent promotions are incorrect in EWT, so I am applying the feature manually to signal where it is correct.
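One of the easier cases raised in this thread, parents of orphan dependents, can be found mechanically: under UD's gapping analysis, when a predicate is elided one of its dependents is promoted to head and the remaining ones attach to it as orphan. A minimal Python sketch, assuming well-formed CoNLL-U; the function name orphan_heads and the example sentence are hypothetical, and like any such check it is only as reliable as the trees it reads.

```python
def orphan_heads(sentence: str) -> list:
    """Return (id, form, upos) for each token that heads an orphan dependent,
    i.e. a promotion candidate under UD's gapping analysis."""
    tokens = {}
    heads = set()
    for line in sentence.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        tokens[int(cols[0])] = (cols[1], cols[3])
        if cols[7].split(":")[0] == "orphan":
            heads.add(int(cols[6]))  # HEAD column of the orphan dependent
    return [(tid, *tokens[tid]) for tid in sorted(heads)]


# Hypothetical gapping example: "Mary won gold and Peter bronze."
ROWS = [
    "1 Mary Mary PROPN NNP _ 2 nsubj _ _",
    "2 won win VERB VBD _ 0 root _ _",
    "3 gold gold NOUN NN _ 2 obj _ _",
    "4 and and CCONJ CC _ 5 cc _ _",
    "5 Peter Peter PROPN NNP _ 2 conj _ _",
    "6 bronze bronze NOUN NN _ 5 orphan _ SpaceAfter=No",
    "7 . . PUNCT . _ 2 punct _ _",
]
SENT = "\n".join("\t".join(r.split()) for r in ROWS)

print(orphan_heads(SENT))  # [(5, 'Peter', 'PROPN')]
```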

amir-zeldes commented 1 year ago

Cool, can you share the script? If it works well for GUM, I can include this in the conllu build.

nschneid commented 1 year ago

No script yet, but here is a query of potential copulas to be examined: http://universal.grew.fr/?custom=6346d6e341a86

amir-zeldes commented 1 year ago

OK, let me know if you're ready to test something and I can take a look!

nschneid commented 11 months ago

BTW some of the apparently anomalous promotions (and UPOS/deprel combinations) are due to missing words. If the omitted word is fairly clear from context I am noting it with the MissingWordsAfter feature (e.g. MissingWordsAfter=' for a plural possessive). Otherwise I am adding MissingWordAfter=Yes (e.g. if the verb is absent and several verbs are plausible).
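A sketch of how these notes could be collected for review, in Python, assuming standard CoNLL-U input. The comment spells the feature both MissingWordsAfter and MissingWordAfter, so the sketch checks both keys; the function names and the example sentence are hypothetical.

```python
# Both spellings appear in the comment, so both are checked.
MISSING_KEYS = ("MissingWordsAfter", "MissingWordAfter")


def parse_misc(misc: str) -> dict:
    """Parse a CoNLL-U MISC value ("_" or "Key=Value|Key=Value") into a dict."""
    if misc == "_":
        return {}
    return {k: v for k, _, v in (item.partition("=") for item in misc.split("|"))}


def missing_word_notes(sentence: str) -> list:
    """Return (form, key, value) for every token carrying a missing-word note."""
    notes = []
    for line in sentence.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        misc = parse_misc(cols[9])
        for key in MISSING_KEYS:
            if key in misc:
                notes.append((cols[1], key, misc[key]))
    return notes


# Hypothetical: "the students books", with the possessive apostrophe omitted.
ROWS = [
    "1 the the DET DT _ 3 det _ _",
    "2 students student NOUN NNS _ 3 nmod:poss _ MissingWordsAfter='",
    "3 books book NOUN NNS _ 0 root _ _",
]
SENT = "\n".join("\t".join(r.split()) for r in ROWS)

print(missing_word_notes(SENT))  # [('students', 'MissingWordsAfter', "'")]
```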