UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

implement ExtPos for fixed expressions #530

Open nschneid opened 1 month ago

nschneid commented 1 month ago

The Core Group decided it would be a good idea for treebanks to specify how each fixed expression functions externally via ExtPos in the MISC column.

This is already implemented for a few expressions in EWT. We might as well expand to all of them. If the external deprel is correct, it can be used to infer the ExtPos (which is one of ADV, ADP, CCONJ, SCONJ).

AngledLuffa commented 1 month ago

Can you give a bit more explanation on what ExtPos means in this case or how the external deprel will be represented?

nschneid commented 1 month ago

External POS: https://universaldependencies.org/en/feat/ExtPos.html

For example, "instead" is individually an ADV, but where it attaches as mark, it is due to the fixed expression "instead of" acting as SCONJ. So in those cases it would receive ExtPos=SCONJ.
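
A minimal sketch of how that inference could look, assuming a simple lookup from the deprel of the expression's head token to an ExtPos value. Only the mark → SCONJ pairing is stated in this thread; the other three pairings and the name `infer_extpos` are assumptions for illustration, not the script actually used for EWT.

```python
# Hypothetical lookup table: only "mark" -> "SCONJ" comes from the thread;
# the remaining pairings are assumed from typical UD usage.
DEPREL_TO_EXTPOS = {
    "advmod": "ADV",
    "case": "ADP",
    "cc": "CCONJ",
    "mark": "SCONJ",
}


def infer_extpos(deprel):
    """Return a guessed ExtPos for the head of a fixed expression, or None."""
    base = deprel.split(":", 1)[0]  # ignore any relation subtype
    return DEPREL_TO_EXTPOS.get(base)


# "instead" heading "instead of" and attaching as mark
print(infer_extpos("mark"))
```

Returning `None` for any other deprel keeps unexpected attachments visible for manual review instead of silently assigning a value.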

amir-zeldes commented 1 month ago

(BTW this has also been implemented in GUM)

bguil commented 1 month ago

BTW2: In v2.14, most of the treebanks that use ExtPos put the ExtPos feature in the FEATS column. This includes the SUD native corpora, English-EWT, UD_Portuguese-Bosque and UD_Portuguese-GSD.

For consistency, it would be nice to have the same policy in others such as English-GUM.

nschneid commented 1 month ago

For EWT I've just moved it to MISC following @dan-zeman's statement that FEATS should be reserved for properties of individual words, not larger units.
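
For illustration, a move like this could be done per token line roughly as follows. This is a sketch that assumes the standard 10-column, tab-separated CoNLL-U layout (FEATS is column 6, MISC is column 10, `_` marks an empty column); the function name `move_extpos_to_misc` is hypothetical and this is not the script that was run on EWT.

```python
def move_extpos_to_misc(line):
    """Move an ExtPos=... entry from FEATS to MISC in one CoNLL-U line."""
    line = line.rstrip("\n")
    cols = line.split("\t")
    if len(cols) != 10:
        return line  # comment, blank line, or malformed: leave untouched

    feats = [] if cols[5] == "_" else cols[5].split("|")
    misc = [] if cols[9] == "_" else cols[9].split("|")

    extpos = [f for f in feats if f.startswith("ExtPos=")]
    if not extpos:
        return line  # nothing to move

    feats = [f for f in feats if not f.startswith("ExtPos=")]
    # Drop any ExtPos already in MISC so the moved value is not duplicated.
    misc = [m for m in misc if not m.startswith("ExtPos=")] + extpos

    cols[5] = "|".join(feats) or "_"
    cols[9] = "|".join(misc) or "_"
    return "\t".join(cols)


token = "1\tinstead\tinstead\tADV\tRB\tExtPos=SCONJ\t4\tmark\t4:mark\tSpaceAfter=No"
print(move_extpos_to_misc(token))
```

The sketch appends the moved entry to the end of MISC and leaves every other column untouched.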

amir-zeldes commented 1 month ago

Yes, it's in MISC in GUM for the same reason.

nschneid commented 1 month ago

New issue about standardizing ExtPos at the universal level: UniversalDependencies/docs#1037

nschneid commented 1 month ago

Implemented in the above commit.

I've made some small updates to the English fixed docs: see #317.

One question:

nschneid commented 1 month ago

I think "as opposed to" is like "rather than"—its coordination vs. prepositional function depends on context.