UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

Various NOUN, PROPN, and VERB feature fixes #362

Closed rhdunn closed 2 years ago

rhdunn commented 2 years ago

This adds various rules to the neatEN Python script to check for issues identified in #360, along with some others identified while doing this. These check:

  1. That subsets of UPenn XPOS map to corresponding UPOS tags.
  2. That specific features correspond to XPOS tags.

The identified errors have been corrected.

Note that this is not a complete list of feature checks.
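
For readers unfamiliar with this kind of validation, the two checks listed above can be sketched roughly as follows. This is a minimal illustration, not the code added to neatEN: the rule tables `XPOS_TO_UPOS` and `XPOS_FEATS` and the helper names `parse_feats` and `check_token_line` are hypothetical, and the handful of tag rules shown are assumed for the example rather than taken from the PR. The only thing relied on is the standard 10-column CoNLL-U token line.

```python
# Hypothetical rule tables for illustration only; the real neatEN rules may differ.
# Check 1: a subset of UPenn XPOS tags and the UPOS tags each is assumed to allow.
XPOS_TO_UPOS = {
    "NN": {"NOUN"},
    "NNS": {"NOUN"},
    "NNP": {"PROPN"},
    "NNPS": {"PROPN"},
    "VBD": {"VERB", "AUX"},
}

# Check 2: features each XPOS tag is assumed to require.
XPOS_FEATS = {
    "NN": {"Number": "Sing"},
    "NNS": {"Number": "Plur"},
    "NNP": {"Number": "Sing"},
    "NNPS": {"Number": "Plur"},
    "VBD": {"Tense": "Past"},
}


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'Number=Plur|Tense=Past' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def check_token_line(line):
    """Return a list of problem descriptions for one CoNLL-U token line."""
    # CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
    cols = line.rstrip("\n").split("\t")
    form, upos, xpos = cols[1], cols[3], cols[4]
    feats = parse_feats(cols[5])
    problems = []

    allowed = XPOS_TO_UPOS.get(xpos)
    if allowed is not None and upos not in allowed:
        problems.append(f"{form}: XPOS {xpos} has unexpected UPOS {upos}")

    for name, value in XPOS_FEATS.get(xpos, {}).items():
        if feats.get(name) != value:
            problems.append(f"{form}: XPOS {xpos} expects {name}={value}")

    return problems
```

Calling `check_token_line` on each token line of a `.conllu` file and printing any returned messages would give a report of the kind described here. XPOS tags absent from the tables are simply skipped, which matches the note that the checks are not exhaustive.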

nschneid commented 2 years ago

Excellent. I will get to this once some other updates are merged.

rhdunn commented 2 years ago

I've fixed a couple of issues and extended this to more cases. I haven't covered the full set, because some XPOS tags (RB, VBP, etc.) are missing features and so generate a lot of warnings. As such, I don't intend to add any further rules/checks in this PR.

nschneid commented 2 years ago

I have merged the other updates. There are a few conflicts.

nschneid commented 2 years ago

Also note the new policy on possessive lemmas & features: https://github.com/UniversalDependencies/docs/issues/517#issuecomment-1272400316

Do you want to implement that, or do you want me to? Either way, it does not affect XPOS.

rhdunn commented 2 years ago

When resolving the merge conflicts, I adopted your changes from the clefts branch.

rhdunn commented 2 years ago

I've fixed the other issues you mentioned now.

rhdunn commented 2 years ago

I'll have a look at applying the changes from docs#517. I think it makes sense to do that as a separate PR/branch.

nschneid commented 2 years ago

@rhdunn Now that personal pronouns are updated, is it worth finishing this?

rhdunn commented 2 years ago

That's my intention, yes.

rhdunn commented 2 years ago

This is ready to merge now if you are happy with the fixes.