UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

fix punctuation #448

Closed martinpopel closed 9 months ago

martinpopel commented 9 months ago

UD_EWT does not follow the UD guidelines on punctuation attachment. This causes several problems, such as incompatibility with other English UD treebanks (e.g. GUM follows the guidelines) and complaints from users of parsers trained on UD_EWT about non-projective punct attachments.
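To illustrate what a non-projective punct attachment looks like, here is a minimal sketch in plain Python. It is not part of Udapi or of this PR: the helper names (`row`, `parse_conllu_sentence`, `is_descendant`, `nonprojective_punct`) and the toy sentence are made up for illustration, and the check assumes the usual definition (some token between the punctuation and its head is not a descendant of that head).

```python
def row(idx, form, upos, head, deprel):
    """Build one tab-separated CoNLL-U line (unused columns are '_')."""
    return "\t".join([str(idx), form, "_", upos, "_", "_", str(head), deprel, "_", "_"])


def parse_conllu_sentence(text):
    """Return {id: (head, deprel)} for the basic word lines of one sentence."""
    nodes = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword token ranges (1-2) and empty nodes (1.1)
        nodes[int(cols[0])] = (int(cols[6]), cols[7])
    return nodes


def is_descendant(nodes, node_id, ancestor_id):
    """True if ancestor_id lies on the head chain above node_id."""
    for _ in range(len(nodes) + 1):  # bounded walk, so a malformed cycle cannot hang
        if node_id == 0:
            return False
        node_id = nodes[node_id][0]
        if node_id == ancestor_id:
            return True
    return False


def nonprojective_punct(nodes):
    """Return ids of punct nodes whose attachment to their head is non-projective."""
    found = []
    for node_id, (head, deprel) in sorted(nodes.items()):
        if deprel != "punct":
            continue
        low, high = sorted((node_id, head))
        # The edge is non-projective if a token in the gap is not under the head.
        if any(not is_descendant(nodes, k, head) for k in range(low + 1, high)):
            found.append(node_id)
    return found


# Toy sentence (hypothetical, not taken from UD_EWT): the comma is attached
# to "He", so its edge spans "said", which is not a descendant of "He".
SAMPLE = "\n".join([
    row(1, "He", "PRON", 2, "nsubj"),
    row(2, "said", "VERB", 0, "root"),
    row(3, ",", "PUNCT", 1, "punct"),
    row(4, "go", "VERB", 2, "ccomp"),
    row(5, ".", "PUNCT", 2, "punct"),
])

# Same sentence with the comma re-attached to the adjacent "go".
FIXED = SAMPLE.replace(row(3, ",", "PUNCT", 1, "punct"), row(3, ",", "PUNCT", 4, "punct"))

print(nonprojective_punct(parse_conllu_sentence(SAMPLE)))  # -> [3]
print(nonprojective_punct(parse_conllu_sentence(FIXED)))   # -> []
```

A check like this only flags candidates; it does not decide where the punctuation should attach according to the guidelines.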

This is my attempt to fix the errors in UD_EWT using ud.FixPunct. This PR also includes the script that applies the Udapi block and creates HTML diff files for easier checking of the edits in this PR.
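The script itself is not reproduced in this thread. As a rough idea of how an HTML diff of a treebank file before and after a fix could be produced, here is a sketch using only Python's standard `difflib` module. It is an assumption about one possible approach, not the script from this PR; the function name `html_diff` and the two sample lines are hypothetical.

```python
import difflib


def html_diff(before, after, before_label="before", after_label="after"):
    """Return a standalone HTML page with a side-by-side diff of two texts."""
    differ = difflib.HtmlDiff(wrapcolumn=100)
    return differ.make_file(
        before.splitlines(),
        after.splitlines(),
        fromdesc=before_label,
        todesc=after_label,
        context=True,   # show only changed lines plus a little context
        numlines=2,
    )


# Hypothetical CoNLL-U lines: only the HEAD column of the comma changes.
BEFORE = "3\t,\t,\tPUNCT\t,\t_\t1\tpunct\t_\t_\n4\tgo\tgo\tVERB\tVB\t_\t2\tccomp\t_\t_\n"
AFTER = "3\t,\t,\tPUNCT\t,\t_\t4\tpunct\t_\t_\n4\tgo\tgo\tVERB\tVB\t_\t2\tccomp\t_\t_\n"

page = html_diff(BEFORE, AFTER, "original", "after punctuation fix")
```

The returned string can be written to a `.html` file and opened in a browser to review the changed lines.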

martinpopel commented 9 months ago

Please ignore this PR (I accidentally merged it by pushing to upstream instead of to my fork, but I have reverted this). See the new PR with one number higher. I am sorry for the confusion.