UniversalDependencies / UD_English-GUM

11 sentences which are just "Understand" #79

Closed AngledLuffa closed 8 months ago

AngledLuffa commented 10 months ago

The training dataset has 11 sentences that each consist of a single word, "Understand". If there's any room to filter things like that, it would be useful to limit them to just one (or even zero, since normally I would expect "Understood" or "Understand?").

nschneid commented 10 months ago

It looks like it's a heading in a document template used in one of the GUM genres. GUM annotates full discourse structure so it makes sense to include headings.

amir-zeldes commented 10 months ago

Exactly. One of the genres is travel guides from Wikivoyage, and "Understand" is part of their basic template. Not every guide has every section, but some of the suggested sections are "See", "Do", "Understand" and "Get in", so many of those headings are just one word. It does represent the original text, so it should stay that way (and quantitatively these sentences are pretty negligible).
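
For anyone who still wants to cap repeated one-word sentences on the training side without changing the corpus itself, here is a minimal sketch in plain Python. It assumes standard CoNLL-U input (blank-line-separated sentences, `#` comment lines, tab-separated columns with the word form in the second column). The function names and the `max_repeats` parameter are hypothetical, not part of GUM or any UD tooling.

```python
from collections import Counter


def split_sentences(conllu_text):
    """Split CoNLL-U text into sentence blocks (lists of non-blank lines)."""
    blocks, current = [], []
    for line in conllu_text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def word_forms(block):
    """Return the word forms of a sentence block.

    Comment lines are skipped, and so are multiword-token ranges ("1-2")
    and empty nodes ("1.1"), whose IDs are not plain integers.
    """
    forms = []
    for line in block:
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[0].isdigit():
            forms.append(cols[1])
    return forms


def cap_one_word_repeats(conllu_text, max_repeats=1):
    """Keep at most `max_repeats` copies of each distinct one-word sentence.

    Hypothetical helper: sentences with two or more words are always kept.
    """
    seen = Counter()
    kept = []
    for block in split_sentences(conllu_text):
        forms = word_forms(block)
        if len(forms) == 1:
            seen[forms[0]] += 1
            if seen[forms[0]] > max_repeats:
                continue  # drop the surplus copy
        kept.append(block)
    if not kept:
        return ""
    return "\n\n".join("\n".join(b) for b in kept) + "\n\n"
```

Calling `cap_one_word_repeats(text, max_repeats=1)` on the contents of a training file would keep the first "Understand" heading and drop the later copies, while `max_repeats=0` drops all one-word sentences. Matching is case-sensitive and keyed on the word form only, so this is a filter applied by the consumer of the data, and the released treebank stays faithful to the original text.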