UniversalDependencies / UD_English-GUMReddit

Incorrect part of speech for the given lemma #11

Closed by rhdunn 12 months ago

rhdunn commented 12 months ago

The following parts of speech are incorrectly labelled:

ERROR: Sentence GUM_reddit_gender-1 token 1 -- VB lemma 'CMV' does not match lowercase-form applied to form 'CMV', expected 'cmv'
ERROR: Sentence GUM_reddit_gender-28 token 1 -- VB lemma 'CMV' does not match lowercase-form applied to form 'CMV', expected 'cmv'
ERROR: Sentence GUM_reddit_racial-3 token 3 -- NN lemma 'American' does not match lowercase-form applied to form 'American', expected 'american'
ERROR: Sentence GUM_reddit_space-6 token 5 -- NN lemma 'Martian' does not match lowercase-form applied to form 'Martian', expected 'martian'
ERROR: Sentence GUM_reddit_racial-9 token 62 -- NNP lemma 'Life' does not match capitalized-form applied to form 'Lives', expected 'Lives'
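For readers unfamiliar with these messages, the two normalizations named in them ("lowercase-form" and "capitalized-form") can be sketched roughly as below. This is only a guess reconstructed from the error text, not the checker's actual code; the function name `expected_lemma` and the choice of which tags count as proper nouns are assumptions.

```python
def expected_lemma(form: str, xpos: str) -> str:
    """Hypothetical reconstruction of the casing a checker might expect.

    Assumption: proper-noun tags (NNP, NNPS) expect the form with only its
    first character uppercased; every other tag expects the fully
    lowercased form. Inflection is ignored entirely.
    """
    if xpos in {"NNP", "NNPS"}:
        # str.capitalize() uppercases the first character and lowercases the rest
        return form.capitalize()
    return form.lower()


# Values taken from the error messages in this thread
print(expected_lemma("CMV", "VB"))        # cmv
print(expected_lemma("American", "NN"))   # american
print(expected_lemma("Lives", "NNP"))     # Lives
```

Under this reading, the NNP case is a pure string operation on the form, which is why 'Lives' is "expected" rather than the lemma 'Life'.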

ADD

Web addresses are tagged as ADD in EWT, so they should be tagged the same way here:

ERROR: Sentence GUM_reddit_ring-2 token 1 -- NNP lemma 'http://vocaroo.com/i/s1tDCvWpykHC' does not match capitalized-form applied to form 'http://vocaroo.com/i/s1tDCvWpykHC', expected 'Http://vocaroo.com/i/s1tdcvwpykhc'
ERROR: Sentence GUM_reddit_callout-44 token 1 -- NNP lemma 'https://xkcd.com/1205/' does not match capitalized-form applied to form 'https://xkcd.com/1205/', expected 'Https://xkcd.com/1205/'
ERROR: Sentence GUM_reddit_conspiracy-53 token 12 -- NNP lemma 'https://www.google.com/search?q=ostrich+skeleton&client=ms-android-verizon&prmd=isnv&source=lnms&tbm=isch&sa=X&ved=0ahUKEwj3qrmXudXcAhVuCDQIHaXvCnUQ_AUIESgB&biw=360&bih=560#imgrc=H_TL1bUwi9jryM' does not match capitalized-form applied to form 'https://www.google.com/search?q=ostrich+skeleton&client=ms-android-verizon&prmd=isnv&source=lnms&tbm=isch&sa=X&ved=0ahUKEwj3qrmXudXcAhVuCDQIHaXvCnUQ_AUIESgB&biw=360&bih=560#imgrc=H_TL1bUwi9jryM', expected 'Https://www.google.com/search?q=ostrich+skeleton&client=ms-android-verizon&prmd=isnv&source=lnms&tbm=isch&sa=x&ved=0ahukewj3qrmxudxcahvucdqihaxvcnuq_auiesgb&biw=360&bih=560#imgrc=h_tl1buwi9jrym'
ERROR: Sentence GUM_reddit_conspiracy-54 token 7 -- NNP lemma 'https://www.google.com/search?client=ms-android-verizon&biw=360&bih=310&tbm=isch&sa=1&ei=gq9mW7j0MqCT0PEPldifkAg&q=tyrannosaurus+rex+skeleton&oq=tyrannosaurus+rex+skele&gs_l=mobile-gws-wiz-img.1.0.0l5.2133.2740..3760...0.0..0.143.727.1j5......0....1.........0i67.16eeq_FMY8w#imgrc=D-fnseX2MxU_tM' does not match capitalized-form applied to form 'https://www.google.com/search?client=ms-android-verizon&biw=360&bih=310&tbm=isch&sa=1&ei=gq9mW7j0MqCT0PEPldifkAg&q=tyrannosaurus+rex+skeleton&oq=tyrannosaurus+rex+skele&gs_l=mobile-gws-wiz-img.1.0.0l5.2133.2740..3760...0.0..0.143.727.1j5......0....1.........0i67.16eeq_FMY8w#imgrc=D-fnseX2MxU_tM', expected 'Https://www.google.com/search?client=ms-android-verizon&biw=360&bih=310&tbm=isch&sa=1&ei=gq9mw7j0mqct0pepldifkag&q=tyrannosaurus+rex+skeleton&oq=tyrannosaurus+rex+skele&gs_l=mobile-gws-wiz-img.1.0.0l5.2133.2740..3760...0.0..0.143.727.1j5......0....1.........0i67.16eeq_fmy8w#imgrc=d-fnsex2mxu_tm'
ERROR: Sentence GUM_reddit_conspiracy-58 token 1 -- NNP lemma 'https://skeptics.stackexchange.com/questions/16369/is-t-rex-more-similar-to-sparrows-than-to-stegosaurus' does not match capitalized-form applied to form 'https://skeptics.stackexchange.com/questions/16369/is-t-rex-more-similar-to-sparrows-than-to-stegosaurus', expected 'Https://skeptics.stackexchange.com/questions/16369/is-t-rex-more-similar-to-sparrows-than-to-stegosaurus'
ERROR: Sentence GUM_reddit_space-51 token 1 -- NNP lemma 'https://m.youtube.com/watch?v=hlJmBe1eeQA' does not match capitalized-form applied to form 'https://m.youtube.com/watch?v=hlJmBe1eeQA', expected 'Https://m.youtube.com/watch?v=hljmbe1eeqa'

Additionally, the lemma for ADD is lowercased in that treebank AFAICT.

amir-zeldes commented 12 months ago

Thanks - CMV should be Abbr, will fix. The rest at the top are tagging or lemma errors, all easy fixes.

As for the bottom, GUM doesn't use ADD, and I'm pretty sure the lemma of something like http://vocaroo.com/i/s1tDCvWpykHC should not be http://vocaroo.com/i/s1tdcvwpykhc. It's like a mixed-case name (and has xpos NNP in GUM).
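One way a casing check could accommodate this, sketched under assumptions: treat anything that parses as an http(s) URL as case-preserving and require the lemma to equal the form exactly. The helper names `looks_like_web_address` and `lemma_case_ok` are hypothetical, and the fallback rules are only inferred from the error messages above; neither GUM nor any validator is known to work this way.

```python
from urllib.parse import urlsplit


def looks_like_web_address(form: str) -> bool:
    """Return True if the token parses as an http(s) URL with a host part."""
    parts = urlsplit(form)
    return parts.scheme.lower() in {"http", "https"} and bool(parts.netloc)


def lemma_case_ok(form: str, lemma: str, xpos: str) -> bool:
    """Hypothetical casing check that leaves web addresses untouched."""
    if looks_like_web_address(form):
        # URL paths and query strings are case-sensitive, so keep the form as-is
        return lemma == form
    if xpos in {"NNP", "NNPS"}:
        # Assumed rule: proper nouns expect the capitalized form
        return lemma == form.capitalize()
    # Assumed rule: everything else expects the lowercased form
    return lemma == form.lower()


url = "http://vocaroo.com/i/s1tDCvWpykHC"
print(lemma_case_ok(url, url, "NNP"))          # True
print(lemma_case_ok(url, url.lower(), "NNP"))  # False
```

The exemption is keyed on the form rather than on the tag, so it would behave the same whether a treebank tags addresses as ADD or as NNP.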