UniversalDependencies / UD_English-GUM

Incorrect values for NUM NumForm=Word tokens. #71

Closed rhdunn closed 10 months ago

rhdunn commented 10 months ago

I've identified the following incorrect form text for NumForm=Word on NUM tokens:

ERROR: Sentence GUM_vlog_radiology-16 -- unknown NUM NumForm=Word form '4:00'
ERROR: Sentence GUM_vlog_london-18 -- unknown NUM NumForm=Word form '6:00'
ERROR: Sentence GUM_academic_huh-19 -- unknown NUM NumForm=Word form '1:1:19'
ERROR: Sentence GUM_interview_messina-29 -- unknown NUM NumForm=Word form '9/11'
ERROR: Sentence GUM_interview_messina-32 -- unknown NUM NumForm=Word form '9/11'
ERROR: Sentence GUM_news_election-48 -- unknown NUM NumForm=Word form '22:30'
ERROR: Sentence GUM_news_election-48 -- unknown NUM NumForm=Word form '2:30'
ERROR: Sentence GUM_news_hackers-19 -- unknown NUM NumForm=Word form '15:11'
ERROR: Sentence GUM_news_imprisoned-14 -- unknown NUM NumForm=Word form '6:00am'
ERROR: Sentence GUM_speech_nixon-22 -- unknown NUM NumForm=Word form '1/2'
ERROR: Sentence GUM_speech_nixon-33 -- unknown NUM NumForm=Word form '1/2'
ERROR: Sentence GUM_speech_remarks-26 -- unknown NUM NumForm=Word form '1/2'
ERROR: Sentence GUM_speech_remarks-31 -- unknown NUM NumForm=Word form '1/2'
ERROR: Sentence GUM_speech_telescope-5 -- unknown NUM NumForm=Word form '6:13'
ERROR: Sentence GUM_textbook_spacetime-17 -- unknown NUM NumForm=Word form '6:23'
ERROR: Sentence GUM_vlog_appearance-44 -- unknown NUM NumForm=Word form '11:00'
ERROR: Sentence GUM_vlog_college-4 -- unknown NUM NumForm=Word form '8:15'
ERROR: Sentence GUM_vlog_college-57 -- unknown NUM NumForm=Word form '5:19'
ERROR: Sentence GUM_vlog_covid-13 -- unknown NUM NumForm=Word form '9:00'
ERROR: Sentence GUM_vlog_covid-28 -- unknown NUM NumForm=Word form '6:00'
ERROR: Sentence GUM_vlog_exams-3 -- unknown NUM NumForm=Word form '10:38'
ERROR: Sentence GUM_vlog_exams-4 -- unknown NUM NumForm=Word form '10:45'
ERROR: Sentence GUM_vlog_exams-19 -- unknown NUM NumForm=Word form '2:25'
ERROR: Sentence GUM_vlog_exams-21 -- unknown NUM NumForm=Word form '2:25'
ERROR: Sentence GUM_vlog_exams-42 -- unknown NUM NumForm=Word form '9:46'
ERROR: Sentence GUM_vlog_exams-45 -- unknown NUM NumForm=Word form '1:45'
ERROR: Sentence GUM_vlog_hiking-4 -- unknown NUM NumForm=Word form '7:15'
ERROR: Sentence GUM_vlog_wine-10 -- unknown NUM NumForm=Word form '10:00'
ERROR: Sentence GUM_voyage_fortlee-52 -- unknown NUM NumForm=Word form 'one of'
ERROR: Sentence GUM_voyage_fortlee-55 -- unknown NUM NumForm=Word form '08:30'
ERROR: Sentence GUM_voyage_fortlee-55 -- unknown NUM NumForm=Word form '08:30'
ERROR: Sentence GUM_voyage_fortlee-55 -- unknown NUM NumForm=Word form '06:30'
ERROR: Sentence GUM_voyage_fortlee-55 -- unknown NUM NumForm=Word form '03:00'
ERROR: Sentence GUM_voyage_merida-32 -- unknown NUM NumForm=Word form '08:00'
ERROR: Sentence GUM_voyage_merida-32 -- unknown NUM NumForm=Word form '20:00'
ERROR: Sentence GUM_voyage_merida-32 -- unknown NUM NumForm=Word form '08:00'
ERROR: Sentence GUM_voyage_merida-32 -- unknown NUM NumForm=Word form '14:00'
ERROR: Sentence GUM_voyage_merida-34 -- unknown NUM NumForm=Word form '08:00'
ERROR: Sentence GUM_voyage_merida-34 -- unknown NUM NumForm=Word form '20:00'
ERROR: Sentence GUM_voyage_phoenix-38 -- unknown NUM NumForm=Word form '+1602275-4958'
ERROR: Sentence GUM_voyage_phoenix-46 -- unknown NUM NumForm=Word form '#13'
ERROR: Sentence GUM_voyage_phoenix-47 -- unknown NUM NumForm=Word form '#1'
ERROR: Sentence GUM_voyage_phoenix-47 -- unknown NUM NumForm=Word form '#44'
ERROR: Sentence GUM_voyage_sydfynske-29 -- unknown NUM NumForm=Word form '+4550981306'
ERROR: Sentence GUM_voyage_sydfynske-40 -- unknown NUM NumForm=Word form '+4533330040'
ERROR: Sentence GUM_voyage_sydfynske-42 -- unknown NUM NumForm=Word form '+4588304520'
ERROR: Sentence GUM_voyage_tulsa-38 -- unknown NUM NumForm=Word form '+1918584-4428'
ERROR: Sentence GUM_whow_cupcakes-12 -- unknown NUM NumForm=Word form '1/2'
ERROR: Sentence GUM_whow_cupcakes-13 -- unknown NUM NumForm=Word form '3/4'
ERROR: Sentence GUM_whow_cupcakes-15 -- unknown NUM NumForm=Word form '1/2'
ERROR: Sentence GUM_whow_cupcakes-16 -- unknown NUM NumForm=Word form '1/2'
ERROR: Sentence GUM_whow_cupcakes-18 -- unknown NUM NumForm=Word form '1/2'
ERROR: Sentence GUM_whow_cupcakes-24 -- unknown NUM NumForm=Word form '3/4'
ERROR: Sentence GUM_whow_cupcakes-28 -- unknown NUM NumForm=Word form '3/4'
ERROR: Sentence GUM_whow_cupcakes-30 -- unknown NUM NumForm=Word form '2/3'
ERROR: Sentence GUM_whow_cupcakes-31 -- unknown NUM NumForm=Word form '1/4'
ERROR: Sentence GUM_whow_cupcakes-42 -- unknown NUM NumForm=Word form '1/2'
ERROR: Sentence GUM_whow_cupcakes-47 -- unknown NUM NumForm=Word form '1/4'
ERROR: Sentence GUM_whow_procrastinating-14 -- unknown NUM NumForm=Word form '12:30'
ERROR: Sentence GUM_whow_quinoa-11 -- unknown NUM NumForm=Word form '1/2'
ERROR: Sentence GUM_whow_quinoa-45 -- unknown NUM NumForm=Word form '1/2'
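A check of this kind can be sketched roughly as follows. This is not the script that produced the list above. It is a minimal illustration that assumes standard 10-column CoNLL-U input, and the function name `find_suspect_word_forms` and the "letters with optional hyphens" heuristic are hypothetical. It only inspects the FORM column, so it would not catch a case that arises from a CorrectForm value in MISC.

```python
import re

# Assumption: a spelled-out numeral is made of letters, optionally joined by
# hyphens (e.g. "twenty-one"). This is an illustrative heuristic only.
WORD_FORM = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def find_suspect_word_forms(conllu_text):
    """Yield (sent_id, form) for NUM tokens tagged NumForm=Word whose FORM
    does not look like a spelled-out word."""
    sent_id = None
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id = "):
            sent_id = line[len("# sent_id = "):].strip()
            continue
        if not line or line.startswith("#"):
            continue  # blank separator or other comment line
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        form, upos, feats = cols[1], cols[3], cols[5]
        if upos == "NUM" and "NumForm=Word" in feats.split("|"):
            if not WORD_FORM.fullmatch(form):
                yield sent_id, form


def report(conllu_text):
    """Print findings in the same shape as the messages listed above."""
    for sent_id, form in find_suspect_word_forms(conllu_text):
        print(f"ERROR: Sentence {sent_id} -- unknown NUM NumForm=Word form '{form}'")
```

Calling `report(open(path, encoding="utf-8").read())` on a `.conllu` file would print one line per suspect token, where `path` is whichever treebank file is being checked.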

For GUM_voyage_fortlee-52, the error is due to the CorrectForm value ('one of') containing a space. My understanding of the typo documentation is that this should instead be an empty node assigned the ellipsis dependency relation.

Most of these should use NumForm=Digit to be consistent with EWT.

The 6:00am form should use NumForm=Combi, which is used for forms like 1st and is defined as "digits combined with a suffix".
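To make the proposed mapping concrete, here is a minimal sketch. The function name `suggest_numform` and both regular expressions are my own assumptions, not part of any UD tool. It covers only forms made of digits with `:` or `/` separators, plus the same with a letter suffix. Forms such as phone numbers or `#13` are deliberately left undecided, and Roman numerals are not handled.

```python
import re

# Assumption: digits optionally separated by ':' or '/' (times, dates, fractions).
DIGIT_FORM = re.compile(r"\d+(?:[:/]\d+)*")
# Assumption: the same digit pattern followed by a letter suffix ("1st", "6:00am").
COMBI_FORM = re.compile(r"\d+(?:[:/]\d+)*[A-Za-z]+")
# Assumption: spelled-out numerals are letters, optionally hyphenated.
WORD_FORM = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def suggest_numform(form):
    """Return a suggested NumForm value, or None if this sketch cannot decide."""
    if DIGIT_FORM.fullmatch(form):
        return "Digit"
    if COMBI_FORM.fullmatch(form):
        return "Combi"
    if WORD_FORM.fullmatch(form):
        return "Word"
    return None  # e.g. '#13', '+4550981306', 'one of': needs a human decision
```

For example, `suggest_numform("22:30")` and `suggest_numform("3/4")` give `"Digit"`, while `suggest_numform("6:00am")` gives `"Combi"`.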

amir-zeldes commented 10 months ago

Thanks for catching these. I will change the items containing / or : to NumForm=Digit.

For the rest:

rhdunn commented 10 months ago

For NumForm=Combi, there are several instances of NumType=Ord in EWT for things like "1st" where that annotation would be useful. At the moment, EWT is not adding any NumForm feature for ordinals. I've not created an issue for that yet, but I'm planning on doing so.

nschneid commented 10 months ago

I'm happy to add features for EWT if it's straightforward to do so given existing annotations.

amir-zeldes commented 10 months ago

OK, NumForm=Digit is now fixed for dates etc. Let me know or open an issue if anyone adds Combi, and I will match it.