UniversalDependencies / UD_English-GUM

NumType=Ord tokens missing NumForm annotations #73

Closed rhdunn closed 9 months ago

rhdunn commented 9 months ago

Ordinals written as words (first, third, etc.) should use NumForm=Word, and digit-based ordinals (1st, 3rd, etc.) should use NumForm=Combi.

Validation issues:

ERROR: Sentence GUM_academic_exposure-2 token 6 -- NumType=Ord should be paired with NumForm=Word for form 'second'
ERROR: Sentence GUM_academic_exposure-20 token 8 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_academic_librarians-27 token 28 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_bio_byron-6 token 17 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_bio_byron-10 token 14 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_bio_byron-12 token 11 -- NumType=Ord should be paired with NumForm=Combi for form '2nd'
ERROR: Sentence GUM_bio_byron-15 token 4 -- NumType=Ord should be paired with NumForm=Combi for form '2nd'
ERROR: Sentence GUM_bio_byron-19 token 30 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_bio_emperor-22 token 38 -- NumType=Ord should be paired with NumForm=Combi for form '1st'
ERROR: Sentence GUM_bio_emperor-24 token 4 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_bio_emperor-36 token 28 -- NumType=Ord should be paired with NumForm=Combi for form '12th'
ERROR: Sentence GUM_conversation_grounded-37 token 8 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_conversation_risk-109 token 10 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_fiction_beast-21 token 12 -- NumType=Ord should be paired with NumForm=Word for form 'sixth'
ERROR: Sentence GUM_fiction_beast-21 token 18 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_fiction_lunre-9 token 9 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_fiction_lunre-13 token 2 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_speech_impeachment-3 token 12 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_textbook_labor-6 token 1 -- NumType=Ord should be paired with NumForm=Word for form 'First'
ERROR: Sentence GUM_textbook_labor-15 token 1 -- NumType=Ord should be paired with NumForm=Word for form 'Second'
ERROR: Sentence GUM_textbook_labor-20 token 1 -- NumType=Ord should be paired with NumForm=Word for form 'Third'
ERROR: Sentence GUM_vlog_portland-3 token 25 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_vlog_portland-4 token 7 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_vlog_portland-8 token 3 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_vlog_portland-37 token 3 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_voyage_athens-5 token 2 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_voyage_athens-8 token 3 -- NumType=Ord should be paired with NumForm=Combi for form '7th'
ERROR: Sentence GUM_voyage_athens-13 token 25 -- NumType=Ord should be paired with NumForm=Combi for form '19th'
ERROR: Sentence GUM_voyage_athens-26 token 2 -- NumType=Ord should be paired with NumForm=Combi for form '20th'
ERROR: Sentence GUM_voyage_athens-28 token 8 -- NumType=Ord should be paired with NumForm=Combi for form '19th'
ERROR: Sentence GUM_voyage_athens-34 token 14 -- NumType=Ord should be paired with NumForm=Combi for form '21st'
ERROR: Sentence GUM_voyage_coron-13 token 2 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_voyage_coron-13 token 13 -- NumType=Ord should be paired with NumForm=Word for form 'second'
ERROR: Sentence GUM_whow_overalls-4 token 15 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_academic_discrimination-42 token 1 -- NumType=Ord should be paired with NumForm=Word for form 'First'
ERROR: Sentence GUM_academic_discrimination-43 token 1 -- NumType=Ord should be paired with NumForm=Word for form 'Second'
ERROR: Sentence GUM_academic_discrimination-46 token 1 -- NumType=Ord should be paired with NumForm=Word for form 'First'
ERROR: Sentence GUM_academic_discrimination-47 token 1 -- NumType=Ord should be paired with NumForm=Word for form 'Second'
ERROR: Sentence GUM_academic_art-12 token 14 -- NumType=Ord should be paired with NumForm=Combi for form '17th'
ERROR: Sentence GUM_academic_art-27 token 11 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_academic_art-27 token 43 -- NumType=Ord should be paired with NumForm=Combi for form '13th'
ERROR: Sentence GUM_academic_census-7 token 24 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_academic_census-21 token 1 -- NumType=Ord should be paired with NumForm=Word for form 'First'
ERROR: Sentence GUM_academic_census-22 token 1 -- NumType=Ord should be paired with NumForm=Word for form 'Second'
ERROR: Sentence GUM_academic_census-34 token 2 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_academic_economics-3 token 3 -- NumType=Ord should be paired with NumForm=Word for form 'second'
ERROR: Sentence GUM_academic_economics-3 token 7 -- NumType=Ord should be paired with NumForm=Combi for form '20th'
ERROR: Sentence GUM_academic_enjambment-15 token 40 -- NumType=Ord should be paired with NumForm=Combi for form '15th'
ERROR: Sentence GUM_academic_enjambment-15 token 43 -- NumType=Ord should be paired with NumForm=Combi for form '19th'
ERROR: Sentence GUM_academic_enjambment-17 token 1 -- NumType=Ord should be paired with NumForm=Word for form 'First'
ERROR: Sentence GUM_academic_enjambment-18 token 1 -- NumType=Ord should be paired with NumForm=Word for form 'Second'
ERROR: Sentence GUM_academic_enjambment-22 token 1 -- NumType=Ord should be paired with NumForm=Word for form 'First'
ERROR: Sentence GUM_academic_games-20 token 2 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_academic_huh-10 token 5 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_academic_huh-12 token 5 -- NumType=Ord should be paired with NumForm=Word for form 'second'
ERROR: Sentence GUM_academic_huh-23 token 10 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_academic_huh-37 token 48 -- NumType=Ord should be paired with NumForm=Word for form 'first'
ERROR: Sentence GUM_academic_implicature-22 token 10 -- NumType=Ord should be paired with NumForm=Word for form 'first'
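
For anyone wanting to see the rule above in code, here is a minimal sketch of how the missing feature could be filled in on a CoNLL-U token line. It is not the GUM build pipeline or the validator's own logic. The names `ordinal_numform` and `add_numform` are hypothetical, the regex for digit-based ordinals is an assumption, and the sample rows in the usage note are made up for illustration.

```python
import re

# Assumption: a digit-based ordinal is digits followed by st/nd/rd/th (1st, 2nd, 12th, ...).
DIGIT_ORDINAL = re.compile(r"^\d+(st|nd|rd|th)$", re.IGNORECASE)


def ordinal_numform(form):
    """Return the NumForm value for an ordinal token's surface form (hypothetical helper)."""
    return "Combi" if DIGIT_ORDINAL.match(form) else "Word"


def add_numform(conllu_line):
    """Add NumForm to a CoNLL-U token line that has NumType=Ord but no NumForm."""
    line = conllu_line.rstrip("\n")
    cols = line.split("\t")
    if len(cols) != 10:
        return line  # comment, blank line, or malformed row: leave untouched
    feats = [] if cols[5] == "_" else cols[5].split("|")
    names = {f.split("=", 1)[0] for f in feats}
    if "NumType=Ord" not in feats or "NumForm" in names:
        return line  # not an ordinal, or already annotated
    feats.append("NumForm=" + ordinal_numform(cols[1]))
    # FEATS are kept in alphabetical order, compared case-insensitively.
    cols[5] = "|".join(sorted(feats, key=str.lower))
    return "\t".join(cols)
```

With a made-up row such as `6<TAB>second<TAB>second<TAB>ADJ<TAB>JJ<TAB>Degree=Pos|NumType=Ord<TAB>7<TAB>amod<TAB>_<TAB>_`, the FEATS column becomes `Degree=Pos|NumForm=Word|NumType=Ord`. A row whose form is `2nd` with FEATS `NumType=Ord` gets `NumForm=Combi|NumType=Ord`.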

amir-zeldes commented 9 months ago

@nschneid are we doing NumForm for Ord and adding the Combi value for v2.13? I don't mind doing it but would rather do it in lockstep with EWT.

nschneid commented 9 months ago

Yes, UniversalDependencies/UD_English-EWT#458

amir-zeldes commented 9 months ago

OK, but is Combi only possible for NumType=Ord? Or are there other cases that need Combi?

nschneid commented 9 months ago

https://universaldependencies.org/u/feat/NumForm.html only gives the ordinal example. In principle we could extend this to NOUNs and PROPNs involving digits ("1970s"), but that seems like it would take some work to figure out.

amir-zeldes commented 9 months ago

OK, I'll limit it to Ord for now.