UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

NumType incorrectly assigned for German decimals #483

Closed rhdunn closed 7 months ago

rhdunn commented 7 months ago

These numbers should be NumType=Frac instead of NumType=Card:

ERROR: Sentence answers-20111108024148AAO8oFI_ans-0010 token 3 -- CD/NumForm=Digit/NumType=Card lemma '3.40' does not match cardinal-number applied to form '3,40', expected '340'
ERROR: Sentence answers-20111108024148AAO8oFI_ans-0010 token 9 -- CD/NumForm=Digit/NumType=Card lemma '7.5' does not match cardinal-number applied to form '7,5', expected '75'
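For illustration, the "expected" values in these messages are consistent with a check that treats every comma as a thousands separator and drops it. The sketch below reproduces that behaviour and adds a heuristic for spotting decimal commas. Both function names are hypothetical, and the logic is an assumption inferred from the error text, not the validator's actual code.

```python
import re

def expected_cardinal_lemma(form: str) -> str:
    """Assumed validator behaviour: drop every comma as a thousands separator."""
    return form.replace(",", "")

# Heuristic (an assumption): digits, one comma, then a digit group
# that is NOT exactly three digits long, e.g. "3,40" or "7,5".
_DECIMAL_COMMA = re.compile(r"^\d+,(?!\d{3}$)\d+$")

def looks_like_decimal_comma(form: str) -> bool:
    """Return True if the form looks like a comma-decimal rather than a grouped integer."""
    return _DECIMAL_COMMA.match(form) is not None

for form in ["3,40", "7,5", "5,000", "3,"]:
    print(form, expected_cardinal_lemma(form), looks_like_decimal_comma(form))
```

A form such as "5,000" is left alone because its final group has exactly three digits. A form such as "1,234" written with a decimal comma would still be ambiguous and need manual review.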

The following should also have the lemma 3.:

ERROR: Sentence email-enronsent00_02-0032 token 10 -- CD/NumForm=Digit/NumType=Card lemma '3,' does not match cardinal-number applied to form '3,', expected '3'

nschneid commented 7 months ago

The third one is from an oddly spelled sentence:

I am expecting to pay something in the $3,to $5,000 range.

I guess the "000" got deleted (or was omitted to save space). Shouldn't the lemma be "3" as if it were "3 to 5 thousand"?

rhdunn commented 7 months ago

That works for me. I've only done a cursory analysis of these, so some of my assignments may be wrong.

rhdunn commented 7 months ago

In that case, it would also need a CorrectForm annotation.

nschneid commented 7 months ago

Hmm, it would be nonstandard to write "$3 to $5,000 range" as well. I'll mark it as a typo of "$3,000".
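As a rough sketch of what such a marking could look like in CoNLL-U, UD conventionally records typos with Typo=Yes in FEATS and the intended spelling as CorrectForm in MISC. The helper below is hypothetical, and the corrected value "3,000" for the token "3," is assumed from the comment above, not taken from the treebank.

```python
def mark_typo(feats: str, misc: str, correct_form: str) -> tuple[str, str]:
    """Return (FEATS, MISC) with Typo=Yes and CorrectForm=... merged in."""
    def merge(column: str, key: str, value: str) -> str:
        # "_" is the CoNLL-U placeholder for an empty column.
        pairs = {} if column == "_" else dict(p.split("=", 1) for p in column.split("|"))
        pairs[key] = value
        # Keep attributes in case-insensitive alphabetical order.
        ordered = sorted(pairs.items(), key=lambda kv: kv[0].lower())
        return "|".join(f"{k}={v}" for k, v in ordered)

    return merge(feats, "Typo", "Yes"), merge(misc, "CorrectForm", correct_form)

feats, misc = mark_typo("NumForm=Digit|NumType=Card", "_", "3,000")
print(feats)
print(misc)
```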

AngledLuffa commented 7 months ago

I wouldn't consider that a typo. At most it's a common colloquial way of expressing a range.

nschneid commented 7 months ago

I don't think I've seen it before. Certainly the missing space is a typo.

AngledLuffa commented 7 months ago

I'm just thinking that ranges get expressed as "4 to 5 million", "4 to 5 thousand", etc. pretty often in spoken English when giving a possible price range for something.

nschneid commented 7 months ago

Yes, but not with a repeated dollar sign in text (that I know of).