MycroftAI / lingua-franca

Mycroft's multilingual text parsing and formatting library
Apache License 2.0

Number parsers unreliable in presence of "and" words #82

Open JuneStepp opened 4 years ago

JuneStepp commented 4 years ago

"Nine hundred and five" only returns "900". Fractions work fine, though: "nine hundred and two tenths" returns "900.2".

ChanceNCounter commented 4 years ago

Nice find. It looks like we're having the opposite problem in Spanish. I'm gonna use this issue to document all the related bugs, so we can write failing tests for the lot of them at once.

>>> extract_number("novecientos y cinco", lang="es")
905
>>> extract_number("novecientos cinco", lang="es")
5

Maintainers: please check for similar bugs in your native languages! I only speak these two.


further diagnosis:

>>> extract_numbers("novecientos veinte y cinco", lang="es")
[905] # should be 925. consistent behavior with other bug would return [900, 25].

this snippet is a possible dupe or cousin of #86

ChanceNCounter commented 4 years ago

I'll have to triple-check, but I think this boils down to the fact that we don't yet handle that (grammatically incorrect, but colloquially constant) use of "and". I vote willfix, just explaining.

Fractions work because that's the only thing "and" triggers at the moment. However, "nine hundred and five" doesn't have a denominator, so the "five" is discarded as a separate number, which extract_numbers would return alongside 900.
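
To make the diagnosis concrete, here is a toy sketch of the behavior in plain Python. It is not lingua-franca's parser: `parse_number_words`, the `join_on_and` flag, and the word table are hypothetical names invented for illustration, and it only understands English words up to the hundreds. With `join_on_and=False`, "and" ends the current number, which reproduces the split described above. With `join_on_and=True`, "and" is skipped when it sits between a round hundred and a smaller number word.

```python
# Toy illustration only; NOT lingua-franca's implementation.
# All names below are hypothetical.
SMALL = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def parse_number_words(text, join_on_and=True):
    """Return the numbers spelled out in `text` (hundreds at most).

    The parser is purely additive and does not validate word order.
    """
    tokens = text.lower().replace("-", " ").split()
    results = []
    current = None  # number being accumulated, or None if not in one

    for i, tok in enumerate(tokens):
        if tok in SMALL:
            current = (current or 0) + SMALL[tok]
        elif tok == "hundred":
            current = (current or 1) * 100
        elif (
            tok == "and"
            and join_on_and
            and current is not None
            and current % 100 == 0  # only join right after a round hundred
            and i + 1 < len(tokens)
            and tokens[i + 1] in SMALL
        ):
            continue  # skip "and"; the next word extends the same number
        else:
            # Any other word (including an unjoined "and") ends the number.
            if current is not None:
                results.append(current)
                current = None

    if current is not None:
        results.append(current)
    return results


print(parse_number_words("nine hundred and five", join_on_and=False))  # [900, 5]
print(parse_number_words("nine hundred and five"))                     # [905]
```

The round-hundred check keeps phrases like "nine and five" as two separate numbers. A real fix would also need per-language rules, since the Spanish examples above fail in the opposite direction.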