charliermarsh / semantic

A Python library for extracting semantic information from text, such as dates and numbers.
MIT License
74 stars 20 forks

Lots of broken number parsing: semantic.numbers.NumberService().parse('Two Hundred and Twenty Two'.lower()) == 210 and more! #13

Open fake-name opened 8 years ago

fake-name commented 8 years ago

I'm running through a whole pile of inputs for the ASCII number parsing, and I've discovered a number of bugs:

'One Hundred and Eighty Eight' parses as: 110.0
'One Hundred and Eighty Five' parses as: 116.0
'One Hundred and Eighty Four' parses as: 120.0
'One Hundred and Eighty Nine' parses as: 108.88888888888889
'One Hundred and Eighty Seven' parses as: 111.42857142857143
'One Hundred and Eighty Six' parses as: 113.33333333333333
'One Hundred and Eighty Three' parses as: 126.66666666666667
'One Hundred and Eighty Two' parses as: 140.0
'One Hundred and Fifty Eight' parses as: 106.25
'One Hundred and Fifty Five' parses as: 110.0
'One Hundred and Fifty Four' parses as: 112.5
'One Hundred and Fifty Nine' parses as: 105.55555555555556
'One Hundred and Fifty Seven' parses as: 107.14285714285714
'One Hundred and Fifty Six' parses as: 108.33333333333333
'One Hundred and Fifty Three' parses as: 116.66666666666667
'One Hundred and Fifty Two' parses as: 125.0
'One Hundred and Forty Eight' parses as: 105.0
'One Hundred and Forty Five' parses as: 108.0
'One Hundred and Forty Four' parses as: 110.0
'One Hundred and Forty Nine' parses as: 104.44444444444444
'One Hundred and Forty Seven' parses as: 105.71428571428571
'One Hundred and Forty Six' parses as: 106.66666666666667
'One Hundred and Forty Three' parses as: 113.33333333333333
'One Hundred and Forty Two' parses as: 120.0
'One Hundred and Ninety Eight' parses as: 111.25
'One Hundred and Ninety Five' parses as: 118.0
'One Hundred and Ninety Four' parses as: 122.5
'One Hundred and Ninety Nine' parses as: 110.0
'One Hundred and Ninety Seven' parses as: 112.85714285714286
'One Hundred and Ninety Six' parses as: 115.0
'One Hundred and Ninety Three' parses as: 130.0
'One Hundred and Ninety Two' parses as: 145.0
'One Hundred and Seventy Eight' parses as: 108.75
'One Hundred and Seventy Four' parses as: 117.5
'One Hundred and Seventy Nine' parses as: 107.77777777777777
'One Hundred and Seventy Seven' parses as: 110.0
'One Hundred and Seventy Six' parses as: 111.66666666666667
'One Hundred and Seventy Three' parses as: 123.33333333333333
'One Hundred and Seventy Two' parses as: 135.0
'One Hundred and Sixty Eight' parses as: 107.5
'One Hundred and Sixty Five' parses as: 112.0
'One Hundred and Sixty Four' parses as: 115.0
'One Hundred and Sixty Nine' parses as: 106.66666666666667
'One Hundred and Sixty Seven' parses as: 108.57142857142857
'One Hundred and Sixty Six' parses as: 110.0
'One Hundred and Sixty Three' parses as: 120.0
'One Hundred and Sixty Two' parses as: 130.0
'One Hundred and Sixty' parses as: 160
'One Hundred and Thirty Eight' parses as: 103.75
'One Hundred and Thirty Five' parses as: 106.0
'One Hundred and Thirty Four' parses as: 107.5
'One Hundred and Thirty Nine' parses as: 103.33333333333333
'One Hundred and Thirty Seven' parses as: 104.28571428571429
'One Hundred and Thirty Six' parses as: 105.0
'One Hundred and Thirty Three' parses as: 110.0
'One Hundred and Thirty Two' parses as: 115.0
'One Hundred and Thirty' parses as: 130
'One Hundred and Twenty Eight' parses as: 102.5
'One Hundred and Twenty Five' parses as: 104.0
'One Hundred and Twenty Four' parses as: 105.0
'One Hundred and Twenty Nine' parses as: 102.22222222222223
'One Hundred and Twenty Seven' parses as: 102.85714285714286
'One Hundred and Twenty Six' parses as: 103.33333333333333
'One Hundred and Twenty Three' parses as: 106.66666666666667
'One Hundred and Twenty Two' parses as: 110.0
'Three Hundred and Twenty Eight' parses as: 302.5
'Three Hundred and Twenty Five' parses as: 304.0
'Three Hundred and Twenty Four' parses as: 305.0
'Three Hundred and Twenty Nine' parses as: 302.22222222222223
'Three Hundred and Twenty Seven' parses as: 302.85714285714283
'Three Hundred and Twenty Six' parses as: 303.3333333333333
'Three Hundred and Twenty Three' parses as: 306.6666666666667
'Three Hundred and Twenty Two' parses as: 310.0
'Three Hundred and Twenty' parses as: 320
'Two Hundred and Eight Six' parses as: 201.33333333333334
'Two Hundred and Eighty Eight' parses as: 210.0
'Two Hundred and Eighty Five' parses as: 216.0
'Two Hundred and Eighty Four' parses as: 220.0
'Two Hundred and Eighty Nine' parses as: 208.88888888888889
'Two Hundred and Eighty Seven' parses as: 211.42857142857142
'Two Hundred and Eighty Three' parses as: 226.66666666666666
'Two Hundred and Eighty Two' parses as: 240.0
'Two Hundred and Eighty' parses as: 280
'Two Hundred and Fifty Four' parses as: 212.5
'Two Hundred and Fifty Three' parses as: 216.66666666666666
'Two Hundred and Fifty Two' parses as: 225.0
'Two Hundred and Fifty' parses as: 250
'Two Hundred and Forty Eight' parses as: 205.0
'Two Hundred and Forty Five' parses as: 208.0
'Two Hundred and Forty Four' parses as: 210.0
'Two Hundred and Forty Nine' parses as: 204.44444444444446
'Two Hundred and Forty Seven' parses as: 205.71428571428572
'Two Hundred and Forty Six' parses as: 206.66666666666666
'Two Hundred and Forty Three' parses as: 213.33333333333334
'Two Hundred and Forty Two' parses as: 220.0
'Two Hundred and Forty' parses as: 240
'Two Hundred and Ninety Eight' parses as: 211.25
'Two Hundred and Ninety Five' parses as: 218.0
'Two Hundred and Ninety Four' parses as: 222.5
'Two Hundred and Ninety Nine' parses as: 210.0
'Two Hundred and Ninety Seven' parses as: 212.85714285714286
'Two Hundred and Ninety Six' parses as: 215.0
'Two Hundred and Ninety Three' parses as: 230.0
'Two Hundred and Ninety Two' parses as: 245.0
'Two Hundred and Seventy Nine' parses as: 207.77777777777777
'Two Hundred and Seventy Two' parses as: 235.0
'Two Hundred and Sixty Eight' parses as: 207.5
'Two Hundred and Sixty Five' parses as: 212.0
'Two Hundred and Sixty Nine' parses as: 206.66666666666666
'Two Hundred and Sixty Seven' parses as: 208.57142857142858
'Two Hundred and Sixty Six' parses as: 210.0
'Two Hundred and Thirty Eight' parses as: 203.75
'Two Hundred and Thirty Five' parses as: 206.0
'Two Hundred and Thirty Four' parses as: 207.5
'Two Hundred and Thirty Nine' parses as: 203.33333333333334
'Two Hundred and Thirty Seven' parses as: 204.28571428571428
'Two Hundred and Thirty Six' parses as: 205.0
'Two Hundred and Thirty Three' parses as: 210.0
'Two Hundred and Thirty' parses as: 230
'Two Hundred and Twenty Eight' parses as: 202.5
'Two Hundred and Twenty Five' parses as: 204.0
'Two Hundred and Twenty Four' parses as: 205.0
'Two Hundred and Twenty Nine' parses as: 202.22222222222223
'Two Hundred and Twenty Seven' parses as: 202.85714285714286
'Two Hundred and Twenty Six' parses as: 203.33333333333334
'Two Hundred and Twenty Three' parses as: 206.66666666666666
'Two Hundred and Twenty Two' parses as: 210.0
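The wrong values follow one arithmetic pattern. Each result equals the hundreds value plus the tens word *divided* by the units word: 100 + 80/8 = 110 for 'One Hundred and Eighty Eight', and 200 + 20/2 = 210 for 'Two Hundred and Twenty Two'. Inputs with no trailing units word, such as 'Two Hundred and Eighty', come out right, which fits the same reading. The snippet below is a minimal check of that hypothesis against a few of the reported outputs. It is inferred from the numbers alone, not from semantic's source, and the helpers `expected` and `divided_pattern` are hypothetical.

```python
# Hypothesis inferred from the reported outputs (not from semantic's source):
# "<H> Hundred and <Tens> <Units>" comes back as H*100 + Tens/Units
# instead of H*100 + Tens + Units.
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def expected(hundreds: str, tens: str, units: str) -> int:
    """Correct value, e.g. ('two', 'twenty', 'two') -> 222."""
    return UNITS[hundreds] * 100 + TENS[tens] + UNITS[units]


def divided_pattern(hundreds: str, tens: str, units: str) -> float:
    """Value the buggy outputs are consistent with: tens divided by units."""
    return UNITS[hundreds] * 100 + TENS[tens] / UNITS[units]


# (input words, value reported by parse()) pairs copied from the list above.
REPORTED = [
    (("two", "twenty", "two"), 210.0),
    (("one", "eighty", "nine"), 108.88888888888889),
    (("three", "twenty", "seven"), 302.85714285714283),
    (("two", "ninety", "four"), 222.5),
]

for words, reported in REPORTED:
    matches = abs(divided_pattern(*words) - reported) < 1e-9
    print(words, "expected", expected(*words),
          "reported", reported, "matches tens/units:", matches)
```

The mistyped 'Two Hundred and Eight Six' (201.333...) is consistent with the same pattern: 200 + 8/6.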
fake-name commented 8 years ago

If I reparse the same inputs with parseInt(), it gets them right, except for 'Two Hundred and Eight Six', which parses as 214 (208 + 6; why would you ever want it to add adjacent numbers automatically like that?). All of the inputs above are plainly integers, so I don't understand why parse() treats them as fractions.
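For comparison, a purely additive words-to-number parser reproduces the 214 that parseInt() returned for the mistyped 'Two Hundred and Eight Six', because it simply sums 200 + 8 + 6. Rejecting two consecutive small-number words is one way to flag such input instead of silently adding. This is a minimal sketch under stated assumptions: `words_to_int` and its `strict` flag are hypothetical, not part of semantic's API, and it only handles integers up to the millions.

```python
# Hypothetical additive parser for illustration; not semantic's parseInt().
UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
         "eighteen": 18, "nineteen": 19}
SMALL = {**UNITS, **TEENS}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_int(text: str, strict: bool = True) -> int:
    """Sum number words; with strict=True, reject ambiguous runs like 'eight six'."""
    total = 0    # sum of completed thousand/million groups
    current = 0  # value of the group currently being built (0-999)
    prev = None  # kind of the previous number word
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in SMALL:
            # Two small numbers in a row ("eight six") or a teen after a
            # tens word ("twenty eleven") has no single sensible reading.
            if strict and (prev in ("unit", "teen")
                           or (prev == "tens" and word in TEENS)):
                raise ValueError(f"unexpected {word!r} after a {prev} word")
            current += SMALL[word]
            prev = "teen" if word in TEENS else "unit"
        elif word in TENS:
            if strict and prev in ("unit", "teen", "tens"):
                raise ValueError(f"unexpected {word!r} after a {prev} word")
            current += TENS[word]
            prev = "tens"
        elif word == "hundred":
            current *= 100
            prev = "hundred"
        elif word in SCALES:
            total += current * SCALES[word]
            current = 0
            prev = "scale"
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


print(words_to_int("Two Hundred and Twenty Two"))
print(words_to_int("Two Hundred and Eight Six", strict=False))
```

With `strict=False` the sketch sums the trailing words exactly as described for parseInt(); with the default `strict=True` the same input raises a `ValueError` instead of returning a plausible-looking but wrong number.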

Splitter commented 7 years ago

I've noticed this too. I moved to a new system, installed Semantic on it, and now many things are failing. I don't know whether it's caused by recent changes to Semantic or to its dependencies, but the library is unusable at this point. Beyond returning wrong numbers, MathService().parseEquation(equation) now always raises an exception, because it attempts to parse every operator as a number.