arthurdejong / python-stdnum

A Python library to provide functions to handle, parse and validate standard numbers.
https://arthurdejong.org/python-stdnum/
GNU Lesser General Public License v2.1

ValueError raised on specific input #96

Closed elgehelge closed 5 years ago

elgehelge commented 5 years ago

Not sure if you regard this as a bug or not.

When given input that:

  1. counts as digits (input_string.isdigit() returns True), but
  2. cannot be cast to int (int(input_string) raises ValueError)

...a ValueError is raised instead of a stdnum.exceptions.ValidationError.

Example:

from stdnum import isbn
isbn.validate('978-9024538²70')  # note the superscript 2 ('²')

arthurdejong commented 5 years ago

Thanks for reporting this. Yes, I consider this a bug.

It turns out that str.isdigit() returns True for certain Unicode code points that represent numbers but that int() cannot parse, such as '²'. It also returns True for non-ASCII digits that int() can handle, such as int('᭓') == 3, which is not something I would want accepted in most numbers.
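The behaviour described above can be checked with a few lines of plain Python. This is only an illustrative sketch: the helper name describe is made up for the example and is not part of python-stdnum.

```python
def describe(ch):
    """Return (isdigit, isdecimal, int value or None) for one character."""
    try:
        value = int(ch)
    except ValueError:
        # int() rejects characters that are digits but not decimal digits.
        value = None
    return ch.isdigit(), ch.isdecimal(), value


print(describe('2'))   # plain ASCII digit
print(describe('²'))   # superscript two: isdigit() is True, int() fails
print(describe('᭓'))   # non-ASCII decimal digit: int() accepts it
```

For the superscript the result is (True, False, None), and for the non-ASCII digit it is (True, True, 3), so neither str.isdigit() nor str.isdecimal() alone restricts input to the ASCII digits 0-9.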

I'm looking into a nice way to solve this so that this raises a ValidationError:

isbn.validate('978-90245᭓ 8270')

elgehelge commented 5 years ago

Great job. Looking forward to next release! 👏

elgehelge commented 5 years ago

@arthurdejong Instead of the regex you might like this:

def isdigits(number):
    # str.isascii() requires Python 3.7 or later
    return str.isdigit(number) and str.isascii(number)

My guess is that it is faster, but I did not test it.

arthurdejong commented 5 years ago

Thanks. I hadn't really considered that. I did some performance tests and it turns out that number.isdigit() and number.isascii() is the fastest variant that gives the expected result. The biggest problem is that str.isascii() is not present on Python 2 :( so I'll stick with the regular expression solution for now.

code                                           relative performance
number.isdigit()                               1
all(x in '0123456789' for x in number)         6.3
all(x in digits_set for x in number)           6.3
bool(re.match(r'^[0-9]+$', number))            7.7
bool(digits_re.match(number))                  3.4
number.isdigit() and number.isascii()          1.2
str.isdigit(number) and str.isascii(number)    1.6
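A comparison like the one above could be reproduced roughly as follows. This is a sketch under assumptions: the thread does not show how digits_re and digits_set were defined, so the definitions here are guesses, and the sample number and repeat count are arbitrary. The lambda wrappers add call overhead, so the ratios will not match the table exactly and will vary between interpreters and machines.

```python
import re
import timeit

# Assumed definitions; the thread does not show the originals.
digits_re = re.compile(r'^[0-9]+$')
digits_set = set('0123456789')

# Candidate checks, keyed by the expression as written in the table.
# The isascii() variants need Python 3.7 or later.
CANDIDATES = {
    "number.isdigit()":
        lambda number: number.isdigit(),
    "all(x in '0123456789' for x in number)":
        lambda number: all(x in '0123456789' for x in number),
    "all(x in digits_set for x in number)":
        lambda number: all(x in digits_set for x in number),
    "bool(re.match(r'^[0-9]+$', number))":
        lambda number: bool(re.match(r'^[0-9]+$', number)),
    "bool(digits_re.match(number))":
        lambda number: bool(digits_re.match(number)),
    "number.isdigit() and number.isascii()":
        lambda number: number.isdigit() and number.isascii(),
    "str.isdigit(number) and str.isascii(number)":
        lambda number: str.isdigit(number) and str.isascii(number),
}


def relative_timings(number='9789024538270', repeat=20000):
    """Time each candidate and scale so that number.isdigit() is 1."""
    timings = {
        label: timeit.timeit(lambda: check(number), number=repeat)
        for label, check in CANDIDATES.items()
    }
    baseline = timings["number.isdigit()"]
    return {label: t / baseline for label, t in timings.items()}


if __name__ == '__main__':
    for label, ratio in relative_timings().items():
        print('%-45s %.1f' % (label, ratio))
```

Note that the baseline number.isdigit() is only a speed reference: it is the one candidate here that still accepts input such as '978²'.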