akoumjian / datefinder

Find dates inside text using Python and get back datetime objects
http://datefinder.readthedocs.org/en/latest/
MIT License

Crash on a string containing a Unicode superscript digit in the middle part of the date #156

Status: Open · opened by Similacrest 3 years ago


datefinder==0.7.1

>>> [d for d in datefinder.find_dates("2021-0²-12")] 
Traceback (most recent call last):
  File "\lib\site-packages\dateutil\parser\_parser.py", line 655, in parse
    ret = self._build_naive(res, default)
  File "\lib\site-packages\dateutil\parser\_parser.py", line 1238, in _build_naive
    if cday > monthrange(cyear, cmonth)[1]:
  File "\lib\calendar.py", line 124, in monthrange
    raise IllegalMonthError(month)
calendar.IllegalMonthError: bad month number 0; must be 1-12

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "\lib\site-packages\datefinder\__init__.py", line 32, in find_dates
    as_dt = self.parse_date_string(date_string, captures)
  File "\lib\site-packages\datefinder\__init__.py", line 102, in parse_date_string
    as_dt = parser.parse(date_string, default=self.base_date)
  File "\lib\site-packages\dateutil\parser\_parser.py", line 1374, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "\lib\site-packages\dateutil\parser\_parser.py", line 657, in parse
    six.raise_from(ParserError(e.args[0] + ": %s", timestr), e)
TypeError: unsupported operand type(s) for +: 'int' and 'str'
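The final TypeError is a second failure stacked on the first one. According to the traceback, dateutil's error-wrapping line (six.raise_from(ParserError(e.args[0] + ": %s", timestr), e)) treats e.args[0] as a message string, but calendar.IllegalMonthError is raised with the bad month number itself, so e.args[0] is the int 0. The sketch below reproduces that two-step failure with the standard library only (no datefinder or dateutil involved); reproduce_wrapping_error is a hypothetical helper name, not part of either library.

```python
import calendar


def reproduce_wrapping_error(year: int = 2021, month: int = 0):
    """Hypothetical helper that mimics the two-step failure in the traceback.

    Step 1: calendar.monthrange rejects month 0 with IllegalMonthError.
    Step 2: adding a str to e.args[0] (an int) raises TypeError, which is
    the same expression pattern as dateutil's error-wrapping line.
    """
    try:
        calendar.monthrange(year, month)
    except calendar.IllegalMonthError as e:
        first_arg = e.args[0]   # the month number itself, not a message
        message = str(e)        # the human-readable text shown in the trace
        try:
            first_arg + ": %s"  # int + str, as in ParserError(e.args[0] + ...)
        except TypeError as te:
            return first_arg, message, str(te)
    return None


if __name__ == "__main__":
    print(reproduce_wrapping_error())
```

Passing str(e) instead of e.args[0] would avoid the secondary TypeError, but the underlying problem (a month of 0 reaching monthrange) would remain.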

My understanding is that this is caused by either str.isdigit() or the regex '\d' accepting more than just the digits 0-9. Indeed, the same crash also happens with the Kharosthi numerals mentioned in the str.isdigit() documentation.
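One way to narrow this down is to compare the candidate checks on the characters involved. Per the Python documentation, str.isdigit() also accepts compatibility superscript digits and Kharosthi numbers, while str.isdecimal() and the re module's '\d' (on str patterns) accept only Unicode decimal digits (category Nd). The sketch below checks this for '²' and KHAROSHTHI DIGIT ONE, and adds a hypothetical caller-side workaround, blank_nonascii_digits, that replaces any character accepted by isdigit() but outside ASCII 0-9 with a space before the text is handed to datefinder.find_dates. Both function names are illustrative assumptions; this is not a fix inside datefinder, and it does not guarantee a date will still be found in the cleaned text.

```python
import re

SUPERSCRIPT_TWO = "\u00b2"      # '²', Unicode category No
KHAROSHTHI_ONE = "\U00010a40"   # KHAROSHTHI DIGIT ONE, category No
ASCII_DIGITS = set("0123456789")


def classify(ch: str) -> dict:
    """Report how each digit test treats a single character."""
    return {
        "isdigit": ch.isdigit(),
        "isdecimal": ch.isdecimal(),
        # On str patterns, \d matches Unicode decimal digits (category Nd).
        "re_d": re.fullmatch(r"\d", ch) is not None,
    }


def blank_nonascii_digits(text: str) -> str:
    """Hypothetical workaround: replace every character that str.isdigit()
    accepts but that is not an ASCII 0-9 digit with a space."""
    return "".join(
        " " if ch.isdigit() and ch not in ASCII_DIGITS else ch
        for ch in text
    )


if __name__ == "__main__":
    for ch in (SUPERSCRIPT_TWO, KHAROSHTHI_ONE, "2"):
        print(repr(ch), classify(ch))
    print(repr(blank_nonascii_digits("2021-0\u00b2-12")))
```

The workaround is deliberately conservative: because it keeps only ASCII digits, it also blanks legitimate non-ASCII decimal digits (for example Arabic-Indic digits), which may or may not be acceptable depending on the input text.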