akoumjian / datefinder

Find dates inside text using Python and get back datetime objects
http://datefinder.readthedocs.org/en/latest/
MIT License

REPLACEMENTS not comprehensive enough? #193

Open julianss opened 1 year ago

julianss commented 1 year ago

The regex recognizes a lot of words, such as the "positional tokens", "extra tokens" and so on, but parsing then fails when those words are passed on to dateutil. Take for example this date, which isn't recognized when preceded by "last" but is recognized when preceded by "by":

In [126]: [x for x in datefinder.find_dates("last Mar-31-2023", source=True)]
Out[126]: []

In [127]: [x for x in datefinder.find_dates("by Mar-31-2023", source=True)]
Out[127]: [(datetime.datetime(2023, 3, 31, 0, 0), 'by Mar-31-2023')]

There is a REPLACEMENTS dict that strips problematic words. Shouldn't this dict be made more comprehensive, so that it strips every word the regex can recognize, or am I missing something?
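
To make the idea concrete, here is a rough sketch of what I mean. It is not datefinder's code: `NOISE_WORDS`, `strip_noise` and `parse_candidate` are hypothetical names, the word list is made up for illustration, and `datetime.strptime` with a fixed format stands in for the dateutil parser so the snippet only needs the standard library.

```python
import re
from datetime import datetime

# Hypothetical list: words a date-finding regex might accept as part of a
# match but that a strict parser cannot interpret. These are NOT datefinder's
# actual constants.
NOISE_WORDS = ["last", "next", "by", "due", "on", "of", "at"]

# One alternation that matches any noise word as a whole word, ignoring case.
_NOISE_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, NOISE_WORDS)) + r")\b",
    re.IGNORECASE,
)


def strip_noise(candidate: str) -> str:
    """Replace every noise word with a space, then collapse whitespace."""
    cleaned = _NOISE_RE.sub(" ", candidate)
    return " ".join(cleaned.split())


def parse_candidate(candidate: str):
    """Return a datetime, or None if the cleaned string still fails to parse.

    strptime with a single fixed format is only a stand-in for a real,
    more flexible date parser.
    """
    try:
        return datetime.strptime(strip_noise(candidate), "%b-%d-%Y")
    except ValueError:
        return None


for text in ("last Mar-31-2023", "by Mar-31-2023"):
    print(text, "->", parse_candidate(text))
```

With this approach both inputs reach the parser as the same cleaned string. One caveat: blindly stripping words such as "last" or "next" throws away their meaning in phrases like "last Friday", so a real fix might need to treat relative expressions separately.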