akoumjian / datefinder

Find dates inside text using Python and get back datetime objects
http://datefinder.readthedocs.org/en/latest/
MIT License

"Thu" not recognised by regex #138

Open janto opened 3 years ago

janto commented 3 years ago

I am parsing dates from text including days of week that were generated with the common "%a" strftime format. This typically abbreviates days to 3 characters (well, at least in my locale).

In [1]: import datetime
In [2]: import datefinder
In [3]: for n in range(7):
    ...:     text = (datetime.datetime.now() + datetime.timedelta(days=n)).strftime("date is %a %Y-%m-%d %H:%M:%S")
    ...:     result = list(datefinder.find_dates(text, index=True))
    ...:     print(result)
    ...:     d, index = result[0]
    ...:     print(text[index[0]:index[1]])

which produces

[(datetime.datetime(2020, 11, 14, 18, 22, 56), (7, 31))]
 Sat 2020-11-14 18:22:56
[(datetime.datetime(2020, 11, 15, 18, 22, 56), (7, 31))]
 Sun 2020-11-15 18:22:56
[(datetime.datetime(2020, 11, 16, 18, 22, 56), (7, 31))]
 Mon 2020-11-16 18:22:56
[(datetime.datetime(2020, 11, 17, 18, 22, 56), (7, 31))]
 Tue 2020-11-17 18:22:56
[(datetime.datetime(2020, 11, 18, 18, 22, 56), (7, 31))]
 Wed 2020-11-18 18:22:56
[(datetime.datetime(2020, 11, 19, 18, 22, 56), (11, 31))]
 2020-11-19 18:22:56
[(datetime.datetime(2020, 11, 20, 18, 22, 56), (7, 31))]
 Fri 2020-11-20 18:22:56

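Note that "Thu" is missing from the sixth result: the match starts at index 11 instead of 7, so the day name is left out of the matched span.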

I have not tested it, but I'm guessing this is due to "thu" not being in the following line: https://github.com/akoumjian/datefinder/blob/0b864955e1c80c03fca16ee9a81fbf774f17f362/datefinder/constants.py#L7
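
To illustrate the guess above, here is a minimal sketch using only the standard `re` module. The two alternation strings are hypothetical stand-ins written for this example; it is an assumption that the library's day pattern resembles the first one, and I have not copied it from the linked source. The sketch shows that an alternation listing "thur" and "thurs" but not "thu" rejects the three-letter form that "%a" produces.

```python
import re

# Hypothetical alternations for illustration only (assumption: the library's
# day-name pattern looks roughly like the first one, lacking a bare "thu").
DAYS_WITHOUT_THU = r"mon|tue|tues|wed|thur|thurs|fri|sat|sun"
DAYS_WITH_THU = r"mon|tue|tues|wed|thu|thur|thurs|fri|sat|sun"


def matches_day(token, alternation):
    """Return True if the whole token is matched by the day alternation."""
    pattern = re.compile(r"(?:" + alternation + r")", re.IGNORECASE)
    return pattern.fullmatch(token) is not None


for day in ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"):
    print(day, matches_day(day, DAYS_WITHOUT_THU), matches_day(day, DAYS_WITH_THU))
```

If the guess is right, adding "thu" to the alternation would probably be enough, but that would need checking against the actual pattern and the project's tests.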

This causes problems for me because "Thu" remains in the text after the matched dates are removed.
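
In the meantime, a possible workaround is to widen the returned span myself before cutting the date out of the text. The sketch below is only a stopgap under stated assumptions: `widen_span` is a hypothetical helper of my own (not part of datefinder), it only knows the English three-letter abbreviations, and it takes the `(start, end)` tuple in the form shown in the output above.

```python
import re

# English three-letter weekday abbreviations, as "%a" gives in my locale,
# followed by whitespace at the very end of the searched prefix.
_WEEKDAY_BEFORE = re.compile(r"\b(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)\s+\Z")


def widen_span(text, span):
    """Extend (start, end) leftwards over a weekday abbreviation that sits
    directly before the matched date; return the span unchanged otherwise."""
    start, end = span
    # The matched span may itself begin with whitespace, so skip past it
    # to find where the date proper starts.
    core = start
    while core < end and text[core].isspace():
        core += 1
    match = _WEEKDAY_BEFORE.search(text[:core])
    if match is None:
        return span
    return (match.start(), end)


text = "date is Thu 2020-11-19 18:22:56"
start, end = widen_span(text, (11, 31))
print(text[start:end])
```

Spans that already include the day name, such as `(7, 31)` for the other six days, are returned unchanged because no weekday abbreviation precedes them.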