akoumjian / datefinder

Find dates inside text using Python and get back datetime objects
http://datefinder.readthedocs.org/en/latest/
MIT License

datefinder mistakenly identifies "pre-qualification may" as a date, so the date list contains "on may" #187

Open MichelRobitaille opened 1 year ago

MichelRobitaille commented 1 year ago

In any sentence containing two adjacent words where the first word ends with "on" and the second word starts with the name of a month, such as "may", the list of dates will contain "on may". Clearly "on may" is not a date.

You may use the following sentence as a test case: "On February 10, 2012, DHI Mortgage became aware that a software security breach by external sources had occurred in its Internet Loan Prequalification System. DHI Mortgage immediately isolated the affected server, purged certain affected files, and modified the electronic security measures. People who provided their information online for pre-qualification may have had their names, Social Security numbers, dates of birth, contact information, marital status, employment information, income, asset information, and liability information exposed."

The list of dates will be: ['On February 10, 2012', 'on may', 'on, mar']. The second entry comes from "pre-qualification may" and the third from "information, marital".

Clearly only the first date is correct and the last two are erroneously added.
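As a possible stopgap until this is fixed, the matched source strings could be post-filtered so that any candidate without a single digit is dropped. The sketch below is only an illustration under that assumption: `drop_digitless_matches` is a hypothetical helper, not part of datefinder, and it works on plain strings such as the source text that `datefinder.find_dates(text, source=True)` returns alongside each datetime.

```python
import re

# Any real date in this test case carries at least one digit (day or year).
_DIGIT = re.compile(r"\d")


def drop_digitless_matches(candidates):
    """Keep only matched strings that contain at least one digit.

    Hypothetical helper for illustration; not part of datefinder.
    """
    return [text for text in candidates if _DIGIT.search(text)]


# The three source strings reported in this issue.
matches = ["On February 10, 2012", "on may", "on, mar"]
print(drop_digitless_matches(matches))
```

The trade-off is that this would also discard digit-free expressions that some users may want to keep, such as a bare month name, so it is a workaround rather than a fix for the matching itself.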

Also, is there a way to resolve the following warning, which appears when datefinder is used after importing it in Python with import datefinder?

C:\Users\User\anaconda3\lib\site-packages\dateutil\parser\_parser.py:1207: UnknownTimezoneWarning: tzname PDT identified but not understood. Pass tzinfos argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
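One way the warning might be silenced in the meantime is with a filter from the standard `warnings` module that matches on the message text. This is a minimal sketch, not a fix: `silence_unknown_tz_warning` is a hypothetical helper name, and hiding the warning does not make the resulting datetime timezone-aware.

```python
import warnings


def silence_unknown_tz_warning():
    """Ignore warnings whose message starts like dateutil's unknown-tzname text.

    Hypothetical helper for illustration. The pattern is matched against the
    start of the warning message, so other warnings are left untouched.
    """
    warnings.filterwarnings(
        "ignore",
        message=r"tzname \w+ identified but not understood",
    )


# Call once, before running the code that triggers the warning.
silence_unknown_tz_warning()
```

Matching on the message rather than on the warning class keeps the sketch free of any import from dateutil, at the cost of breaking silently if the message wording changes in a later release.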

Thanks in advance. Regards,