akoumjian / datefinder

Find dates inside text using Python and get back datetime objects
http://datefinder.readthedocs.org/en/latest/
MIT License

Doesn't Understand June 1950 #167

Closed by NRHGDW 2 years ago

NRHGDW commented 2 years ago

Input:

import datefinder
text = "June 1950"
print(text)
for x in datefinder.find_dates(text, source=True):
    text = text.replace(x[1], f'{x[0].year} {x[0].month} {x[0].day}')
print(text)

Output:

June 1950
1950 6 18

akoumjian commented 2 years ago

This is just an unfortunate side effect of how dateutil.parser works: https://dateutil.readthedocs.io/en/stable/parser.html

Essentially, dateutil does not assume the first of the month when you pass it only a month and a year. It fills in the missing fields of an incomplete date string from a default datetime, and when no default is given it uses today's date. I would personally prefer to always get the first of the month, but that is not semantically more meaningful (again, because you are getting a date, not a month object).
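
To make the "fill in the gaps" behaviour concrete, here is a minimal stdlib-only sketch. `fill_from_default` is a hypothetical helper written for this illustration, not part of dateutil or datefinder, and it only imitates the idea: any field the text did not supply is copied from a default datetime.

```python
from datetime import datetime


def fill_from_default(default, year=None, month=None, day=None):
    """Simplified imitation of completing a partial date (hypothetical helper).

    Any field that was not found in the text (None here) is taken from `default`.
    """
    supplied = {"year": year, "month": month, "day": day}
    overrides = {name: value for name, value in supplied.items() if value is not None}
    return default.replace(**overrides)


mid_month = datetime(2022, 7, 18)      # stands in for "today"
first_of_month = datetime(2022, 7, 1)  # a default that falls on day 1

# "June 1950" supplies a year and a month, but no day.
print(fill_from_default(mid_month, year=1950, month=6).date())       # 1950-06-18
print(fill_from_default(first_of_month, year=1950, month=6).date())  # 1950-06-01

# "June 1" supplies no year, so the default's year is used instead.
print(fill_from_default(first_of_month, month=6, day=1).date())      # 2022-06-01
```

The sketch ignores edge cases, such as a default day that does not exist in the target month, so treat it as an illustration of the idea rather than a replacement for the real parser.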

You can pass your own base_date to datefinder. If you pick any date that falls on the first of a month, your month/year combinations will resolve as I'm assuming you expect. However, dates with a missing year will take the year of that base_date, and so on for any other missing field.


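In [10]: from datetime import datetime
    ...: text = "June 1950"
    ...: print(text)
    ...: # base_date falls on the first of a month, so a missing day resolves to 1
    ...: for result, source in datefinder.find_dates(text, source=True, base_date=datetime(2022, 7, 1, 1, 1, 1, 1)):
    ...:     print(result.year, result.month, result.day)
    ...: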
June 1950
1950 6 1