adbar / htmldate

Fast and robust date extraction from web pages, with Python or on the command-line
https://htmldate.readthedocs.io
Apache License 2.0

LATEST_POSSIBLE max date can become outdated #70

rolisz closed 1 year ago

rolisz commented 1 year ago

If you don't pass in a max_date, htmldate falls back to the LATEST_POSSIBLE constant (see get_max_date in validators.py). This constant is initialized to datetime.now() only once, when the module is loaded.

This is a problem when htmldate is used in a long-running process, such as a server that runs 24/7 or workers that are rarely restarted. After one day of uptime, htmldate still uses the previous day's date as max_date.
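To make the failure mode concrete, here is a minimal sketch of the pattern described above. It is not htmldate's actual code: LATEST_POSSIBLE here is a local stand-in, and get_max_date_stale and is_valid_stale are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta

# Stand-in for a module-level constant: evaluated once, when the module loads.
LATEST_POSSIBLE = datetime.now()


def get_max_date_stale(max_date=None):
    """Return the caller's max_date, else the value frozen at load time."""
    return max_date if max_date is not None else LATEST_POSSIBLE


def is_valid_stale(candidate, max_date=None):
    """Accept a candidate date only if it is not later than the upper bound."""
    return candidate <= get_max_date_stale(max_date)


# Simulate a page published one day after the process started.
published_later = LATEST_POSSIBLE + timedelta(days=1)

print(is_valid_stale(published_later))  # False: the bound is still the start-up time
print(is_valid_stale(published_later, max_date=published_later))  # True: explicit bound
```

In a process that has been up for a day, a page published today would be rejected in the same way unless the caller supplies max_date explicitly.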

This can be overridden from the calling code (by passing in the appropriate parameters), but I think it would be nicer if get_max_date called datetime.now() each time it needs a default max_date.
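A minimal sketch of the suggested change, assuming get_max_date receives either a datetime or None. The real function in validators.py may have a different signature and may also parse string input, so treat this as an illustration of the idea only.

```python
from datetime import datetime


def get_max_date(max_date=None):
    """Return the caller's max_date, else the current time.

    The default is computed inside the function, so it is evaluated
    on every call instead of once when the module is loaded.
    """
    if max_date is None:
        return datetime.now()
    return max_date


explicit = datetime(2020, 1, 1)
print(get_max_date(explicit) == explicit)  # True: an explicit value is passed through
```

The cost is one datetime.now() call per lookup, which is negligible next to parsing an HTML page.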

adbar commented 1 year ago

Hi @rolisz, thanks for your feedback. Your remark makes perfect sense, but as you say, users can already provide the parameters with each function call, so I'm not sure. I'll leave the thread open and give it some thought.
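The caller-side workaround mentioned here could look roughly like the sketch below. The wrapper name call_with_fresh_max_date is hypothetical, and fake_extract is a stand-in so the example runs without htmldate installed. It assumes the extraction function accepts a max_date keyword given as a YYYY-MM-DD string.

```python
from datetime import date


def call_with_fresh_max_date(extract, html):
    """Call `extract` with max_date set to today's date, computed per call."""
    return extract(html, max_date=date.today().isoformat())


def fake_extract(html, max_date=None):
    """Stand-in extractor that simply reports the bound it was given."""
    return max_date


result = call_with_fresh_max_date(fake_extract, "<html></html>")
print(result == date.today().isoformat())
```

With htmldate itself, the same wrapper would presumably be used as call_with_fresh_max_date(find_date, html), so the upper bound follows the clock even in a process that never restarts.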

adbar commented 1 year ago

Hi @rolisz, I just implemented your suggestion. It should be out soon in the next version.