adbar / htmldate

Fast and robust date extraction from web pages, with Python or on the command-line
https://htmldate.readthedocs.io
Apache License 2.0

try_date_expr validation error #98

Closed arcombe012 closed 1 year ago

arcombe012 commented 1 year ago

In extractors.py, the function try_date_expr() does not check that its string argument is of the right type, and it raises an exception if it is not.

To reproduce: when processing a meta node of the form

<meta itemprop="dateCreated" datetime="">

in examine_header(), the line

attempt = tryfunc(elem.get("datetime") or elem.get("content"))

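calls try_date_expr() with string=None: the empty datetime attribute is falsy and the node has no content attribute, so the whole expression evaluates to None. This raises AttributeError: 'NoneType' object has no attribute 'strip'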

Suggestion: add a guard in try_date_expr(), before the string is used:

if not string:
    return None
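
The failure and the proposed guard can be sketched in isolation. This is a minimal, hedged illustration and not htmldate's actual code: `unguarded` and `guarded` are hypothetical stand-ins for try_date_expr(), and the meta node is built with the standard library's ElementTree instead of the parser htmldate uses.

```python
from typing import Optional
from xml.etree.ElementTree import Element


def unguarded(string):
    # Hypothetical stand-in for the failing path: .strip() on None raises.
    return string.strip()


def guarded(string: Optional[str]) -> Optional[str]:
    # Proposed guard: return early on None or an empty string.
    if not string:
        return None
    return string.strip()


# Equivalent of <meta itemprop="dateCreated" datetime="">
elem = Element("meta", {"itemprop": "dateCreated", "datetime": ""})

# "" is falsy and there is no content attribute, so this evaluates to None.
value = elem.get("datetime") or elem.get("content")

try:
    unguarded(value)
    raised = False
except AttributeError:
    raised = True

result = guarded(value)
```

Because `not string` is true for both None and the empty string, the single check covers a missing attribute and an empty one alike.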

adbar commented 1 year ago

Hi @arcombe012, that makes perfect sense. Feel free to write a pull request if you wish to.