adbar / htmldate

Fast and robust date extraction from web pages, with Python or on the command-line
https://htmldate.readthedocs.io
Apache License 2.0
Add date attributes to HTML extraction #73

Closed kernc closed 1 year ago

kernc commented 1 year ago

As seen in, e.g.: https://www.aljazeera.com/news/2023/3/8/nato-cautious-amid-ongoing-nord-stream-blasts-investigation

Without this change, the time portion of the article's publication date is not found.
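To illustrate why attribute coverage matters here, the following is a minimal standard-library sketch, not htmldate's actual implementation. The attribute names in `DATE_ATTRS`, the helper `extract_datetimes`, and the sample markup (including its time of day) are assumptions made up for illustration; they are not the attributes this PR adds, nor the markup of the linked page.

```python
from datetime import datetime
from html.parser import HTMLParser

# Hypothetical attribute names, chosen only for illustration;
# not the list of attributes this PR adds to htmldate.
DATE_ATTRS = {"datetime", "content"}


class DateAttrParser(HTMLParser):
    """Collect ISO 8601 timestamps found in selected tag attributes."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in DATE_ATTRS and value:
                try:
                    # Keeps the time portion, not just the calendar date.
                    self.found.append(datetime.fromisoformat(value.strip()))
                except ValueError:
                    # Attribute value is not an ISO timestamp; skip it.
                    continue


def extract_datetimes(html):
    """Return all datetimes parsed from the attributes in DATE_ATTRS."""
    parser = DateAttrParser()
    parser.feed(html)
    return parser.found


# Made-up sample markup; the time of day is invented for the example.
SAMPLE = '<article><time datetime="2023-03-08T14:30:00+00:00">8 Mar 2023</time></article>'

if __name__ == "__main__":
    for dt in extract_datetimes(SAMPLE):
        print(dt.strftime("%Y-%m-%d %H:%M"))
```

If an attribute is missing from the set being inspected, a timestamp stored only in that attribute is never seen, so at best the visible text (which often carries the date without the time) is left to work with.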

adbar commented 1 year ago

Hi @kernc, thanks for the PR! I just made minor adjustments. Is the PR ready to merge, or do you have other attributes to add?

codecov-commenter commented 1 year ago

Codecov Report

Merging #73 (e434528) into master (5b9d47f) will not change coverage. The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master      #73   +/-   ##
=======================================
  Coverage   98.19%   98.19%           
=======================================
  Files           8        8           
  Lines         942      942           
=======================================
  Hits          925      925           
  Misses         17       17           

Impacted Files      Coverage Δ
htmldate/core.py    97.97% <100.00%> (ø)

kernc commented 1 year ago

Thanks. I was holding off while resolving another issue I ran into when using htmldate via trafilatura (👏👏). I don't have any other attributes on my list at the moment, but will update when I do. Thanks!