adbar / trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
https://trafilatura.readthedocs.io
Apache License 2.0

PytzUsageWarning: localize method no longer necessary #271

Closed by rwinterschlaf 1 year ago

rwinterschlaf commented 1 year ago
...\python37\lib\site-packages\dateparser\date_parser.py:35: PytzUsageWarning: The localize method is no longer necessary, as this time zone supports the fold attribute (PEP 495). For more details on migrating to a PEP 495-compliant implementation, see https://pytz-deprecation-shim.readthedocs.io/en/latest/migration.html
  date_obj = stz.localize(date_obj)

I received the above warning when trafilatura processes https://www.badische-zeitung.de/nachrichten/wetter. It doesn't seem to occur on other websites and has no impact on functionality, but it is a bit annoying to see it pop up from time to time.

adbar commented 1 year ago

Hi @rwinterschlaf, I also came across it. It's related to the dateparser module; updating it to version 1.1.2 should remove the warning: https://github.com/scrapinghub/dateparser/issues/1089
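To see whether an environment already has a dateparser release at or above 1.1.2, something like the following could be used. It is only a rough sketch: the helper names (`parse_version`, `is_at_least`, `installed_dateparser_version`) are made up for illustration, the version comparison is deliberately simplistic, and `importlib.metadata` requires Python 3.8 or later (on 3.7 the `importlib_metadata` backport would be needed instead).

```python
import re
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+


def parse_version(text):
    """Turn '1.1.2' into (1, 1, 2), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first part that does not start with a digit
        parts.append(int(match.group()))
    return tuple(parts)


def is_at_least(installed, minimum="1.1.2"):
    """Return True if `installed` is the same as or newer than `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


def installed_dateparser_version():
    """Return the installed dateparser version string, or None if it is absent."""
    try:
        return version("dateparser")
    except PackageNotFoundError:
        return None


found = installed_dateparser_version()
if found is None:
    print("dateparser is not installed")
elif is_at_least(found):
    print(f"dateparser {found} is at or above 1.1.2")
else:
    print(f"dateparser {found} predates 1.1.2; upgrading may remove the warning")
```

If the check reports an older release, upgrading the package with pip is the usual next step.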

The latest htmldate version (the package used for date extraction) accounts for it too. Also, using the latest dateparser version (1.1.4) could bring further improvements, but it seems to be slower on my data, so use it at your own discretion.
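If upgrading is not possible right away (for example because of pinned dependencies), a stopgap could be to filter out this one warning with the standard library's `warnings` module. This is not something the thread itself recommends, just a minimal sketch: the function name is hypothetical, and the filter matches on the start of the message text quoted in the report rather than on the `PytzUsageWarning` class, so nothing extra needs to be imported.

```python
import warnings

# Start of the warning text quoted in this issue; filterwarnings treats it as a
# regular expression matched against the beginning of the warning message.
PYTZ_LOCALIZE_MESSAGE = r"The localize method is no longer necessary"


def silence_pytz_localize_warning():
    """Install a filter that hides only the pytz 'localize' warning."""
    warnings.filterwarnings("ignore", message=PYTZ_LOCALIZE_MESSAGE)


# Call this once, before running the extraction code that triggers the warning.
silence_pytz_localize_warning()
```

Other warnings are left untouched, so unrelated deprecation notices still show up as usual.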

rwinterschlaf commented 1 year ago

Wonderful, worked like a charm! I'll keep an eye on the speed - thank you :)