AndyTheFactory / newspaper4k

📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
MIT License

Installation breaking due to lxml>=5.x #630

Open Abdullah0297445 opened 5 months ago

Abdullah0297445 commented 5 months ago

Describe the bug Trying to install newspaper4k via pip and getting the error:

ImportError: lxml.html.clean module is now a separate project lxml_html_clean.

To Reproduce Steps to reproduce the behavior:

  1. pip install newspaper4k
  2. Import the package and see the following traceback:
[stderr] from newspaper import Article as NPArticle
[stderr] File "/usr/local/lib/python3.11/site-packages/newspaper/__init__.py", line 17, in <module>
[stderr] from .api import (
[stderr] File "/usr/local/lib/python3.11/site-packages/newspaper/api.py", line 8, in <module>
[stderr] from .article import Article
[stderr] File "/usr/local/lib/python3.11/site-packages/newspaper/article.py", line 21, in <module>
[stderr] from . import network
[stderr] File "/usr/local/lib/python3.11/site-packages/newspaper/network.py", line 15, in <module>
[stderr] from newspaper import parsers
[stderr] File "/usr/local/lib/python3.11/site-packages/newspaper/parsers.py", line 18, in <module>
[stderr] import lxml.html.clean
[stderr] File "/usr/local/lib/python3.11/site-packages/lxml/html/clean.py", line 18, in <module>
[stderr] raise ImportError(
[stderr] ImportError: lxml.html.clean module is now a separate project lxml_html_clean.
[stderr] Install lxml[html_clean] or lxml_html_clean directly.

Expected behavior Installing via pip and then importing the package should work.

System information

Workaround For anyone hitting this issue: for now, just add lxml[html_clean]==5.2.0 to your requirements.txt file.
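
Before touching requirements.txt, it can help to confirm that the missing piece really is the lxml_html_clean package. Below is a minimal sketch using only the standard library. The helper names module_available and diagnose are hypothetical and are not part of newspaper4k or lxml.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


def diagnose(has_lxml: bool, has_clean: bool) -> str:
    """Turn the two availability flags into a short hint (hypothetical helper)."""
    if not has_lxml:
        return "lxml is not installed"
    if has_clean:
        return "lxml_html_clean is installed"
    # Per the ImportError above, lxml.html.clean now lives in a separate project.
    return "install lxml[html_clean] or lxml_html_clean"


if __name__ == "__main__":
    print(diagnose(module_available("lxml"), module_available("lxml_html_clean")))
```

If the last hint is printed, the workaround above (or installing lxml_html_clean directly, as the error message suggests) should apply.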

Quickfix To quickly fix the issue in this repo, for now we can edit this line in the pyproject.toml file and pin the version of lxml below 5.x: https://github.com/AndyTheFactory/newspaper4k/blob/b5b20976bd320f89ffa25b8d4a7a94d190ee549a/pyproject.toml#L34C3-L34C15
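
Besides pinning, the ImportError text itself hints at a code-side option: take the cleaner from lxml_html_clean when it is installed and fall back to the old location otherwise. The sketch below is only an assumption about how such a guard could look, not the change made in this repo. The function name load_cleaner is hypothetical, and the sketch assumes lxml_html_clean exposes Cleaner at its top level, as lxml.html.clean did.

```python
def load_cleaner():
    """Return the Cleaner class from whichever location is importable, else None."""
    try:
        # Separate project named in the ImportError message.
        from lxml_html_clean import Cleaner
        return Cleaner
    except ImportError:
        pass
    try:
        # Old location, still importable on lxml releases from before the split.
        from lxml.html.clean import Cleaner
        return Cleaner
    except ImportError:
        return None


Cleaner = load_cleaner()
if Cleaner is None:
    print("No HTML cleaner available; install lxml[html_clean] or lxml_html_clean")
```

A guard like this would still require lxml_html_clean to be declared as a dependency for newer lxml releases; it only keeps the import from failing at module load time.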

RomanAverin commented 4 months ago

Same issue

carter-0 commented 4 months ago

I'm also experiencing this on macOS, Python 3.9. Patching the pyproject.toml gets it working for now.

Didou09 commented 1 month ago

same issue too