fhamborg / news-please

news-please - an integrated web crawler and information extractor for news that just works
Apache License 2.0
1.99k stars, 414 forks

Failed to build for python 3.11 #237

Closed: mattiasrubenson closed this issue 1 year ago

mattiasrubenson commented 1 year ago

Describe the bug: Installing news-please with pip on Python 3.11 fails. The cause appears to be that cchardet cannot be built for Python 3.11. A brief search turned up a fork that seems to work (https://github.com/faust-streaming/cChardet) and could be used instead.

(venv) mattiasrubenson@zagwe ~ [1]> pip install news-please
Collecting news-please
  Using cached news_please-1.5.22-py3-none-any.whl (89 kB)
Requirement already satisfied: Scrapy>=1.1.0 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from news-please) (2.7.1)
Requirement already satisfied: PyMySQL>=0.7.9 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from news-please) (1.0.2)
Requirement already satisfied: psycopg2-binary>=2.8.4 in ./PycharmProjects/tldr/venv/lib64/python3.11/site-packages (from news-please) (2.9.5)
Requirement already satisfied: hjson>=1.5.8 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from news-please) (3.1.0)
Requirement already satisfied: elasticsearch>=2.4 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from news-please) (8.6.0)
Requirement already satisfied: beautifulsoup4>=4.3.2 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from news-please) (4.11.1)
Requirement already satisfied: readability-lxml>=0.6.2 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from news-please) (0.8.1)
Requirement already satisfied: newspaper3k>=0.2.8 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from news-please) (0.2.8)
Requirement already satisfied: langdetect>=1.0.7 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from news-please) (1.0.9)
Requirement already satisfied: python-dateutil>=2.4.0 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from news-please) (2.8.2)
Requirement already satisfied: plac>=0.9.6 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from news-please) (1.3.5)
Requirement already satisfied: dotmap>=1.2.17 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from news-please) (1.3.30)
Requirement already satisfied: PyDispatcher>=2.0.5 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from news-please) (2.0.6)
Requirement already satisfied: warcio>=1.3.3 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from news-please) (1.7.4)
Collecting ago>=0.0.9
  Using cached ago-0.0.95-py3-none-any.whl
Requirement already satisfied: six>=1.10.0 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from news-please) (1.16.0)
Requirement already satisfied: lxml>=3.3.5 in ./PycharmProjects/tldr/venv/lib64/python3.11/site-packages (from news-please) (4.9.2)
Collecting awscli>=1.11.117
  Using cached awscli-1.27.50-py3-none-any.whl (4.0 MB)
Requirement already satisfied: hurry.filesize>=0.9 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from news-please) (0.9)
Collecting bs4
  Using cached bs4-0.0.1-py3-none-any.whl
Collecting cchardet>=2.1.7
  Using cached cchardet-2.1.7.tar.gz (653 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: botocore==1.29.50 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from awscli>=1.11.117->news-please) (1.29.50)
Requirement already satisfied: docutils<0.17,>=0.10 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from awscli>=1.11.117->news-please) (0.16)
Requirement already satisfied: s3transfer<0.7.0,>=0.6.0 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from awscli>=1.11.117->news-please) (0.6.0)
Requirement already satisfied: PyYAML<5.5,>=3.10 in ./PycharmProjects/tldr/venv/lib64/python3.11/site-packages (from awscli>=1.11.117->news-please) (5.4.1)
Requirement already satisfied: colorama<0.4.5,>=0.2.5 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from awscli>=1.11.117->news-please) (0.4.4)
Requirement already satisfied: rsa<4.8,>=3.1.2 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from awscli>=1.11.117->news-please) (4.7.2)
Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from botocore==1.29.50->awscli>=1.11.117->news-please) (1.0.1)
Requirement already satisfied: urllib3<1.27,>=1.25.4 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from botocore==1.29.50->awscli>=1.11.117->news-please) (1.26.13)
Requirement already satisfied: soupsieve>1.2 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from beautifulsoup4>=4.3.2->news-please) (2.3.2.post1)
Requirement already satisfied: elastic-transport<9,>=8 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from elasticsearch>=2.4->news-please) (8.4.0)
Requirement already satisfied: setuptools in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from hurry.filesize>=0.9->news-please) (60.2.0)
Requirement already satisfied: Pillow>=3.3.0 in ./PycharmProjects/tldr/venv/lib64/python3.11/site-packages (from newspaper3k>=0.2.8->news-please) (9.4.0)
Requirement already satisfied: cssselect>=0.9.2 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from newspaper3k>=0.2.8->news-please) (1.2.0)
Requirement already satisfied: nltk>=3.2.1 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from newspaper3k>=0.2.8->news-please) (3.8.1)
Requirement already satisfied: requests>=2.10.0 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from newspaper3k>=0.2.8->news-please) (2.28.1)
Requirement already satisfied: feedparser>=5.2.1 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from newspaper3k>=0.2.8->news-please) (6.0.10)
Requirement already satisfied: tldextract>=2.0.1 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from newspaper3k>=0.2.8->news-please) (3.4.0)
Requirement already satisfied: feedfinder2>=0.0.4 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from newspaper3k>=0.2.8->news-please) (0.0.4)
Requirement already satisfied: jieba3k>=0.35.1 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from newspaper3k>=0.2.8->news-please) (0.35.1)
Requirement already satisfied: tinysegmenter==0.3 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from newspaper3k>=0.2.8->news-please) (0.3)
Requirement already satisfied: chardet in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from readability-lxml>=0.6.2->news-please) (5.1.0)
Requirement already satisfied: Twisted>=18.9.0 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from Scrapy>=1.1.0->news-please) (22.10.0)
Requirement already satisfied: cryptography>=3.3 in ./PycharmProjects/tldr/venv/lib64/python3.11/site-packages (from Scrapy>=1.1.0->news-please) (39.0.0)
Requirement already satisfied: itemloaders>=1.0.1 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from Scrapy>=1.1.0->news-please) (1.0.6)
Requirement already satisfied: parsel>=1.5.0 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from Scrapy>=1.1.0->news-please) (1.7.0)
Requirement already satisfied: pyOpenSSL>=21.0.0 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from Scrapy>=1.1.0->news-please) (23.0.0)
Requirement already satisfied: queuelib>=1.4.2 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from Scrapy>=1.1.0->news-please) (1.6.2)
Requirement already satisfied: service-identity>=18.1.0 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from Scrapy>=1.1.0->news-please) (21.1.0)
Requirement already satisfied: w3lib>=1.17.0 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from Scrapy>=1.1.0->news-please) (2.1.1)
Requirement already satisfied: zope.interface>=5.1.0 in ./PycharmProjects/tldr/venv/lib64/python3.11/site-packages (from Scrapy>=1.1.0->news-please) (5.5.2)
Requirement already satisfied: protego>=0.1.15 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from Scrapy>=1.1.0->news-please) (0.2.1)
Requirement already satisfied: itemadapter>=0.1.0 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from Scrapy>=1.1.0->news-please) (0.7.0)
Requirement already satisfied: packaging in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from Scrapy>=1.1.0->news-please) (23.0)
Requirement already satisfied: cffi>=1.12 in ./PycharmProjects/tldr/venv/lib64/python3.11/site-packages (from cryptography>=3.3->Scrapy>=1.1.0->news-please) (1.15.1)
Requirement already satisfied: certifi in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from elastic-transport<9,>=8->elasticsearch>=2.4->news-please) (2022.12.7)
Requirement already satisfied: sgmllib3k in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from feedparser>=5.2.1->newspaper3k>=0.2.8->news-please) (1.0.0)
Requirement already satisfied: click in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from nltk>=3.2.1->newspaper3k>=0.2.8->news-please) (8.1.3)
Requirement already satisfied: joblib in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from nltk>=3.2.1->newspaper3k>=0.2.8->news-please) (1.2.0)
Requirement already satisfied: regex>=2021.8.3 in ./PycharmProjects/tldr/venv/lib64/python3.11/site-packages (from nltk>=3.2.1->newspaper3k>=0.2.8->news-please) (2022.10.31)
Requirement already satisfied: tqdm in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from nltk>=3.2.1->newspaper3k>=0.2.8->news-please) (4.64.1)
Requirement already satisfied: charset-normalizer<3,>=2 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from requests>=2.10.0->newspaper3k>=0.2.8->news-please) (2.1.1)
Requirement already satisfied: idna<4,>=2.5 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from requests>=2.10.0->newspaper3k>=0.2.8->news-please) (3.4)
Requirement already satisfied: pyasn1>=0.1.3 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from rsa<4.8,>=3.1.2->awscli>=1.11.117->news-please) (0.4.8)
Requirement already satisfied: attrs>=19.1.0 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from service-identity>=18.1.0->Scrapy>=1.1.0->news-please) (22.2.0)
Requirement already satisfied: pyasn1-modules in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from service-identity>=18.1.0->Scrapy>=1.1.0->news-please) (0.2.8)
Requirement already satisfied: requests-file>=1.4 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from tldextract>=2.0.1->newspaper3k>=0.2.8->news-please) (1.5.1)
Requirement already satisfied: filelock>=3.0.8 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from tldextract>=2.0.1->newspaper3k>=0.2.8->news-please) (3.9.0)
Requirement already satisfied: constantly>=15.1 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from Twisted>=18.9.0->Scrapy>=1.1.0->news-please) (15.1.0)
Requirement already satisfied: incremental>=21.3.0 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from Twisted>=18.9.0->Scrapy>=1.1.0->news-please) (22.10.0)
Requirement already satisfied: Automat>=0.8.0 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from Twisted>=18.9.0->Scrapy>=1.1.0->news-please) (22.10.0)
Requirement already satisfied: hyperlink>=17.1.1 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from Twisted>=18.9.0->Scrapy>=1.1.0->news-please) (21.0.0)
Requirement already satisfied: typing-extensions>=3.6.5 in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from Twisted>=18.9.0->Scrapy>=1.1.0->news-please) (4.4.0)
Requirement already satisfied: pycparser in ./PycharmProjects/tldr/venv/lib/python3.11/site-packages (from cffi>=1.12->cryptography>=3.3->Scrapy>=1.1.0->news-please) (2.21)
Building wheels for collected packages: cchardet
  Building wheel for cchardet (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [22 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.11
      creating build/lib.linux-x86_64-3.11/cchardet
      copying src/cchardet/version.py -> build/lib.linux-x86_64-3.11/cchardet
      copying src/cchardet/__init__.py -> build/lib.linux-x86_64-3.11/cchardet
      running build_ext
      creating build/temp.linux-x86_64-3.11
      creating build/temp.linux-x86_64-3.11/src
      creating build/temp.linux-x86_64-3.11/src/cchardet
      creating build/temp.linux-x86_64-3.11/src/ext
      creating build/temp.linux-x86_64-3.11/src/ext/uchardet
      creating build/temp.linux-x86_64-3.11/src/ext/uchardet/src
      creating build/temp.linux-x86_64-3.11/src/ext/uchardet/src/LangModels
      gcc -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fstack-protector-strong -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv -O2 -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fstack-protector-strong -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv -O2 -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fstack-protector-strong -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv -fPIC -Isrc/ext/uchardet/src -I/home/mattiasrubenson/PycharmProjects/tldr/venv/include -I/usr/include/python3.11 -c src/cchardet/_cchardet.cpp -o build/temp.linux-x86_64-3.11/src/cchardet/_cchardet.o
      src/cchardet/_cchardet.cpp:196:12: fatal error: longintrepr.h: No such file or directory
        196 |   #include "longintrepr.h"
            |            ^~~~~~~~~~~~~~~
      compilation terminated.
      error: command '/usr/bin/gcc' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for cchardet
  Running setup.py clean for cchardet
Failed to build cchardet
Installing collected packages: cchardet, ago, bs4, awscli, news-please
  Running setup.py install for cchardet ... error
  error: subprocess-exited-with-error

  × Running setup.py install for cchardet did not run successfully.
  │ exit code: 1
  ╰─> [24 lines of output]
      running install
      /home/mattiasrubenson/PycharmProjects/tldr/venv/lib/python3.11/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        warnings.warn(
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.11
      creating build/lib.linux-x86_64-3.11/cchardet
      copying src/cchardet/version.py -> build/lib.linux-x86_64-3.11/cchardet
      copying src/cchardet/__init__.py -> build/lib.linux-x86_64-3.11/cchardet
      running build_ext
      creating build/temp.linux-x86_64-3.11
      creating build/temp.linux-x86_64-3.11/src
      creating build/temp.linux-x86_64-3.11/src/cchardet
      creating build/temp.linux-x86_64-3.11/src/ext
      creating build/temp.linux-x86_64-3.11/src/ext/uchardet
      creating build/temp.linux-x86_64-3.11/src/ext/uchardet/src
      creating build/temp.linux-x86_64-3.11/src/ext/uchardet/src/LangModels
      gcc -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fstack-protector-strong -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv -O2 -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fstack-protector-strong -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv -O2 -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fstack-protector-strong -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv -fPIC -Isrc/ext/uchardet/src -I/home/mattiasrubenson/PycharmProjects/tldr/venv/include -I/usr/include/python3.11 -c src/cchardet/_cchardet.cpp -o build/temp.linux-x86_64-3.11/src/cchardet/_cchardet.o
      src/cchardet/_cchardet.cpp:196:12: fatal error: longintrepr.h: No such file or directory
        196 |   #include "longintrepr.h"
            |            ^~~~~~~~~~~~~~~
      compilation terminated.
      error: command '/usr/bin/gcc' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> cchardet

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

To Reproduce: Create a new venv with Python 3.11 as the interpreter, then run pip install news-please.

Expected behavior: news-please installs successfully.

fhamborg commented 1 year ago

It looks like cchardet does not support Python 3.11 yet. You can watch this pull request to track when support lands: https://github.com/PyYoshi/cChardet/pull/78
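
The compiler error in the log (longintrepr.h: No such file or directory) is consistent with a change in Python 3.11, which moved that header into the cpython/ subdirectory of the include directory. Below is a minimal Python sketch for checking where the header sits for the active interpreter. The name locate_header is a hypothetical helper, not part of news-please or cchardet.

```python
import sysconfig
from pathlib import Path


def locate_header(include_dir, name="longintrepr.h"):
    """Return every path of `name` under include_dir, relative to it, sorted.

    Hypothetical helper for illustration only.
    """
    root = Path(include_dir)
    if not root.is_dir():
        # Development headers may not be installed at all.
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.rglob(name))


if __name__ == "__main__":
    # Include directory of the interpreter running this script.
    include_dir = sysconfig.get_paths()["include"]
    print("include dir:", include_dir)
    print("found:", locate_header(include_dir))
```

If the header shows up only under cpython/, C or C++ source that includes "longintrepr.h" directly would be expected to fail in the way the log shows.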

chaarlottte commented 1 year ago

Run pip install cython, then re-run the installation of news-please.
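
A likely reason this helps, stated as an assumption rather than something confirmed in this thread: with Cython installed, the cchardet build can regenerate _cchardet.cpp instead of compiling the pre-generated file that includes longintrepr.h directly. The two steps can be sketched in Python so that pip runs under the same interpreter as the venv. The names install_steps and run are hypothetical helpers, and the default dry_run=True only prints the commands.

```python
import subprocess
import sys


def install_steps(package="news-please"):
    """Return the pip commands in order: Cython first, then the package."""
    pip = [sys.executable, "-m", "pip", "install"]
    return [pip + ["cython"], pip + [package]]


def run(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if pip exits non-zero.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(install_steps(), dry_run=True)
```

Calling run(install_steps(), dry_run=False) inside the activated venv would perform the actual installation.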

mattiasrubenson commented 1 year ago

Thank you @chaarlottte, that solved my issues. :)