ArchiveTeam / grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

install error in macOS Catalina #217

Closed LeeBinder closed 2 years ago

LeeBinder commented 2 years ago

Hi @ivan. Following the steps at https://github.com/ArchiveTeam/grab-site#install-on-macos, I get errors in Terminal, and consequently:

$ gs-server
-bash: gs-server: command not found.

My default shell is still bash; Python is the latest 3.9.12; pip is the latest 22.0.4, from Python 3.8.

The failing command is:

PKG_CONFIG_PATH="/usr/local/opt/libxml2/lib/pkgconfig" ~/gs-venv/bin/pip install --no-binary lxml --upgrade git+https://github.com/ArchiveTeam/grab-site

Replacing pip install with pip3 install does not make a difference. Here's the dump from Terminal:

The default interactive shell is now zsh.
To update your account to use zsh, please run `chsh -s /bin/zsh`.
For more details, please visit https://support.apple.com/kb/HT208050.
$ ~/install.command

Already up-to-date.
Warning: python@3.7 3.7.13 is already installed and up-to-date.
To reinstall 3.7.13, run:
  brew reinstall python@3.7
Warning: libxslt 1.1.35_1 is already installed and up-to-date.
To reinstall 1.1.35_1, run:
  brew reinstall libxslt
Warning: re2 20220401 is already installed and up-to-date.
To reinstall 20220401, run:
  brew reinstall re2
Warning: pkg-config 0.29.2_3 is already installed and up-to-date.
To reinstall 0.29.2_3, run:
  brew reinstall pkg-config
Collecting git+https://github.com/ArchiveTeam/grab-site
  Cloning https://github.com/ArchiveTeam/grab-site to /private/var/folders/tb/t_5w72z5551f3lhkw6gq9l780000gn/T/pip-req-build-4c3soxhh
  Running command git clone --filter=blob:none --quiet https://github.com/ArchiveTeam/grab-site /private/var/folders/tb/t_5w72z5551f3lhkw6gq9l780000gn/T/pip-req-build-4c3soxhh
  Resolved https://github.com/ArchiveTeam/grab-site to commit 14c3bbdf7156a923c3b5baeef60b4e9fa3ef363c
  Preparing metadata (setup.py) ... done
Collecting wpull@ https://github.com/ArchiveTeam/ludios_wpull/tarball/master#egg=wpull-3.0.9
  Using cached https://github.com/ArchiveTeam/ludios_wpull/tarball/master
  Preparing metadata (setup.py) ... done
Collecting click>=6.3
  Using cached click-8.1.2-py3-none-any.whl (96 kB)
Collecting manhole>=1.0.0
  Using cached manhole-1.8.0-py2.py3-none-any.whl (16 kB)
Requirement already satisfied: lmdb>=0.89 in ./gs-venv/lib/python3.9/site-packages (from grab-site==2.2.2) (1.3.0)
Collecting autobahn>=0.12.1
  Using cached autobahn-22.3.2.tar.gz (376 kB)
  Preparing metadata (setup.py) ... done
Collecting fb-re2>=1.0.6
  Using cached fb-re2-1.0.7.tar.gz (9.4 kB)
  Preparing metadata (setup.py) ... done
Collecting websockets>=6.0
  Using cached websockets-10.2-cp39-cp39-macosx_10_9_x86_64.whl (96 kB)
Collecting cchardet>=1.0.0
  Using cached cchardet-2.1.7-cp39-cp39-macosx_10_9_x86_64.whl (124 kB)
Collecting txaio>=21.2.1
  Using cached txaio-22.2.1-py2.py3-none-any.whl (30 kB)
Collecting cryptography>=3.4.6
  Using cached cryptography-36.0.2-cp36-abi3-macosx_10_10_x86_64.whl (2.5 MB)
Collecting hyperlink>=21.0.0
  Using cached hyperlink-21.0.0-py2.py3-none-any.whl (74 kB)
Requirement already satisfied: setuptools in ./gs-venv/lib/python3.9/site-packages (from autobahn>=0.12.1->grab-site==2.2.2) (60.10.0)
Collecting chardet
  Using cached chardet-4.0.0-py2.py3-none-any.whl (178 kB)
Collecting dnspython
  Using cached dnspython-2.2.1-py3-none-any.whl (269 kB)
Collecting html5-parser
  Using cached html5-parser-0.4.10.tar.gz (272 kB)
  Preparing metadata (setup.py) ... done
Collecting lxml
  Using cached lxml-4.8.0.tar.gz (3.2 MB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: namedlist in ./gs-venv/lib/python3.9/site-packages (from wpull@ https://github.com/ArchiveTeam/ludios_wpull/tarball/master#egg=wpull-3.0.9->grab-site==2.2.2) (1.8)
Collecting sqlalchemy==1.3.24
  Using cached SQLAlchemy-1.3.24-cp39-cp39-macosx_10_14_x86_64.whl (1.2 MB)
Requirement already satisfied: tornado==4.5.3 in ./gs-venv/lib/python3.9/site-packages (from wpull@ https://github.com/ArchiveTeam/ludios_wpull/tarball/master#egg=wpull-3.0.9->grab-site==2.2.2) (4.5.3)
Requirement already satisfied: yapsy in ./gs-venv/lib/python3.9/site-packages (from wpull@ https://github.com/ArchiveTeam/ludios_wpull/tarball/master#egg=wpull-3.0.9->grab-site==2.2.2) (1.12.2)
Collecting cffi>=1.12
  Using cached cffi-1.15.0-cp39-cp39-macosx_10_9_x86_64.whl (178 kB)
Collecting idna>=2.5
  Using cached idna-3.3-py3-none-any.whl (61 kB)
Collecting pycparser
  Using cached pycparser-2.21-py2.py3-none-any.whl (118 kB)
Using legacy 'setup.py install' for grab-site, since package 'wheel' is not installed.
Using legacy 'setup.py install' for autobahn, since package 'wheel' is not installed.
Using legacy 'setup.py install' for fb-re2, since package 'wheel' is not installed.
Using legacy 'setup.py install' for wpull, since package 'wheel' is not installed.
Using legacy 'setup.py install' for html5-parser, since package 'wheel' is not installed.
Skipping wheel build for lxml, due to binaries being disabled for it.
Installing collected packages: fb-re2, cchardet, websockets, txaio, sqlalchemy, pycparser, manhole, lxml, idna, dnspython, click, chardet, hyperlink, html5-parser, cffi, wpull, cryptography, autobahn, grab-site
  Running setup.py install for fb-re2 ... error
  error: subprocess-exited-with-error

  × Running setup.py install for fb-re2 did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      running install
      /Users/lee/gs-venv/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        warnings.warn(
      running build
      running build_py
      creating build
      creating build/lib.macosx-10.15-x86_64-3.9
      copying re2.py -> build/lib.macosx-10.15-x86_64-3.9
      running build_ext
      building '_re2' extension
      creating build/temp.macosx-10.15-x86_64-3.9
      clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk -I/Users/lee/gs-venv/include -I/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/include/python3.9 -c _re2.cc -o build/temp.macosx-10.15-x86_64-3.9/_re2.o -std=c++11
      _re2.cc:37:10: fatal error: 're2/re2.h' file not found
      #include <re2/re2.h>
               ^~~~~~~~~~~
      1 error generated.
      error: command '/usr/bin/clang' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> fb-re2

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

Do you have any idea?

TheTechRobo commented 2 years ago

#include <re2/re2.h>

https://stackoverflow.com/questions/30956128/installation-of-re2-module-in-python-failed
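
The clang error in the log means that re2/re2.h was not found in any include directory the compiler searched. A minimal sketch for checking whether the header is present at all, assuming re2 was installed through Homebrew; `has_re2_header` is a hypothetical helper name, and the `/usr/local` fallback is an assumption for Intel Macs:

```shell
# Hypothetical helper: succeed if re2/re2.h exists under the given prefix.
has_re2_header() {
    [ -f "$1/include/re2/re2.h" ]
}

# Assumption: re2 came from Homebrew. If brew is unavailable,
# fall back to /usr/local, the usual Homebrew prefix on Intel Macs.
prefix="$(brew --prefix re2 2>/dev/null || echo /usr/local)"

if has_re2_header "$prefix"; then
    echo "found $prefix/include/re2/re2.h"
else
    echo "re2/re2.h not found under $prefix"
fi
```

If the header is reported as found but the build still fails, the include path is probably not the whole story, and the state of the virtualenv is worth checking as well.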

LeeBinder commented 2 years ago

Ok. Since Python 3.9 is not supported anyway, would https://github.com/ArchiveTeam/grab-site#using-nix be an alternative in macOS, because it does not use Python at all?

ivan commented 2 years ago

Sorry about the trouble and thanks for the report. I am going to update the homebrew instructions to use Python 3.8, and also fix them for M1. Hopefully that will get you a working grab-site install.

LeeBinder commented 2 years ago

Hi Ivan. Ah OK, so downgrading to Python 3.8.x would be sufficient - 3.7.x isn't required?

And what about Nix?

ivan commented 2 years ago

Right, Python 3.8 should work with grab-site.

I have updated https://github.com/ArchiveTeam/grab-site#install-on-macos, please let me know if that works.

I have also updated https://github.com/ArchiveTeam/grab-site#using-nix if you would like to use Nix instead.

I use Nix, but mostly on Linux rather than macOS. I guess most macOS users prefer Homebrew, so I keep both installation methods. Nix should work fine, but it does create an APFS volume for /nix and also adds build users to the system.

(Nix install currently broken because of Yapsy, sorry, I'll look into it.)

LeeBinder commented 2 years ago

OK. But even with the updated Python 3.8 install script I still get the #include <re2/re2.h> error during install. In addition to brew install python@3.8 libxslt re2 pkg-config, I even ran:

brew remove python@3.7
brew unlink python@3.9
brew link --force python@3.8
python3 --version

to make sure Python 3.8 is active.

I also followed TheTechRobo's link to the Stack Overflow topic about re2 on Linux, all the way to the re2 Python wrapper source code. re2 is installed:

[Screenshot, 2022-04-10 21:10:11]

brew install libre2-dev doesn't work; that package doesn't exist for macOS. brew reinstall re2 didn't help, either.

I am unsure what would be the next thing to do.

ivan commented 2 years ago

  1. Can you confirm you are using an Intel Mac?
  2. Are you using macOS 10.15 or something newer?
  3. Can you please try with this (I changed PKG_CONFIG_PATH), and if it still does not work, paste the full terminal session in here?

brew update
brew install python@3.8 libxslt re2 pkg-config
/usr/local/opt/python@3.8/bin/python3 -m venv ~/gs-venv
PKG_CONFIG_PATH="/usr/local/opt/re2/lib/pkgconfig:/usr/local/opt/libxml2/lib/pkgconfig" ~/gs-venv/bin/pip install --no-binary lxml --upgrade git+https://github.com/ArchiveTeam/grab-site

LeeBinder commented 2 years ago

Intel Mac with macOS 10.15.7 with latest security update.

The first three lines run through fine as before, but still the same error even with the changed PKG_CONFIG_PATH:

The default interactive shell is now zsh.
To update your account to use zsh, please run `chsh -s /bin/zsh`.
For more details, please visit https://support.apple.com/kb/HT208050.
$ PKG_CONFIG_PATH="/usr/local/opt/re2/lib/pkgconfig:/usr/local/opt/libxml2/lib/pkgconfig" ~/gs-venv/bin/pip install --no-binary lxml --upgrade git+https://github.com/ArchiveTeam/grab-site
Collecting git+https://github.com/ArchiveTeam/grab-site
  Cloning https://github.com/ArchiveTeam/grab-site to /private/var/folders/tb/t_5w72z5551f3lhkw6gq9l780000gn/T/pip-req-build-i2tb1j71
  Running command git clone --filter=blob:none --quiet https://github.com/ArchiveTeam/grab-site /private/var/folders/tb/t_5w72z5551f3lhkw6gq9l780000gn/T/pip-req-build-i2tb1j71
  Resolved https://github.com/ArchiveTeam/grab-site to commit 20e5fef01d30ae529da1b99ef5c91e78be01e9ec
  Preparing metadata (setup.py) ... done
Collecting wpull@ https://github.com/ArchiveTeam/ludios_wpull/tarball/master#egg=wpull-3.0.9
  Using cached https://github.com/ArchiveTeam/ludios_wpull/tarball/master
  Preparing metadata (setup.py) ... done
Collecting click>=6.3
  Using cached click-8.1.2-py3-none-any.whl (96 kB)
Collecting manhole>=1.0.0
  Using cached manhole-1.8.0-py2.py3-none-any.whl (16 kB)
Requirement already satisfied: lmdb>=0.89 in ./gs-venv/lib/python3.9/site-packages (from grab-site==2.2.2) (1.3.0)
Collecting autobahn>=0.12.1
  Using cached autobahn-22.3.2.tar.gz (376 kB)
  Preparing metadata (setup.py) ... done
Collecting fb-re2>=1.0.6
  Using cached fb-re2-1.0.7.tar.gz (9.4 kB)
  Preparing metadata (setup.py) ... done
Collecting websockets>=6.0
  Using cached websockets-10.2-cp39-cp39-macosx_10_9_x86_64.whl (96 kB)
Collecting cchardet>=1.0.0
  Using cached cchardet-2.1.7-cp39-cp39-macosx_10_9_x86_64.whl (124 kB)
Collecting txaio>=21.2.1
  Using cached txaio-22.2.1-py2.py3-none-any.whl (30 kB)
Collecting cryptography>=3.4.6
  Using cached cryptography-36.0.2-cp36-abi3-macosx_10_10_x86_64.whl (2.5 MB)
Collecting hyperlink>=21.0.0
  Using cached hyperlink-21.0.0-py2.py3-none-any.whl (74 kB)
Requirement already satisfied: setuptools in ./gs-venv/lib/python3.9/site-packages (from autobahn>=0.12.1->grab-site==2.2.2) (60.10.0)
Collecting chardet
  Using cached chardet-4.0.0-py2.py3-none-any.whl (178 kB)
Collecting dnspython
  Using cached dnspython-2.2.1-py3-none-any.whl (269 kB)
Collecting html5-parser
  Using cached html5-parser-0.4.10.tar.gz (272 kB)
  Preparing metadata (setup.py) ... done
Collecting lxml
  Using cached lxml-4.8.0.tar.gz (3.2 MB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: namedlist in ./gs-venv/lib/python3.9/site-packages (from wpull@ https://github.com/ArchiveTeam/ludios_wpull/tarball/master#egg=wpull-3.0.9->grab-site==2.2.2) (1.8)
Collecting sqlalchemy==1.3.24
  Using cached SQLAlchemy-1.3.24-cp39-cp39-macosx_10_14_x86_64.whl (1.2 MB)
Requirement already satisfied: tornado==4.5.3 in ./gs-venv/lib/python3.9/site-packages (from wpull@ https://github.com/ArchiveTeam/ludios_wpull/tarball/master#egg=wpull-3.0.9->grab-site==2.2.2) (4.5.3)
Requirement already satisfied: yapsy in ./gs-venv/lib/python3.9/site-packages (from wpull@ https://github.com/ArchiveTeam/ludios_wpull/tarball/master#egg=wpull-3.0.9->grab-site==2.2.2) (1.12.2)
Collecting cffi>=1.12
  Using cached cffi-1.15.0-cp39-cp39-macosx_10_9_x86_64.whl (178 kB)
Collecting idna>=2.5
  Using cached idna-3.3-py3-none-any.whl (61 kB)
Collecting pycparser
  Using cached pycparser-2.21-py2.py3-none-any.whl (118 kB)
Using legacy 'setup.py install' for grab-site, since package 'wheel' is not installed.
Using legacy 'setup.py install' for autobahn, since package 'wheel' is not installed.
Using legacy 'setup.py install' for fb-re2, since package 'wheel' is not installed.
Using legacy 'setup.py install' for wpull, since package 'wheel' is not installed.
Using legacy 'setup.py install' for html5-parser, since package 'wheel' is not installed.
Skipping wheel build for lxml, due to binaries being disabled for it.
Installing collected packages: fb-re2, cchardet, websockets, txaio, sqlalchemy, pycparser, manhole, lxml, idna, dnspython, click, chardet, hyperlink, html5-parser, cffi, wpull, cryptography, autobahn, grab-site
  Running setup.py install for fb-re2 ... error
  error: subprocess-exited-with-error

  × Running setup.py install for fb-re2 did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      running install
      /Users/lee/gs-venv/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        warnings.warn(
      running build
      running build_py
      creating build
      creating build/lib.macosx-10.15-x86_64-3.9
      copying re2.py -> build/lib.macosx-10.15-x86_64-3.9
      running build_ext
      building '_re2' extension
      creating build/temp.macosx-10.15-x86_64-3.9
      clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk -I/Users/lee/gs-venv/include -I/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/include/python3.9 -c _re2.cc -o build/temp.macosx-10.15-x86_64-3.9/_re2.o -std=c++11
      _re2.cc:37:10: fatal error: 're2/re2.h' file not found
      #include <re2/re2.h>
               ^~~~~~~~~~~
      1 error generated.
      error: command '/usr/bin/clang' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> fb-re2

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

ivan commented 2 years ago

It looks like something might be going wrong with the existing gs-venv because it was created with Python 3.9 instead of Python 3.8.

Does the install work if you rm -rf ~/gs-venv first?

(If not, can I get the terminal output again? Thanks)
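
The pip output above still refers to ./gs-venv/lib/python3.9/site-packages, which is what points to a stale virtualenv. A minimal sketch for confirming this before deleting anything, assuming the venv was created with `python3 -m venv` (which records the interpreter version in pyvenv.cfg); `venv_python_version` and the `GS_VENV` override are hypothetical names used only for illustration:

```shell
# Hypothetical helper: print the Python version recorded in a venv's pyvenv.cfg.
venv_python_version() {
    # `python3 -m venv` writes a line such as "version = 3.9.12" into pyvenv.cfg.
    sed -n 's/^version *= *//p' "$1/pyvenv.cfg"
}

# Assumption: the venv lives at ~/gs-venv unless GS_VENV says otherwise.
venv_dir="${GS_VENV:-$HOME/gs-venv}"

if [ -f "$venv_dir/pyvenv.cfg" ]; then
    version="$(venv_python_version "$venv_dir")"
    case "$version" in
        3.8.*) echo "venv uses Python $version" ;;
        *)     echo "venv uses Python $version, not 3.8; consider removing $venv_dir and recreating it" ;;
    esac
fi
```

Running python3 -m venv again over an existing directory may leave the old interpreter and site-packages in place, so removing the directory first and then repeating the venv and pip commands from earlier in this thread is the safer route.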

LeeBinder commented 2 years ago

Yee-Ha - WORKING 👍 🥇 !

[Screenshot, 2022-04-10 21:58:30]

Before I read your post, I had already run sudo port install py38-re2 via MacPorts to install the re2 wrapper explicitly for Python 3.8, which ran through without error.

Then I ran your altered PKG_CONFIG_PATH command again -> still the same error.

Then I ran rm -rf ~/gs-venv, followed by the default PKG_CONFIG_PATH command from your manual (not the altered one) -> bingo, finally no error, and gs-server is running.

So now I cannot say whether the re2 installation via MacPorts is part of the solution, but at least grab-site is working, and I am happy. I hope you are, too. Thank you for the support!