SerpScrap is an SEO Python scraper for extracting data from major search engine result pages. It extracts the URL, title, snippet, rich snippet, and result type from search results for given keywords, can detect ads, and can take automated screenshots. You can also fetch the text content of URLs found in the search results or provided by you. It is useful for SEO and business-related research tasks.
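For context, this is the kind of script I want to run once the package installs, a minimal sketch following the examples in the SerpScrap documentation (the keyword is arbitrary):

import serpscrap

keywords = ['example keyword']

# default config; do not fetch the text content of each result URL, only SERP data
config = serpscrap.Config()
config.set('scrape_urls', False)

scrap = serpscrap.SerpScrap()
scrap.init(config=config.get(), keywords=keywords)
results = scrap.run()

for result in results:
    print(result)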
I cannot install the SerpScrap package. My Python version is 3.11.7, and my currently installed lxml version is 4.9.3. Here is the full pip output:
(base) C:\Users\Win 10>pip install SerpScrap
Collecting SerpScrap
  Using cached SerpScrap-0.13.0-py3-none-any.whl.metadata (6.1 kB)
Collecting PySocks==1.7.0 (from SerpScrap)
  Using cached PySocks-1.7.0-py3-none-any.whl.metadata (13 kB)
Collecting chardet==3.0.4 (from SerpScrap)
  Using cached chardet-3.0.4-py2.py3-none-any.whl.metadata (3.2 kB)
Collecting beautifulsoup4==4.8.0 (from SerpScrap)
  Using cached beautifulsoup4-4.8.0-py3-none-any.whl.metadata (3.0 kB)
Collecting html2text==2019.8.11 (from SerpScrap)
  Using cached html2text-2019.8.11-py2.py3-none-any.whl.metadata (4.4 kB)
Collecting lxml==4.3.2 (from SerpScrap)
  Using cached lxml-4.3.2.tar.gz (4.4 MB)
  Preparing metadata (setup.py) ... done
Collecting sqlalchemy==1.3.7 (from SerpScrap)
  Using cached SQLAlchemy-1.3.7-cp311-cp311-win_amd64.whl
Collecting selenium==3.141.0 (from SerpScrap)
  Using cached selenium-3.141.0-py2.py3-none-any.whl.metadata (6.6 kB)
Collecting cssselect==1.1.0 (from SerpScrap)
  Using cached cssselect-1.1.0-py2.py3-none-any.whl.metadata (2.3 kB)
Requirement already satisfied: soupsieve>=1.2 in c:\users\win 10\anaconda3\lib\site-packages (from beautifulsoup4==4.8.0->SerpScrap) (2.5)
Requirement already satisfied: urllib3 in c:\users\win 10\anaconda3\lib\site-packages (from selenium==3.141.0->SerpScrap) (2.0.7)
Using cached SerpScrap-0.13.0-py3-none-any.whl (45 kB)
Using cached beautifulsoup4-4.8.0-py3-none-any.whl (97 kB)
Using cached chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Using cached cssselect-1.1.0-py2.py3-none-any.whl (16 kB)
Using cached html2text-2019.8.11-py2.py3-none-any.whl (31 kB)
Using cached PySocks-1.7.0-py3-none-any.whl (16 kB)
Using cached selenium-3.141.0-py2.py3-none-any.whl (904 kB)
Building wheels for collected packages: lxml
  Building wheel for lxml (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [79 lines of output]
      Building lxml version 4.3.2.
      C:\Users\Win 10\AppData\Local\Temp\pip-install-4nhtru1a\lxml_8d3930865495499e9de1652221681180\setup.py:61: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
        import pkg_resources
      Building without Cython.
      ERROR: b"'xslt-config' is not recognized as an internal or external command,\r\noperable program or batch file.\r\n"
      make sure the development packages of libxml2 and libxslt are installed
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for lxml
  Running setup.py clean for lxml
Failed to build lxml
ERROR: Could not build wheels for lxml, which is required to install pyproject.toml-based projects
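Reading the log, the problem is not my installed lxml (4.9.3) but SerpScrap's pin on lxml==4.3.2: there is no prebuilt Windows wheel of that version for Python 3.11, so pip falls back to a source build, which fails because the libxml2/libxslt development tools (xslt-config) are missing on Windows. One workaround I am considering (untested, and it bypasses the pins, so SerpScrap itself may then misbehave) is to install the package without its dependencies and supply them separately, reusing my existing lxml:

(base) C:\Users\Win 10>pip install SerpScrap --no-deps
(base) C:\Users\Win 10>pip install PySocks chardet beautifulsoup4 html2text cssselect "sqlalchemy<2" "selenium<4"

The version caps on sqlalchemy and selenium are guesses to stay close to the pinned 1.3.7 and 3.141.0; I have not confirmed they work. Is there a cleaner way to install SerpScrap on Python 3.11?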