cnumr / ecoindex_cli

This tool provides an easy way to analyze websites with Ecoindex from your local computer. You can analyze multiple pages at multiple screen resolutions. You can also run a recursive analysis starting from a given website.

[Bug]: ModuleNotFoundError: No module named 'undetected_chromedriver.v2' #226

Closed: edouard-lopez closed this issue 1 year ago

edouard-lopez commented 1 year ago

What happened?

I can't run the CLI because the undetected_chromedriver dependency cannot be found:

❯ poetry run ecoindex-cli
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/edouard/projects/explorations/ecoindex_cli/ecoindex_cli/cli/app.py", line 22, in <module>
    from ecoindex_cli.cli.helper import run_page_analysis
  File "/home/edouard/projects/explorations/ecoindex_cli/ecoindex_cli/cli/helper.py", line 4, in <module>
    from ecoindex_scraper.scrap import EcoindexScraper
  File "/home/edouard/.cache/pypoetry/virtualenvs/ecoindex-cli-f6zuBlXq-py3.10/lib/python3.10/site-packages/ecoindex_scraper/__init__.py", line 1, in <module>
    from .scrap import EcoindexScraper
  File "/home/edouard/.cache/pypoetry/virtualenvs/ecoindex-cli-f6zuBlXq-py3.10/lib/python3.10/site-packages/ecoindex_scraper/scrap.py", line 8, in <module>
    import undetected_chromedriver.v2 as uc
ModuleNotFoundError: No module named 'undetected_chromedriver.v2'

Version

Above 3.6

What OS do you use?

Linux

urls

No response

Relevant log output

No response


edouard-lopez commented 1 year ago

The package isn't listed in the project dependencies in pyproject.toml, so you need to install it yourself. BUT since version 3.4.0, the v2 module has been removed:

3.4.0

Big update! be careful as it -potentially- could break your code. … cleanup removed compat,v2 files and tests folder

source: https://github.com/ultrafunkamsterdam/undetected-chromedriver/blob/3a52c8cbdd1c0aada3e275ceae20514e072b52c2/README.md#340
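As a quick illustration of what that changelog entry implies, here is a minimal Python sketch that tells whether a given undetected-chromedriver release should still ship the v2 submodule. The helper names are hypothetical (not part of ecoindex or undetected-chromedriver), it assumes plain X.Y.Z version strings, and it takes 3.4.0 as the removal point quoted above:

```python
# Sketch: does a given undetected-chromedriver release still ship the legacy
# "v2" submodule? Per the 3.4.0 changelog quoted above, the v2 files were
# removed in 3.4.0, so any release below that should still have it.
# Assumption: versions are plain numeric "X.Y.Z" strings (no rc/dev suffixes).

V2_REMOVED_IN = (3, 4, 0)


def parse_version(text: str) -> tuple[int, ...]:
    """Parse 'X.Y.Z' (or 'X.Y') into a tuple of ints, padded to 3 parts."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))  # "3.4" -> (3, 4, 0)
    return tuple(parts)


def has_v2_module(version: str) -> bool:
    """Return True if this release predates the removal of the v2 module."""
    return parse_version(version) < V2_REMOVED_IN


for v in ("3.1.7", "3.2.1", "3.4.0", "3.4.5"):
    print(v, has_v2_module(v))
```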

Workaround

You can install version 3.2.0:

❯ poetry add undetected_chromedriver@3.2.0
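Instead of pinning an old release, code that has to run against both 3.2.x and 3.4.x could try the legacy module first and fall back to the top-level package. A minimal sketch, assuming the top-level undetected_chromedriver module exposes the same Chrome class that v2 did (check this against the version you actually install); import_first is a hypothetical helper, not part of either project:

```python
import importlib
from types import ModuleType


def import_first(*names: str) -> ModuleType:
    """Return the first module in `names` that can be imported.

    Raises the last ModuleNotFoundError if none of them is available.
    """
    last_error = ModuleNotFoundError("no module names given")
    for name in names:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as error:
            # Missing parent package or missing submodule (e.g. v2 on >= 3.4.0).
            last_error = error
    raise last_error


# Hypothetical usage in the scraper, replacing
# `import undetected_chromedriver.v2 as uc`:
# uc = import_first("undetected_chromedriver.v2", "undetected_chromedriver")
```

Pinning is the simpler fix; a fallback like this only helps while both layouts may be installed.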
vvatelot commented 1 year ago

Hello @edouard-lopez, thanks for your contribution. undetected-chromedriver is a dependency of the ecoindex_scrap_python package.

In the poetry.lock file, it is pinned to version 3.2.1.

Did you run poetry update? That could explain the upgrade to version 3.4. I am waiting for undetected-chromedriver to fix a bug in version 3.4 before upgrading.

edouard-lopez commented 1 year ago

It's pinned in the scraper, but not in the CLI:

https://github.com/cnumr/ecoindex_cli/blob/f213a95113fa631b6e8fbac3d680f8081c638c61/poetry.lock#L216

vvatelot commented 1 year ago

undetected-chromedriver

No, this means that ecoindex_scraper requires version ^3.1.6 (consistent with the ecoindex_scrap configuration), but the version actually installed is 3.1.7 (confirmed here).
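For context on why a fresh resolve can still pick 3.4.x: in Poetry, a caret constraint ^X.Y.Z with X > 0 means >=X.Y.Z,<(X+1).0.0, so ^3.1.6 accepts every 3.x release, including the 3.4 line that removed v2. The sketch below re-implements those documented caret rules for plain X.Y.Z versions only; it is an illustration with hypothetical helper names, not Poetry's own resolver code:

```python
Version = tuple[int, int, int]


def parse(text: str) -> Version:
    """Parse a strict 'X.Y.Z' version string (assumption: exactly 3 parts)."""
    major, minor, patch = (int(p) for p in text.split("."))
    return major, minor, patch


def caret_bounds(spec: str) -> tuple[Version, Version]:
    """Return (inclusive lower, exclusive upper) bounds for '^X.Y.Z'."""
    lower = parse(spec.lstrip("^"))
    major, minor, patch = lower
    if major > 0:
        upper = (major + 1, 0, 0)  # ^3.1.6 -> <4.0.0
    elif minor > 0:
        upper = (0, minor + 1, 0)  # ^0.2.3 -> <0.3.0
    else:
        upper = (0, 0, patch + 1)  # ^0.0.3 -> <0.0.4
    return lower, upper


def caret_allows(spec: str, version: str) -> bool:
    lower, upper = caret_bounds(spec)
    return lower <= parse(version) < upper


print(caret_bounds("^3.1.6"))
for candidate in ("3.1.7", "3.2.1", "3.4.5", "4.0.0"):
    print(candidate, caret_allows("^3.1.6", candidate))
```

So only the lock file keeps an existing install on 3.1.7; anything that re-resolves the dependencies (for example poetry update) is free to move to 3.4.x.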

Can you try running this (with a fresh install):

poetry install && poetry show

and copy-paste the result here?

edouard-lopez commented 1 year ago

From the git clone I already have, I got:

❯ poetry install && poetry show
Updating dependencies                                                                                                                                                                                              
Resolving dependencies... (12.1s)                                                                        

Writing lock file                                                                                                                                                                                                  

Package operations: 0 installs, 1 update, 0 removals                                                     

  • Updating undetected-chromedriver (3.4.4 -> 3.4.5)                           

Installing the current project: ecoindex-cli (v2.15.2)                                           
async-generator         1.10      Async generators and context managers for Python 3.5+    
attrs                   22.2.0    Classes Without Boilerplate                     
automat                 22.10.0   Self-service finite-state machines for the programmer on the go.                                                                                                                 
black                   22.12.0   The uncompromising code formatter.                                                                                                                                               
certifi                 2022.12.7 Python package for providing Mozilla's CA Bundle.
cffi                    1.15.1    Foreign Function Interface for Python calling C code.
charset-normalizer      3.0.1     The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
click                   8.1.3     Composable command line interface toolkit        
click-spinner           0.1.10    Spinner for Click                                                      
constantly              15.1.0    Symbolic constants in Python                                                                                                                                                     
contourpy               1.0.7     Python library for calculating contours of 2D quadrilateral grids
coverage                7.1.0     Code coverage measurement for Python
cryptography            39.0.1    cryptography is a package which provides cryptographic recipes and primitives to Python developers.
cssselect               1.2.0     cssselect parses CSS3 Selectors and translates them to XPath 1.0
cycler                  0.11.0    Composable style cycles                                    
ecoindex                5.4.1     Ecoindex module provides a simple way to measure the Ecoindex score based on the 3 parameters: The DOM elements of the page, the size of the page and the number of external...
ecoindex-scraper        2.13.1    Ecoindex_scraper module provides a way to scrape data from given website while simulating a real web browser
exceptiongroup          1.1.0     Backport of PEP 654 (exception groups)                  
filelock                3.9.0     A platform independent file lock.                        
fonttools               4.38.0    Tools to manipulate font files                                                                                                                                                   
h11                     0.14.0    A pure-Python, bring-your-own-I/O implementation of HTTP/1.1
hyperlink               21.0.0    A featureful, immutable, and correct URL for Python.   
idna                    3.4       Internationalized Domain Names in Applications (IDNA)
incremental             22.10.0   "A small library that versions your Python projects." 
iniconfig               2.0.0     brain-dead simple config-ini parsing                              
itemadapter             0.7.0     Common interface for data container classes           
itemloaders             1.0.6     Base library for scrapy's ItemLoader                                                                                                                                             
jinja2                  3.1.2     A very fast and expressive template engine.                           
jmespath                1.0.1     JSON Matching Expressions       
kiwisolver              1.4.4     A fast implementation of the Cassowary constraint solver     
lxml                    4.9.2     Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API.
markupsafe              2.1.2     Safely add untrusted strings to HTML/XML markup.
matplotlib              3.6.3     Python plotting package
mypy-extensions         1.0.0     Type system extensions for programs checked with the mypy type checker. 
numpy                   1.24.2    Fundamental package for array computing in Python
outcome                 1.2.0     Capture the outcome of Python function calls.
packaging               23.0      Core utilities for Python packages
pandas                  1.5.3     Powerful data structures for data analysis, time series, and statistics
parsel                  1.7.0     Parsel is a library to extract data from HTML and XML using XPath and CSS selectors
pathspec                0.11.0    Utility library for gitignore style pattern matching of file paths.
pillow                  9.4.0     Python Imaging Library (Fork)
platformdirs            3.0.0     A small Python package for determining appropriate platform-specific dirs, e.g. a "user data dir".
pluggy                  1.0.0     plugin and hook calling mechanisms for python
protego                 0.2.1     Pure-Python robots.txt parser with support for modern conventions
pyasn1                  0.4.8     ASN.1 types and codecs
pyasn1-modules          0.2.8     A collection of ASN.1-based protocols modules.
pycparser               2.21      C parser in Python
pydantic                1.10.4    Data validation and settings management using python type hints
pydispatcher            2.0.6     Multi-Producer Multi-Consumer Observer Pattern for Python
pyopenssl               23.0.0    Python wrapper module around the OpenSSL library
pyparsing               3.0.9     pyparsing module - Classes and methods to define and execute parsing grammars
pysocks                 1.7.1     A Python SOCKS client module. See https://github.com/Anorov/PySocks for more information.
pytest                  7.2.1     pytest: simple powerful testing with Python
pytest-cov              4.0.0     Pytest plugin for measuring coverage.
python-dateutil         2.8.2     Extensions to the standard Python datetime module
pytz                    2022.7.1  World timezone definitions, modern and historical
pyyaml                  6.0       YAML parser and emitter for Python
queuelib                1.6.2     Collection of persistent (disk-based) and non-persistent (memory-based) queues
requests                2.28.2    Python HTTP for Humans.
requests-file           1.5.1     File transport adapter for Requests
scrapy                  2.8.0     A high-level Web Crawling and Web Scraping framework
selenium                4.8.0     
service-identity        21.1.0    Service identity verification for pyOpenSSL & cryptography.
setuptools-scm          7.1.0     the blessed package to manage your versions by scm tags
six                     1.16.0    Python 2 and 3 compatibility utilities
sniffio                 1.3.0     Sniff out which async library your code is running under
sortedcontainers        2.4.0     Sorted Containers -- Sorted List, Sorted Dict, Sorted Set
tldextract              3.4.0     Accurately separates a URL's subdomain, domain, and public suffix, using the Public Suffix List (PSL). By default, this includes the public ICANN TLDs and their exceptions....
tomli                   2.0.1     A lil' TOML parser
trio                    0.22.0    A friendly Python library for async concurrency and I/O
trio-websocket          0.9.2     WebSocket library for Trio
twisted                 22.10.0   An asynchronous networking framework written in Python
typer                   0.7.0     Typer, build great CLIs. Easy to code. Based on Python type hints.
typing-extensions       4.4.0     Backported and Experimental Type Hints for Python 3.7+
undetected-chromedriver 3.4.5     ('Selenium.webdriver.Chrome replacement with compatiblity for Brave, and other Chromium based browsers.', 'Not triggered by CloudFlare/Imperva/hCaptcha and such.', 'NOTE: r...
urllib3                 1.26.14   HTTP library with thread-safe connection pooling, file post, and more.
w3lib                   2.1.1     Library of web-related functions
websockets              10.4      An implementation of the WebSocket Protocol (RFC 6455 & 7692)
wsproto                 1.2.0     WebSockets state-machine based protocol implementation
zope.interface          5.5.2     Interfaces for Python

From a new git clone, I get a different issue:

❯ git clone git@github.com:cnumr/ecoindex_cli.git ecoindex_cli2
Cloning into 'ecoindex_cli2'...
remote: Enumerating objects: 835, done.
remote: Counting objects: 100% (515/515), done.
remote: Compressing objects: 100% (242/242), done.
remote: Total 835 (delta 354), reused 388 (delta 269), pack-reused 320
Receiving objects: 100% (835/835), 1.00 MiB | 346.00 KiB/s, done.
Resolving deltas: 100% (516/516), done.
❯ cd ecoindex_cli2/
❯ poetry install && poetry show > show.txt
Creating virtualenv ecoindex-cli-jdRZEe1_-py3.10 in /home/edouard/.cache/pypoetry/virtualenvs
Installing dependencies from lock file

  SolverProblemError                                

  Because no versions of scrapy match >=2.5.0,<2.7.1 || >2.7.1,<3.0.0                                                                                                                                              
   and scrapy (2.7.1) depends on zope.interface (>=5.1.0), scrapy (>=2.5.0,<3.0.0) requires zope.interface (>=5.1.0).
  So, because no versions of zope.interface match >=5.1.0                                                                                                                                                          
   and ecoindex-cli depends on Scrapy (^2.5.0), version solving failed.                                                                                                                                            

  at ~/.local/share/pypoetry/venv/lib/python3.10/site-packages/poetry/puzzle/solver.py:241 in _solve                                                                                                               
      237│             packages = result.packages                                                        
      238│         except OverrideNeeded as e:                                                           
      239│             return self.solve_in_compatibility_mode(e.overrides, use_latest=use_latest)                                                                                                                 
      240│         except SolveFailure as e:                                                             
    → 241│             raise SolverProblemError(e)                                                       
      242│                                          
      243│         results = dict(                                                                       
      244│             depth_first_search(                                                               
      245│                 PackageNode(self._package, packages), aggregate_package_nodes                                                                                                                           

Oops, I wasn't in a virtual env, but I got the same issue:

❯ poetry shell
Spawning shell within /home/edouard/.cache/pypoetry/virtualenvs/ecoindex-cli-jdRZEe1_-py3.10
source /home/edouard/.cache/pypoetry/virtualenvs/ecoindex-cli-jdRZEe1_-py3.10/bin/activate.fish
❯ source /home/edouard/.cache/pypoetry/virtualenvs/ecoindex-cli-jdRZEe1_-py3.10/bin/activate.fish
ecoindex-cli-jdRZEe1_-py3.10 ❯ poetry install && poetry show > show.txt
Installing dependencies from lock file

  SolverProblemError                                

  Because no versions of scrapy match >=2.5.0,<2.7.1 || >2.7.1,<3.0.0                                                                                                                                              
   and scrapy (2.7.1) depends on zope.interface (>=5.1.0), scrapy (>=2.5.0,<3.0.0) requires zope.interface (>=5.1.0).
  So, because no versions of zope.interface match >=5.1.0                                                                                                                                                          
   and ecoindex-cli depends on Scrapy (^2.5.0), version solving failed.                                                                                                                                            

  at ~/.local/share/pypoetry/venv/lib/python3.10/site-packages/poetry/puzzle/solver.py:241 in _solve                                                                                                               
      237│             packages = result.packages                                                        
      238│         except OverrideNeeded as e:                                                           
      239│             return self.solve_in_compatibility_mode(e.overrides, use_latest=use_latest)                                                                                                                 
      240│         except SolveFailure as e:                                                             
    → 241│             raise SolverProblemError(e)                                                       
      242│                                          
      243│         results = dict(                                                                       
      244│             depth_first_search(                                                               
      245│                 PackageNode(self._package, packages), aggregate_package_nodes         

Python version is:

ecoindex-cli-jdRZEe1_-py3.10 ❯ python --version
Python 3.10.7

vvatelot commented 1 year ago

OK, I can reproduce the issue. Never mind, I am focusing on upgrading https://github.com/cnumr/ecoindex_scrap_python/ to undetected_chromedriver version 3.4.5.

This upgrade (3.2 -> 3.4) should have been a major release, as it is a breaking change. I do not agree with their versioning and release management, but I have to deal with it :disappointed:

Now, version 3.4.5 introduces a new behavior: it first opens the chrome://welcome page, and I am trying to bypass it...

vvatelot commented 1 year ago

https://github.com/cnumr/ecoindex_scrap_python/pull/80

vvatelot commented 1 year ago

closed by #227

vvatelot commented 1 year ago

Hello @edouard-lopez, this issue should be resolved by the latest version, in which I upgraded undetected-chromedriver to 3.4.5 (and froze it!):

https://pypi.org/project/ecoindex-cli/2.16.1/

Can you confirm this is OK?

edouard-lopez commented 1 year ago

The undetected-chromedriver issue is fixed!

Re-install

Clean up

❯ pip uninstall undetected_chromedriver
❯ pip uninstall ecoindex-cli

Install

❯ pip install --user -U ecoindex-cli

Checking versions

ecoindex

❯ pip freeze | grep ecoindex
ecoindex==5.4.1
ecoindex-cli==2.16.1
ecoindex-scraper==2.14.0

undetected-chromedriver

❯ pip freeze | grep und
command-not-found==0.3
undetected-chromedriver==3.4.5
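The same check can be done from Python for the interpreter you are actually running. A minimal sketch using the standard library's importlib.metadata (the installed_versions helper is hypothetical); unlike grep und, it looks up exact distribution names, so unrelated packages such as command-not-found don't show up:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_versions(*names: str) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(
        "ecoindex", "ecoindex-cli", "ecoindex-scraper", "undetected-chromedriver"
    ).items():
        print(f"{name}: {ver or 'not installed'}")
```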

Usage

❯ ecoindex-cli analyze --url http://manomano.fr/                                                         
📁️ Urls recorded in file `/tmp/ecoindex-cli/input/manomano.fr.csv`                                  
There are 1 url(s), do you want to process? [Y/n]:                                                       
1 urls for 1 window size with 8 maximum workers                                                          
Processing  [####################################]  1/1  100%                                       
Errors found: please look at /tmp/ecoindex-cli/logs/manomano.fr.log)                                

… 
IndexError: list index out of range

edouard-lopez commented 1 year ago

I fixed the IndexError: list index out of range issue by passing --chrome-version, as mentioned in #215:

❯ ecoindex-cli analyze \
    --url http://manomano.fr/ \
    --chrome-version (google-chrome --version  | grep --only -P '(?<=\\s)\\d{3}')
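If you'd rather compute the value for --chrome-version in Python than in fish, here is a minimal sketch. The helper names are hypothetical, it assumes google-chrome --version prints something like "Google Chrome 110.0.5481.177", and its regex accepts major versions of any length rather than exactly three digits like the grep pattern above:

```python
import re
import shutil
import subprocess
from typing import Optional

# First run of digits that follows whitespace and precedes a dot,
# e.g. "110" in "Google Chrome 110.0.5481.177".
MAJOR_VERSION = re.compile(r"(?<=\s)(\d+)\.")


def parse_chrome_major(version_output: str) -> Optional[int]:
    """Extract the major version from `--version` output, or None."""
    match = MAJOR_VERSION.search(version_output)
    return int(match.group(1)) if match else None


def detect_chrome_major(binary: str = "google-chrome") -> Optional[int]:
    """Run `<binary> --version` if the binary is on PATH and parse it."""
    path = shutil.which(binary)
    if path is None:
        return None
    completed = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return parse_chrome_major(completed.stdout)


if __name__ == "__main__":
    print(detect_chrome_major())
```

With the sample output above, the value to pass would be --chrome-version 110.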
vvatelot commented 1 year ago

Thanks for your feedback. I will try to improve the logger...