NikolaiT / GoogleScraper

A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
https://scrapeulous.com/
Apache License 2.0

AttributeError: 'SelScrape' object has no attribute 'webdriver' #78

Closed by marcoippolito 9 years ago

marcoippolito commented 9 years ago

Running the example from here: https://github.com/NikolaiT/GoogleScraper

```
GoogleScraper -m selenium --keyword "apple" -v2
2015-02-02 16:09:11,907 - GoogleScraper - INFO - 0 cache files found in .scrapecache/
2015-02-02 16:09:11,907 - GoogleScraper - INFO - 0/1 objects have been read from the cache. 1 remain to get scraped.
2015-02-02 16:09:12,022 - GoogleScraper - INFO - Going to scrape 1 keywords with 1 proxies by using 1 threads.
2015-02-02 16:09:12,024 - GoogleScraper - INFO - [+] SelScrape[localhost][search-type:normal][https://www.google.com/search?] using search engine "google". Num keywords=1, num pages for keyword=[1]
2015-02-02 16:10:13,055 - GoogleScraper - ERROR - Message: unknown error: Chrome failed to start: exited abnormally
  (Driver info: chromedriver=2.13.307649 (bf55b442bb6b5c923249dd7870d6a107678bfbb6),platform=Linux 3.13.0-32-generic x86_64)
```

```
2015-02-02 16:10:13,056 - GoogleScraper - WARNING - [google]SelScrape: Aborting due to no available selenium webdriver.
Exception in thread [google]SelScrape:
Traceback (most recent call last):
  File "/usr/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/home/marco/crawlscrape/env/lib/python3.4/site-packages/GoogleScraper/selenium_mode.py", line 497, in run
    self.webdriver.set_window_size(400, 400)
AttributeError: 'SelScrape' object has no attribute 'webdriver'
```
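If I read the traceback right, the AttributeError looks like a follow-on error: the `webdriver` attribute is only assigned when the browser starts successfully, so the real failure is the earlier "Chrome failed to start". A minimal sketch of that pattern (hypothetical class, not GoogleScraper's actual code):

```python
class Scraper:
    """Hypothetical illustration of how a failed browser start leads to AttributeError."""

    def start(self):
        try:
            # stands in for selenium's webdriver.Chrome(); raises when Chrome can't start
            self.webdriver = launch_browser()
        except Exception as exc:
            print(f"browser failed to start: {exc}")
            # note: self.webdriver is never assigned on failure

    def run(self):
        # any later access to self.webdriver now raises AttributeError
        self.webdriver.set_window_size(400, 400)


def launch_browser():
    # simulate the chromedriver failure from the log above
    raise RuntimeError("Chrome failed to start: exited abnormally")


s = Scraper()
s.start()
try:
    s.run()
except AttributeError as exc:
    print(exc)  # 'Scraper' object has no attribute 'webdriver'
```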

NikolaiT commented 9 years ago

Install the Chrome driver. See the GoogleScraper README for installation instructions. Cheers
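One quick sanity check before rerunning is whether the chromedriver binary is actually visible to the Python process; a minimal stdlib sketch (assuming the binary is named `chromedriver`, as Selenium expects):

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path to the driver binary if it is on PATH, else None."""
    return shutil.which(name)


path = find_driver()
print(path or "chromedriver not found on PATH; install it or add its directory to PATH")
```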

marcoippolito commented 9 years ago

Hi Nikolai, being stubborn, I tried again:

```
~/crawlscrape$ virtualenv --python python3 env
Running virtualenv with interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in env/bin/python3
Not overwriting existing python script env/bin/python (you must use env/bin/python3)
Installing setuptools, pip...done.
marco@pc:~/crawlscrape$ source env/bin/activate
(env)marco@pc:~/crawlscrape$ pip install GoogleScraper
Requirement already satisfied (use --upgrade to upgrade): GoogleScraper in ./env/lib/python3.4/site-packages
Requirement already satisfied (use --upgrade to upgrade): lxml in ./env/lib/python3.4/site-packages (from GoogleScraper)
Requirement already satisfied (use --upgrade to upgrade): selenium in ./env/lib/python3.4/site-packages (from GoogleScraper)
Requirement already satisfied (use --upgrade to upgrade): cssselect in ./env/lib/python3.4/site-packages (from GoogleScraper)
Requirement already satisfied (use --upgrade to upgrade): requests in ./env/lib/python3.4/site-packages (from GoogleScraper)
Requirement already satisfied (use --upgrade to upgrade): PyMySql in ./env/lib/python3.4/site-packages (from GoogleScraper)
Requirement already satisfied (use --upgrade to upgrade): sqlalchemy in ./env/lib/python3.4/site-packages (from GoogleScraper)
Requirement already satisfied (use --upgrade to upgrade): aiohttp in ./env/lib/python3.4/site-packages (from GoogleScraper)
Cleaning up...
(env)marco@pc:~/crawlscrape$ time ./google_scraper_example.py
Usage: ./google_scraper_example.py [basic|image-search]
```

```
real    0m0.709s
user    0m0.223s
sys     0m0.019s

(env)marco@pc:~/crawlscrape$ time ./google_scraper_example.py 'basic'
2015-02-21 13:06:43,953 - GoogleScraper - INFO - Going to scrape 1 keywords with 1 proxies by using 1 threads.
2015-02-21 13:06:43,953 - GoogleScraper - INFO - [+] SelScrape[localhost][search-type:normal][http://yandex.ru/yandsearch?] using search engine "yandex". Num keywords=1, num pages for keyword=1
2015-02-21 13:07:45,055 - GoogleScraper - ERROR - Message: unknown error: Chrome failed to start: exited abnormally
  (Driver info: chromedriver=2.13.307649 (bf55b442bb6b5c923249dd7870d6a107678bfbb6),platform=Linux 3.13.0-32-generic x86_64)

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.4/dist-packages/GoogleScraper/selenium_mode.py", line 419, in run
    raise SeleniumMisconfigurationError('Aborting due to no available selenium webdriver.')
GoogleScraper.scraping.SeleniumMisconfigurationError: Aborting due to no available selenium webdriver.
```

So... I followed the instructions here: https://github.com/NikolaiT/GoogleScraper. During installation pip reports "Requirement already satisfied" (second attempt), but execution still fails with "GoogleScraper.scraping.SeleniumMisconfigurationError: Aborting due to no available selenium webdriver."

I also upgraded GoogleScraper:

```
(env)marco@pc:~/crawlscrape$ pip install --upgrade GoogleScraper
Downloading/unpacking GoogleScraper from https://pypi.python.org/packages/source/G/GoogleScraper/GoogleScraper-0.1.36.tar.gz#md5=5a48b02dd0b3610d2412c997a71047b5
  Downloading GoogleScraper-0.1.36.tar.gz (74kB): 74kB downloaded
  Running setup.py (path:/home/marco/crawlscrape/env/build/GoogleScraper/setup.py) egg_info for package GoogleScraper
    file usage.py (for module usage) not found
Downloading/unpacking lxml from https://pypi.python.org/packages/source/l/lxml/lxml-3.4.2.tar.gz#md5=429e5e771c4be0798923c04cb9739b4e (from GoogleScraper)
  Downloading lxml-3.4.2.tar.gz (3.5MB): 3.5MB downloaded
  Running setup.py (path:/home/marco/crawlscrape/env/build/lxml/setup.py) egg_info for package lxml
    Building lxml version 3.4.2.
    Building without Cython.
    Using build configuration of libxslt 1.1.28
    /usr/lib/python3.4/distutils/dist.py:260: UserWarning: Unknown distribution option: 'bugtrack_url'
      warnings.warn(msg)
    warning: no previously-included files found matching '*.py'
Requirement already up-to-date: selenium in ./env/lib/python3.4/site-packages (from GoogleScraper)
Requirement already up-to-date: cssselect in ./env/lib/python3.4/site-packages (from GoogleScraper)
Requirement already up-to-date: requests in ./env/lib/python3.4/site-packages (from GoogleScraper)
Requirement already up-to-date: PyMySql in ./env/lib/python3.4/site-packages (from GoogleScraper)
Requirement already up-to-date: sqlalchemy in ./env/lib/python3.4/site-packages (from GoogleScraper)
Requirement already up-to-date: aiohttp in ./env/lib/python3.4/site-packages (from GoogleScraper)
Installing collected packages: GoogleScraper, lxml
  Found existing installation: GoogleScraper 0.1.26
    Uninstalling GoogleScraper:
      Successfully uninstalled GoogleScraper
  Running setup.py install for GoogleScraper
    file usage.py (for module usage) not found
    file usage.py (for module usage) not found
    file usage.py (for module usage) not found
    Installing GoogleScraper script to /home/marco/crawlscrape/env/bin
    file usage.py (for module usage) not found
  Found existing installation: lxml 3.4.1
    Uninstalling lxml:
      Successfully uninstalled lxml
  Running setup.py install for lxml
    Building lxml version 3.4.2.
    Building without Cython.
    Using build configuration of libxslt 1.1.28
    building 'lxml.etree' extension
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/libxml2 -I/home/marco/crawlscrape/env/build/lxml/src/lxml/includes -I/usr/include/python3.4m -I/home/marco/crawlscrape/env/include/python3.4m -c src/lxml/lxml.etree.c -o build/temp.linux-x86_64-3.4/src/lxml/lxml.etree.o -w
    x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.4/src/lxml/lxml.etree.o -lxslt -lexslt -lxml2 -lz -lm -o build/lib.linux-x86_64-3.4/lxml/etree.cpython-34m.so
    building 'lxml.objectify' extension
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/libxml2 -I/home/marco/crawlscrape/env/build/lxml/src/lxml/includes -I/usr/include/python3.4m -I/home/marco/crawlscrape/env/include/python3.4m -c src/lxml/lxml.objectify.c -o build/temp.linux-x86_64-3.4/src/lxml/lxml.objectify.o -w
    x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.4/src/lxml/lxml.objectify.o -lxslt -lexslt -lxml2 -lz -lm -o build/lib.linux-x86_64-3.4/lxml/objectify.cpython-34m.so
    /usr/lib/python3.4/distutils/dist.py:260: UserWarning: Unknown distribution option: 'bugtrack_url'
      warnings.warn(msg)
Successfully installed GoogleScraper lxml
Cleaning up...
```

but again, executing:

```
(env)marco@pc:~/crawlscrape$ time ./google_scraper_example.py 'basic'
2015-02-21 13:15:18,117 - GoogleScraper - INFO - Going to scrape 1 keywords with 1 proxies by using 1 threads.
2015-02-21 13:15:18,117 - GoogleScraper - INFO - [+] SelScrape[localhost][search-type:normal][http://yandex.ru/yandsearch?] using search engine "yandex". Num keywords=1, num pages for keyword=1
2015-02-21 13:16:19,182 - GoogleScraper - ERROR - Message: unknown error: Chrome failed to start: exited abnormally
  (Driver info: chromedriver=2.13.307649 (bf55b442bb6b5c923249dd7870d6a107678bfbb6),platform=Linux 3.13.0-32-generic x86_64)

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.4/dist-packages/GoogleScraper/selenium_mode.py", line 419, in run
    raise SeleniumMisconfigurationError('Aborting due to no available selenium webdriver.')
GoogleScraper.scraping.SeleniumMisconfigurationError: Aborting due to no available selenium webdriver.

real    1m1.514s
user    0m0.911s
sys     0m0.338s
```
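Since the ERROR line says Chrome itself exited abnormally rather than chromedriver being missing, it may also be worth checking for a display server: Chrome cannot start on a headless box without one, and running under `xvfb-run` is a common workaround. A rough diagnostic sketch (the browser binary names are assumptions; adapt to your distribution):

```python
import os
import shutil


def diagnose():
    """Collect likely reasons Chrome fails to start under Selenium."""
    problems = []
    if shutil.which("chromedriver") is None:
        problems.append("chromedriver is not on PATH")
    if not any(shutil.which(b) for b in ("google-chrome", "chromium-browser", "chromium")):
        problems.append("no Chrome/Chromium binary found on PATH")
    if not os.environ.get("DISPLAY"):
        problems.append("DISPLAY is unset; on a headless server try running under xvfb-run")
    return problems


for problem in diagnose():
    print(problem)
```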

So... any more specific suggestions? Looking forward to your kind help.

Marco
