NikolaiT / GoogleScraper

A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
https://scrapeulous.com/
Apache License 2.0
2.63k stars · 736 forks

AttributeError: 'ScraperSearch' object has no attribute 'query' #77

Open marcoippolito opened 9 years ago

marcoippolito commented 9 years ago

After re-installing GoogleScraper, this time in a virtualenv, the error message has changed, and it now seems to be related to GoogleScraper itself.

I run the following script:

```python
#!/usr/bin/python3.4
# -*- coding: utf-8 -*-
# https://github.com/NikolaiT/GoogleScraper
# Shows how to control GoogleScraper programmatically

import sys
import GoogleScraper
from GoogleScraper import scrape_with_config, GoogleSearchError
from GoogleScraper.database import ScraperSearch, SERP, Link

# EXAMPLES OF HOW TO USE GoogleScraper

# very basic usage
def basic_usage():
    # See the config.cfg file for possible values
    config = {
        'SCRAPING': {
            'use_own_ip': 'True',
            'keyword': 'Let\'s go bubbles!',
            'search_engines': 'yandex',
            'num_pages_for_keyword': 1
        },
        'SELENIUM': {
            'sel_browser': 'chrome',
        },
        'GLOBAL': {
            'do_caching': 'False'
        }
    }

    try:
        sqlalchemy_session = scrape_with_config(config)
    except GoogleSearchError as e:
        print(e)

    # let's inspect what we got
    for search in sqlalchemy_session.query(ScraperSearch).all():
        for serp in search.serps:
            print(serp)
            for link in serp.links:
                print(link)

# simulating an image search for all search engines that support image search,
# then downloading all found images :)

# MAIN FUNCTION
if __name__ == '__main__':
    usage = 'Usage: {} [basic|image-search]'.format(sys.argv[0])
    if len(sys.argv) != 2:
        print(usage)
    else:
        arg = sys.argv[1]
        if arg == 'basic':
            basic_usage()
        elif arg == 'image':
            image_search()
        else:
            print(usage)
```

and the output is:

```
$ time python3 google_scraper_example.py 'basic'
2015-02-02 15:43:24,708 - GoogleScraper - INFO - Going to scrape 1 keywords with 1 proxies by using 1 threads.
2015-02-02 15:43:24,709 - GoogleScraper - INFO - [+] SelScrape[localhost][search-type:normal][http://yandex.ru/yandsearch?] using search engine "yandex". Num keywords=1, num pages for keyword=[1]
2015-02-02 15:44:25,763 - GoogleScraper - ERROR - Message: unknown error: Chrome failed to start: exited abnormally
  (Driver info: chromedriver=2.13.307649 (bf55b442bb6b5c923249dd7870d6a107678bfbb6),platform=Linux 3.13.0-32-generic x86_64)
```

```
Exception in thread [yandex]SelScrape:
Traceback (most recent call last):
  File "/usr/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/home/marco/crawlscrape/env/lib/python3.4/site-packages/GoogleScraper/selenium_mode.py", line 494, in run
    raise_or_log('{}: Aborting due to no available selenium webdriver.'.format(self.name), exception_obj=SeleniumMisconfigurationError)
  File "/home/marco/crawlscrape/env/lib/python3.4/site-packages/GoogleScraper/log.py", line 30, in raise_or_log
    raise exception_obj(msg)
GoogleScraper.scraping.SeleniumMisconfigurationError: [yandex]SelScrape: Aborting due to no available selenium webdriver.
```

```
Traceback (most recent call last):
  File "google_scraper_example.py", line 63, in <module>
    basic_usage()
  File "google_scraper_example.py", line 41, in basic_usage
    for search in sqlalchemy_session.query(ScraperSearch).all():
AttributeError: 'ScraperSearch' object has no attribute 'query'
```

Any suggestions, Nikolai?

leadscloud commented 9 years ago

Remove this line from your usage script:

```python
print(serp)
```

This version has some bugs: a serp object has no query attribute when it is read from a cache file.
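As a workaround until the bug is fixed, a cache-loaded object can be probed defensively before printing. A minimal sketch, assuming the problem is a missing attribute on the deserialized object (`CachedSerp` is a hypothetical stand-in, not GoogleScraper's real model):

```python
# Hypothetical stand-in for a SERP row deserialized from the cache;
# in this buggy version, some attributes may be missing.
class CachedSerp:
    links = []

serp = CachedSerp()

# getattr with a default avoids the AttributeError that a bare
# serp.query access would raise on a cache-loaded object.
query = getattr(serp, 'query', '<query not available from cache>')
print(query)  # -> <query not available from cache>
```

This only suppresses the symptom; the underlying fix still has to happen in GoogleScraper itself.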

marcoippolito commented 9 years ago

I'm not sure I understood your kind suggestion. I commented out (making them inactive) the following lines:

```python
for search in sqlalchemy_session.query(ScraperSearch).all():
    for serp in search.serps:
        print(serp)
        for link in serp.links:
            print(link)
```

```
$ time python3 google_scraper_example.py 'basic'
2015-02-03 10:07:32,754 - GoogleScraper - INFO - Going to scrape 1 keywords with 1 proxies by using 1 threads.
2015-02-03 10:07:32,754 - GoogleScraper - INFO - [+] SelScrape[localhost][search-type:normal][http://yandex.ru/yandsearch?] using search engine "yandex". Num keywords=1, num pages for keyword=1
2015-02-03 10:08:33,843 - GoogleScraper - ERROR - Message: unknown error: Chrome failed to start: exited abnormally
  (Driver info: chromedriver=2.13.307649 (bf55b442bb6b5c923249dd7870d6a107678bfbb6),platform=Linux 3.13.0-32-generic x86_64)
```

```
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.4/dist-packages/GoogleScraper/selenium_mode.py", line 419, in run
    raise SeleniumMisconfigurationError('Aborting due to no available selenium webdriver.')
GoogleScraper.scraping.SeleniumMisconfigurationError: Aborting due to no available selenium webdriver.
```

leadscloud commented 9 years ago

I can't help with this one: I use Windows 7, not Linux. You should look into the problem of running chromedriver on Linux.

Maybe your chromedriver version is not supported.

stasmix commented 9 years ago

The same problem on Win 8.1 x64

stasmix commented 9 years ago

Try modifying the last line of GoogleScraper/core.py:

```python
if return_results:
    return scraper_search
```

should be

```python
if return_results:
    return session
```

hope this helps
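The shape of the bug can be sketched with stand-in classes (these are hypothetical simplifications, not GoogleScraper's real types): `scrape_with_config` was returning the ORM model instance instead of the SQLAlchemy session, and only the session exposes a `query()` method.

```python
class ScraperSearch:
    """Stand-in for the ORM row that the buggy core.py returned."""

class Session:
    """Stand-in for a SQLAlchemy session, which does expose query()."""
    def query(self, model):
        # A real session would build a Query object; returning a list
        # with one instance is enough to show the calling pattern.
        return [model()]

# Before the fix: callers received a ScraperSearch and crashed on .query()
buggy_result = ScraperSearch()
print(hasattr(buggy_result, 'query'))  # -> False (hence the AttributeError)

# After the fix: returning the session restores the usage shown in the
# example script, session.query(ScraperSearch)
session = Session()
for search in session.query(ScraperSearch):
    print(type(search).__name__)  # -> ScraperSearch
```

This is why the error message names `ScraperSearch` rather than the session: the wrong object reached the `for search in sqlalchemy_session.query(...)` loop.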

prashant-puri commented 9 years ago

Hey, I am getting this error:

```
Traceback (most recent call last):
  File "test.py", line 137, in <module>
    basic_usage()
  File "test.py", line 39, in basic_usage
    for search in sqlalchemy_session.query(ScraperSearch).all():
AttributeError: 'ScraperSearch' object has no attribute 'query'
```

prashant-puri commented 9 years ago

Hey, got the answer with this. Try modifying the last line of GoogleScraper/core.py:

```python
if return_results:
    return scraper_search
```

should be

```python
if return_results:
    return session
```

mavverick commented 8 years ago

Hi, I have this issue when I try to run a script. Do you know why?

```
'scrape_method': 'http', 'search_engine_name': 'google', 'status': 'successful'}
2016-02-09 10:43:07,836 - GoogleScraper.caching - INFO - 2 cache files found in .scrapecache/
2016-02-09 10:43:07,837 - GoogleScraper.caching - INFO - 1/1 objects have been read from the cache. 0 remain to get scraped.
Traceback (most recent call last):
  File "test.py", line 142, in <module>
    basic_usage()
  File "test.py", line 44, in basic_usage
    for search in sqlalchemy_session.query(ScraperSearch).all():
AttributeError: 'ScraperSearch' object has no attribute 'query'
```

prashant-puri commented 8 years ago

Hey mavverick, modify the GoogleScraper/core.py file: change "return scraper_search" to "return session".