NikolaiT / GoogleScraper

A Python module to scrape several search engines (like Google, Yandex, Bing, DuckDuckGo, ...), including asynchronous networking support.
https://scrapeulous.com/
Apache License 2.0

Errors while scraping #220

Closed. krawez closed this issue 5 years ago.

krawez commented 5 years ago

Hi! When scraping with, for example, the basic command `GoogleScraper -m http -p 1 -n 25 -q "white light"` from the documentation, the following exception is raised. P.S. Am I doing something wrong, is something not installed, or is it a bug in the code?

```
[MainThread] - 2018-09-09 19:02:20,852 - GoogleScraper.core - INFO - Going to scrape 1 keywords with 1 proxies by using 1 threads.
[MainThread] - 2018-09-09 19:02:20,853 - GoogleScraper.scraping - INFO - [+] HttpScrape[localhost][search-type:normal][https://www.google.com/search?] using search engine "google". Num keywords=1, num pages for keyword=[1]
[Thread-2] - 2018-09-09 19:02:24,855 - GoogleScraper.scraping - INFO - [HttpScrape-google][localhost]]Keyword: "white light" with [1] pages, slept 4 seconds before scraping. 1/1 already scraped.
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: table link has no column named rating
```

The above exception was the direct cause of the following exception:

```
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/GoogleScraper/http_mode.py", line 305, in run
    if not self.search(rand=True):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/GoogleScraper/http_mode.py", line 293, in search
    super().after_search()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/GoogleScraper/scraping.py", line 390, in after_search
    if not self.store():
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/GoogleScraper/scraping.py", line 298, in store
    self.session.commit()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 943, in commit
    self.transaction.commit()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 467, in commit
    self._prepare_impl()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 447, in _prepare_impl
    self.session.flush()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2254, in flush
    self._flush(objects)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2380, in _flush
    transaction.rollback(_capture_exception=True)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 249, in reraise
    raise value
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2344, in _flush
    flush_context.execute()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 391, in execute
    rec.execute(self)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 556, in execute
    uow
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 181, in save_obj
    mapper, table, insert)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 866, in _emit_insert_statements
    execute(statement, params)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 265, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 248, in reraise
    raise value.with_traceback(tb)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) table link has no column named rating
[SQL: 'INSERT INTO link (title, snippet, link, domain, visible_link, rating, num_reviews, rank, link_type, serp_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)']
[parameters: ('George Michael - White Light - YouTube', None, 'https://www.youtube.com/watch?v=SRAOG-BpNOw', 'www.youtube.com', 'https://www.youtube.com/watch?v=SRAOG-BpNOw', None, None, 1, 'results', 3)]
(Background on this error at: http://sqlalche.me/e/e3q8)
```

```
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/bin/GoogleScraper", line 11, in <module>
    load_entry_point('GoogleScraper==0.2.2', 'console_scripts', 'GoogleScraper')()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/GoogleScraper/core.py", line 471, in main
    session.commit()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 943, in commit
    self.transaction.commit()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 465, in commit
    self._assert_active(prepared_ok=True)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 276, in _assert_active
    % self._rollback_exception
sqlalchemy.exc.InvalidRequestError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (sqlite3.OperationalError) table link has no column named rating
[SQL: 'INSERT INTO link (title, snippet, link, domain, visible_link, rating, num_reviews, rank, link_type, serp_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)']
[parameters: ('George Michael - White Light - YouTube', None, 'https://www.youtube.com/watch?v=SRAOG-BpNOw', 'www.youtube.com', 'https://www.youtube.com/watch?v=SRAOG-BpNOw', None, None, 1, 'results', 3)]
(Background on this error at: http://sqlalche.me/e/e3q8)
```
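The root cause sits at the bottom of both tracebacks: the ORM emits an INSERT referencing a `rating` column that the on-disk `link` table does not have, so the INSERT fails and the session is rolled back; every later `commit()` then raises `InvalidRequestError`. A minimal stdlib-only sketch of the same failure (table name and columns taken from the traceback, the rest is illustrative):

```python
import sqlite3

# Simulate a database file created by an older GoogleScraper version,
# whose "link" table predates the "rating" / "num_reviews" columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (title TEXT, link TEXT)")  # old schema

try:
    # The current code expects the new schema and includes "rating".
    conn.execute(
        "INSERT INTO link (title, link, rating) VALUES (?, ?, ?)",
        ("George Michael - White Light - YouTube",
         "https://www.youtube.com/watch?v=SRAOG-BpNOw", None),
    )
except sqlite3.OperationalError as e:
    print(e)  # -> table link has no column named rating
```

So the error is not a usage mistake: it appears whenever an old `google_scraper.db` is reused with a newer version of the code.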

Desktop (please complete the following information):

NikolaiT commented 5 years ago

This is because I changed the database schema.

Try

GoogleScraper --clean

or delete the database

rm google_scraper.db

and it should work again.

krawez commented 5 years ago

Ah, thank you very much 👍