disinfoRG / ZeroScraper

Web scraper made by 0archive.
https://0archive.tw
MIT License

Too many connections error #35

Open andreawwenyi opened 4 years ago

andreawwenyi commented 4 years ago

A "Too many connections" error occurs when running python3 execute_spider.py -d -site_id xxx. The error goes away after closing the MySQL connections and restarting, so it is suspected to be raised by too many unclosed connections during the discover process.

Error messages:

2020-01-09 12:48:48 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: newsSpiders)
2020-01-09 12:48:48 [scrapy.utils.log] INFO: Versions: lxml 4.4.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.10.0, Python 3.7.4 (default, Aug 13 2019, 15:17:50) - [Clang 4.0.1 (tags/RELEASE_401/final)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1d  10 Sep 2019), cryptography 2.8, Platform Darwin-19.0.0-x86_64-i386-64bit
2020-01-09 12:48:48 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'newsSpiders', 'DEPTH_LIMIT': 5, 'DOWNLOAD_DELAY': 1.5, 'NEWSPIDER_MODULE': 'newsSpiders.spiders', 'SPIDER_MODULES': ['newsSpiders.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
2020-01-09 12:48:48 [scrapy.extensions.telnet] INFO: Telnet Password: 3a32bf798c56ea3a
2020-01-09 12:48:48 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
Unhandled error in Deferred:
2020-01-09 12:48:48 [twisted] CRITICAL: Unhandled error in Deferred:

Traceback (most recent call last):
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/scrapy/crawler.py", line 184, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/scrapy/crawler.py", line 188, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/scrapy/crawler.py", line 85, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/scrapy/crawler.py", line 108, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/scrapy/spiders/crawl.py", line 122, in from_crawler
    spider = super(CrawlSpider, cls).from_crawler(crawler, *args, **kwargs)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/scrapy/spiders/__init__.py", line 50, in from_crawler
    spider = cls(*args, **kwargs)
  File "/Users/wyw/Codes/NewsScraping/codes/newsSpiders/spiders/discover_article_spider.py", line 35, in __init__
    engine, connection, tables = connect_to_db()
  File "/Users/wyw/Codes/NewsScraping/codes/helpers.py", line 14, in connect_to_db
    metadata.reflect(bind=engine)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/sql/schema.py", line 4208, in reflect
    with bind.connect() as conn:
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2209, in connect
    return self._connection_cls(self, **kwargs)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 103, in __init__
    else engine.raw_connection()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2307, in raw_connection
    self.pool.unique_connection, _connection
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2280, in _wrap_pool_connect
    e, dialect, self
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1547, in _handle_dbapi_exception_noconnection
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 152, in reraise
    raise value.with_traceback(tb)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2276, in _wrap_pool_connect
    return fn()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 303, in unique_connection
    return _ConnectionFairy._checkout(self)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 760, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 492, in checkout
    rec = pool._do_get()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/impl.py", line 139, in _do_get
    self._dec_overflow()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
    raise value
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/impl.py", line 136, in _do_get
    return self._create_connection()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 308, in _create_connection
    return _ConnectionRecord(self)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 437, in __init__
    self.__connect(first_connect_check=True)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 639, in __connect
    connection = pool._invoke_creator(self)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
    return dialect.connect(*cargs, **cparams)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 482, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/__init__.py", line 94, in Connect
    return Connection(*args, **kwargs)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/connections.py", line 325, in __init__
    self.connect()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/connections.py", line 598, in connect
    self._get_server_information()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/connections.py", line 975, in _get_server_information
    packet = self._read_packet()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/connections.py", line 684, in _read_packet
    packet.check_error()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (1040, 'Too many connections')
(Background on this error at: http://sqlalche.me/e/e3q8)

2020-01-09 12:48:48 [twisted] CRITICAL: 
Traceback (most recent call last):
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2276, in _wrap_pool_connect
    return fn()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 303, in unique_connection
    return _ConnectionFairy._checkout(self)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 760, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 492, in checkout
    rec = pool._do_get()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/impl.py", line 139, in _do_get
    self._dec_overflow()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
    raise value
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/impl.py", line 136, in _do_get
    return self._create_connection()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 308, in _create_connection
    return _ConnectionRecord(self)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 437, in __init__
    self.__connect(first_connect_check=True)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 639, in __connect
    connection = pool._invoke_creator(self)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
    return dialect.connect(*cargs, **cparams)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 482, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/__init__.py", line 94, in Connect
    return Connection(*args, **kwargs)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/connections.py", line 325, in __init__
    self.connect()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/connections.py", line 598, in connect
    self._get_server_information()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/connections.py", line 975, in _get_server_information
    packet = self._read_packet()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/connections.py", line 684, in _read_packet
    packet.check_error()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.OperationalError: (1040, 'Too many connections')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/scrapy/crawler.py", line 85, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/scrapy/crawler.py", line 108, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/scrapy/spiders/crawl.py", line 122, in from_crawler
    spider = super(CrawlSpider, cls).from_crawler(crawler, *args, **kwargs)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/scrapy/spiders/__init__.py", line 50, in from_crawler
    spider = cls(*args, **kwargs)
  File "/Users/wyw/Codes/NewsScraping/codes/newsSpiders/spiders/discover_article_spider.py", line 35, in __init__
    engine, connection, tables = connect_to_db()
  File "/Users/wyw/Codes/NewsScraping/codes/helpers.py", line 14, in connect_to_db
    metadata.reflect(bind=engine)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/sql/schema.py", line 4208, in reflect
    with bind.connect() as conn:
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2209, in connect
    return self._connection_cls(self, **kwargs)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 103, in __init__
    else engine.raw_connection()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2307, in raw_connection
    self.pool.unique_connection, _connection
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2280, in _wrap_pool_connect
    e, dialect, self
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1547, in _handle_dbapi_exception_noconnection
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 152, in reraise
    raise value.with_traceback(tb)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2276, in _wrap_pool_connect
    return fn()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 303, in unique_connection
    return _ConnectionFairy._checkout(self)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 760, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 492, in checkout
    rec = pool._do_get()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/impl.py", line 139, in _do_get
    self._dec_overflow()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
    raise value
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/impl.py", line 136, in _do_get
    return self._create_connection()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 308, in _create_connection
    return _ConnectionRecord(self)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 437, in __init__
    self.__connect(first_connect_check=True)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 639, in __connect
    connection = pool._invoke_creator(self)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
    return dialect.connect(*cargs, **cparams)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 482, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/__init__.py", line 94, in Connect
    return Connection(*args, **kwargs)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/connections.py", line 325, in __init__
    self.connect()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/connections.py", line 598, in connect
    self._get_server_information()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/connections.py", line 975, in _get_server_information
    packet = self._read_packet()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/connections.py", line 684, in _read_packet
    packet.check_error()
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/Users/wyw/.local/share/virtualenvs/NewsScraping-I6zyEuYv/lib/python3.7/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (1040, 'Too many connections')
(Background on this error at: http://sqlalche.me/e/e3q8)
pm5 commented 4 years ago

I think this is fixed by #36, which moves the DB queries into a pipeline shared by all spiders. It happened to me before, but I have no way to reproduce the problem, so I cannot verify that it is actually fixed.
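The pipeline approach means one connection per crawl with a well-defined lifetime, instead of one per spider instance. A minimal sketch of that shape (illustrative names, and a plain dict standing in for a real connection so the snippet is self-contained — the actual fix is in #36):

```python
# Sketch of a Scrapy-style item pipeline that owns the DB connection:
# opened once when the spider starts, closed when it finishes, so leaked
# handles cannot accumulate into MySQL error 1040 (Too many connections).

class DatabasePipeline:
    def __init__(self):
        self.connection = None

    def open_spider(self, spider):
        # In the real project this would be something like engine.connect();
        # a placeholder dict keeps the sketch runnable without a database.
        self.connection = {"open": True}

    def process_item(self, item, spider):
        assert self.connection["open"], "connection must be open"
        # ... write the item to the database here ...
        return item

    def close_spider(self, spider):
        # Releasing the connection when the crawl ends is the key point.
        self.connection["open"] = False


# Scrapy calls these hooks itself; invoked by hand here for illustration.
pipeline = DatabasePipeline()
pipeline.open_spider(spider=None)
item = pipeline.process_item({"url": "https://example.com"}, spider=None)
pipeline.close_spider(spider=None)
```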

andreawwenyi commented 4 years ago

This is further improved by commit d1733bf27d4041354c589436652d32f7fb34f570.