Closed · kmike closed this 8 years ago
I haven't started the autologin servers and got these exceptions:
```
py35 runtests: commands[4] | py.test --doctest-modules --cov=undercrawler undercrawler tests
============================================================= test session starts =============================================================
platform darwin -- Python 3.5.1, pytest-2.9.1, py-1.4.31, pluggy-0.3.1
rootdir: /Users/kmike/svn/undercrawler, inifile:
plugins: cov-2.2.1, twisted-1.5
collected 24 items

undercrawler/spiders/base_spider.py ..
tests/test_dupe_predict.py .................
tests/test_spider.py ...FF

----------------------------------------------- coverage: platform darwin, python 3.5.1-final-0 -----------------------------------------------
Name                                           Stmts   Miss Branch BrPart  Cover
--------------------------------------------------------------------------------
undercrawler/__init__.py                           0      0      0      0   100%
undercrawler/crazy_form_submitter.py              41     31     21      0    16%
undercrawler/directives/test_directive.py         71     60     14      1    14%
undercrawler/documents_pipeline.py                20      2     10      4    80%
undercrawler/dupe_predict.py                     145      1     78      2    99%
undercrawler/items.py                             18      0      2      0   100%
undercrawler/middleware/__init__.py                3      0      0      0   100%
undercrawler/middleware/autologin.py              88     46     42      7    39%
undercrawler/middleware/avoid_dup_content.py      42     20     18      4    43%
undercrawler/middleware/throttle.py               24      1     10      3    88%
undercrawler/settings.py                          33      0      2      1    97%
undercrawler/spiders/__init__.py                   1      0      0      0   100%
undercrawler/spiders/base_spider.py              150     25     63     11    78%
undercrawler/utils.py                             39     10     22      2    64%
--------------------------------------------------------------------------------
TOTAL                                            675    196    282     35    67%

================================================================== FAILURES ===================================================================
__________________________________________________________ TestAutologin.test_login ___________________________________________________________

self = <tests.test_spider.TestAutologin testMethod=test_login>

    @defer.inlineCallbacks
    def test_login(self):
        ''' No logout links, just one page after login.
        '''
        with MockServer(Login) as s:
            root_url = s.root_url
            yield self.crawler.crawl(url=root_url)
        spider = self.crawler.spider
>       assert hasattr(spider, 'collected_items')
E       AssertionError: assert hasattr(<BaseSpider 'base' at 0x10ce7ee80>, 'collected_items')

tests/test_spider.py:224: AssertionError
------------------------------------------------------------ Captured stderr call -------------------------------------------------------------
INFO:scrapy.middleware:Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.logstats.LogStats']
INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.logstats.LogStats']
INFO:scrapy.middleware:Enabled downloader middlewares: ['undercrawler.middleware.AvoidDupContentMiddleware', 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'undercrawler.middleware.AutologinMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'undercrawler.middleware.SplashAwareAutoThrottle', 'scrapy_splash.SplashCookiesMiddleware', 'scrapy_splash.SplashMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
INFO: Enabled downloader middlewares: ['undercrawler.middleware.AvoidDupContentMiddleware', 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'undercrawler.middleware.AutologinMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'undercrawler.middleware.SplashAwareAutoThrottle', 'scrapy_splash.SplashCookiesMiddleware', 'scrapy_splash.SplashMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
INFO:scrapy.middleware:Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
INFO:scrapy.middleware:Enabled item pipelines: ['tests.utils.CollectorPipeline']
INFO: Enabled item pipelines: ['tests.utils.CollectorPipeline']
INFO:scrapy.core.engine:Spider opened
INFO: Spider opened
INFO:scrapy.extensions.logstats:Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
DEBUG:undercrawler.middleware.autologin:Attempting login at http://192.168.99.1:8781
DEBUG: Attempting login at http://192.168.99.1:8781
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): 127.0.0.1
INFO: Starting new HTTP connection (1): 127.0.0.1
DEBUG:scrapy.downloadermiddlewares.retry:Retrying <GET http://192.168.99.1:8781> (failed 1 times): HTTPConnectionPool(host='127.0.0.1', port=8089): Max retries exceeded with url: /login-cookies (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x10ce84cc0>: Failed to establish a new connection: [Errno 61] Connection refused',))
DEBUG: Retrying <GET http://192.168.99.1:8781> (failed 1 times): HTTPConnectionPool(host='127.0.0.1', port=8089): Max retries exceeded with url: /login-cookies (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x10ce84cc0>: Failed to establish a new connection: [Errno 61] Connection refused',))
DEBUG:undercrawler.middleware.autologin:response <200 http://192.168.99.1:8781/> cookies <CookieJar[]>
DEBUG: response <200 http://192.168.99.1:8781/> cookies <CookieJar[]>
ERROR:scrapy.core.scraper:Error downloading <GET http://192.168.99.1:8781 via http://192.168.99.100:8050/execute>
Traceback (most recent call last):
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/twisted/internet/defer.py", line 1105, in returnValue
    raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: <200 http://192.168.99.100:8050/execute>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/scrapy/core/downloader/middleware.py", line 53, in process_response
    spider=spider)
  File "/Users/kmike/svn/undercrawler/undercrawler/middleware/autologin.py", line 129, in process_response
    if self.is_logout(response):
  File "/Users/kmike/svn/undercrawler/undercrawler/middleware/autologin.py", line 149, in is_logout
    auth_cookies = {c['name'] for c in self.auth_cookies if c['value']}
TypeError: 'NoneType' object is not iterable
ERROR: Error downloading <GET http://192.168.99.1:8781 via http://192.168.99.100:8050/execute>
Traceback (most recent call last):
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/twisted/internet/defer.py", line 1105, in returnValue
    raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: <200 http://192.168.99.100:8050/execute>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/scrapy/core/downloader/middleware.py", line 53, in process_response
    spider=spider)
  File "/Users/kmike/svn/undercrawler/undercrawler/middleware/autologin.py", line 129, in process_response
    if self.is_logout(response):
  File "/Users/kmike/svn/undercrawler/undercrawler/middleware/autologin.py", line 149, in is_logout
    auth_cookies = {c['name'] for c in self.auth_cookies if c['value']}
TypeError: 'NoneType' object is not iterable
INFO:scrapy.core.engine:Closing spider (finished)
INFO: Closing spider (finished)
INFO:scrapy.statscollectors:Dumping Scrapy stats:
{'downloader/exception_count': 1, 'downloader/exception_type_count/requests.exceptions.ConnectionError': 1, 'downloader/request_bytes': 27995, 'downloader/request_count': 1, 'downloader/request_method_count/POST': 1, 'downloader/response_bytes': 1960, 'downloader/response_count': 1, 'downloader/response_status_count/200': 1, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2016, 4, 11, 19, 32, 19, 94613), 'log_count/DEBUG': 3, 'log_count/ERROR': 1, 'log_count/INFO': 8, 'scheduler/dequeued': 3, 'scheduler/dequeued/memory': 3, 'scheduler/enqueued': 3, 'scheduler/enqueued/memory': 3, 'splash/execute/request_count': 1, 'splash/execute/response_count/200': 1, 'start_time': datetime.datetime(2016, 4, 11, 19, 32, 12, 678245)}
INFO: Dumping Scrapy stats:
{'downloader/exception_count': 1, 'downloader/exception_type_count/requests.exceptions.ConnectionError': 1, 'downloader/request_bytes': 27995, 'downloader/request_count': 1, 'downloader/request_method_count/POST': 1, 'downloader/response_bytes': 1960, 'downloader/response_count': 1, 'downloader/response_status_count/200': 1, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2016, 4, 11, 19, 32, 19, 94613), 'log_count/DEBUG': 3, 'log_count/ERROR': 1, 'log_count/INFO': 8, 'scheduler/dequeued': 3, 'scheduler/dequeued/memory': 3, 'scheduler/enqueued': 3, 'scheduler/enqueued/memory': 3, 'splash/execute/request_count': 1, 'splash/execute/response_count/200': 1, 'start_time': datetime.datetime(2016, 4, 11, 19, 32, 12, 678245)}
INFO:scrapy.core.engine:Spider closed (finished)
INFO: Spider closed (finished)
____________________________________________________ TestAutologin.test_login_with_logout _____________________________________________________

self = <tests.test_spider.TestAutologin testMethod=test_login_with_logout>

    @defer.inlineCallbacks
    def test_login_with_logout(self):
        ''' Login with logout.
        '''
        with MockServer(LoginWithLogout) as s:
            root_url = s.root_url
            yield self.crawler.crawl(url=root_url)
        spider = self.crawler.spider
>       assert hasattr(spider, 'collected_items')
E       AssertionError: assert hasattr(<BaseSpider 'base' at 0x10e18d668>, 'collected_items')

tests/test_spider.py:236: AssertionError
------------------------------------------------------------ Captured stderr call -------------------------------------------------------------
INFO:scrapy.middleware:Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.logstats.LogStats']
INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.logstats.LogStats']
INFO:scrapy.middleware:Enabled downloader middlewares: ['undercrawler.middleware.AvoidDupContentMiddleware', 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'undercrawler.middleware.AutologinMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'undercrawler.middleware.SplashAwareAutoThrottle', 'scrapy_splash.SplashCookiesMiddleware', 'scrapy_splash.SplashMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
INFO: Enabled downloader middlewares: ['undercrawler.middleware.AvoidDupContentMiddleware', 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'undercrawler.middleware.AutologinMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'undercrawler.middleware.SplashAwareAutoThrottle', 'scrapy_splash.SplashCookiesMiddleware', 'scrapy_splash.SplashMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
INFO:scrapy.middleware:Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
INFO:scrapy.middleware:Enabled item pipelines: ['tests.utils.CollectorPipeline']
INFO: Enabled item pipelines: ['tests.utils.CollectorPipeline']
INFO:scrapy.core.engine:Spider opened
INFO: Spider opened
INFO:scrapy.extensions.logstats:Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
DEBUG:undercrawler.middleware.autologin:Attempting login at http://192.168.99.1:8781
DEBUG: Attempting login at http://192.168.99.1:8781
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): 127.0.0.1
INFO: Starting new HTTP connection (1): 127.0.0.1
DEBUG:scrapy.downloadermiddlewares.retry:Retrying <GET http://192.168.99.1:8781> (failed 1 times): HTTPConnectionPool(host='127.0.0.1', port=8089): Max retries exceeded with url: /login-cookies (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x10cd67080>: Failed to establish a new connection: [Errno 61] Connection refused',))
DEBUG: Retrying <GET http://192.168.99.1:8781> (failed 1 times): HTTPConnectionPool(host='127.0.0.1', port=8089): Max retries exceeded with url: /login-cookies (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x10cd67080>: Failed to establish a new connection: [Errno 61] Connection refused',))
DEBUG:undercrawler.middleware.autologin:response <200 http://192.168.99.1:8781/> cookies <CookieJar[]>
DEBUG: response <200 http://192.168.99.1:8781/> cookies <CookieJar[]>
ERROR:scrapy.core.scraper:Error downloading <GET http://192.168.99.1:8781 via http://192.168.99.100:8050/execute>
Traceback (most recent call last):
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/twisted/internet/defer.py", line 1105, in returnValue
    raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: <200 http://192.168.99.100:8050/execute>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/scrapy/core/downloader/middleware.py", line 53, in process_response
    spider=spider)
  File "/Users/kmike/svn/undercrawler/undercrawler/middleware/autologin.py", line 129, in process_response
    if self.is_logout(response):
  File "/Users/kmike/svn/undercrawler/undercrawler/middleware/autologin.py", line 149, in is_logout
    auth_cookies = {c['name'] for c in self.auth_cookies if c['value']}
TypeError: 'NoneType' object is not iterable
ERROR: Error downloading <GET http://192.168.99.1:8781 via http://192.168.99.100:8050/execute>
Traceback (most recent call last):
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/twisted/internet/defer.py", line 1105, in returnValue
    raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: <200 http://192.168.99.100:8050/execute>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "/Users/kmike/svn/undercrawler/.tox/py35/lib/python3.5/site-packages/scrapy/core/downloader/middleware.py", line 53, in process_response
    spider=spider)
  File "/Users/kmike/svn/undercrawler/undercrawler/middleware/autologin.py", line 129, in process_response
    if self.is_logout(response):
  File "/Users/kmike/svn/undercrawler/undercrawler/middleware/autologin.py", line 149, in is_logout
    auth_cookies = {c['name'] for c in self.auth_cookies if c['value']}
TypeError: 'NoneType' object is not iterable
INFO:scrapy.core.engine:Closing spider (finished)
INFO: Closing spider (finished)
INFO:scrapy.statscollectors:Dumping Scrapy stats:
{'downloader/exception_count': 1, 'downloader/exception_type_count/requests.exceptions.ConnectionError': 1, 'downloader/request_bytes': 27995, 'downloader/request_count': 1, 'downloader/request_method_count/POST': 1, 'downloader/response_bytes': 1960, 'downloader/response_count': 1, 'downloader/response_status_count/200': 1, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2016, 4, 11, 19, 32, 28, 109595), 'log_count/DEBUG': 3, 'log_count/ERROR': 1, 'log_count/INFO': 8, 'scheduler/dequeued': 3, 'scheduler/dequeued/memory': 3, 'scheduler/enqueued': 3, 'scheduler/enqueued/memory': 3, 'splash/execute/request_count': 1, 'splash/execute/response_count/200': 1, 'start_time': datetime.datetime(2016, 4, 11, 19, 32, 21, 698408)}
INFO: Dumping Scrapy stats:
{'downloader/exception_count': 1, 'downloader/exception_type_count/requests.exceptions.ConnectionError': 1, 'downloader/request_bytes': 27995, 'downloader/request_count': 1, 'downloader/request_method_count/POST': 1, 'downloader/response_bytes': 1960, 'downloader/response_count': 1, 'downloader/response_status_count/200': 1, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2016, 4, 11, 19, 32, 28, 109595), 'log_count/DEBUG': 3, 'log_count/ERROR': 1, 'log_count/INFO': 8, 'scheduler/dequeued': 3, 'scheduler/dequeued/memory': 3, 'scheduler/enqueued': 3, 'scheduler/enqueued/memory': 3, 'splash/execute/request_count': 1, 'splash/execute/response_count/200': 1, 'start_time': datetime.datetime(2016, 4, 11, 19, 32, 21, 698408)}
INFO:scrapy.core.engine:Spider closed (finished)
INFO: Spider closed (finished)
```
Also, all logging messages are duplicated for some reason.
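My guess about the duplication, not verified against this setup: each message appears once as `LEVEL:logger.name:message` and once as `LEVEL: message`, which is the pattern you get when two differently-formatted handlers see the same record, e.g. one on the root logger and one on a child logger with propagation left on. A minimal reproduction, independent of undercrawler's actual logging configuration:

```python
import logging

# One handler on the root logger with a bare "LEVEL: message" format
# (for example, installed by the test runner or basicConfig).
logging.basicConfig(format='%(levelname)s: %(message)s', level=logging.INFO)

# A second handler on a child logger with the "LEVEL:name:message" format.
# propagate defaults to True, so every record is emitted once by this
# handler and once more by the root handler -- two lines per message.
logger = logging.getLogger('scrapy.middleware')
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))
logger.addHandler(handler)

logger.info('Spider opened')
# INFO:scrapy.middleware:Spider opened
# INFO: Spider opened
```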
Fixed by https://github.com/TeamHG-Memex/undercrawler/commit/472a4f1eccb973edd1a65e2f33b07a49e895029e.
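For anyone who hits this on an older checkout: the `TypeError` comes from `is_logout` iterating `self.auth_cookies` while it is still `None` (no login was ever performed). The guard needed is roughly the following; this is a sketch of the idea rather than the exact diff, and `response_cookie_names` is a hypothetical stand-in:

```python
def is_logout(self, response):
    # self.auth_cookies is still None when the autologin service was
    # never reached, so there is no login state that could be lost.
    if not self.auth_cookies:
        return False
    auth_cookies = {c['name'] for c in self.auth_cookies if c['value']}
    # Hypothetical helper: however the middleware extracts the names of
    # the cookies still present on this response.
    return bool(auth_cookies - response_cookie_names(response))
```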
Ah, I misread the issue, sorry! At first I thought it was about the assertion failures in the tests.
But you fixed it anyways :)