992946217 / funny


scrapy-redis's RFPDupeFilter raises an error when enabled #1

Open 992946217 opened 3 years ago

992946217 commented 3 years ago

When the two settings below are disabled, the program runs without errors. Why does enabling them cause the error in the log that follows?

DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

SCHEDULER = "scrapy_redis.scheduler.Scheduler"
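For reference, these settings normally sit together in the project's settings.py. A minimal sketch of the usual scrapy-redis wiring follows; SCHEDULER_PERSIST and REDIS_URL are illustrative additions assumed from typical setups, not taken from the issue:

```python
# settings.py -- minimal scrapy-redis wiring (illustrative sketch)

# Redis-backed request fingerprint filter (the setting the issue is about):
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Redis-backed scheduler that shares the request queue across spiders:
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Keep the queue and fingerprint set in Redis between runs (optional):
SCHEDULER_PERSIST = True

# Redis connection (assumed local default; adjust to your deployment):
REDIS_URL = "redis://127.0.0.1:6379"
```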

992946217 commented 3 years ago

(base) C:\Users\qwer\PycharmProjects\爬虫\第八章 scrapy框架\fbsPro\fbsPro\spiders>scrapy runspider fbs.py

2021-04-15 12:24:04 [scrapy.utils.log] INFO: Scrapy 2.4.1 started (bot: fbsPro)

2021-04-15 12:24:04 [scrapy.utils.log] INFO: Versions: lxml 4.6.1.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 21.2.0, Python 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1h 22 Sep 2020), cryptography 3.1.1, Platform Windows-10-10.0.19041-SP0

2021-04-15 12:24:04 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor

2021-04-15 12:24:04 [scrapy.crawler] INFO: Overridden settings:

{'BOT_NAME': 'fbsPro',
 'CONCURRENT_REQUESTS': 2,
 'NEWSPIDER_MODULE': 'fbsPro.spiders',
 'SCHEDULER': 'scrapy_redis.scheduler.Scheduler',
 'SPIDER_LOADER_WARN_ONLY': True,
 'SPIDER_MODULES': ['fbsPro.spiders'],
 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
               '(KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}

2021-04-15 12:24:05 [scrapy.extensions.telnet] INFO: Telnet Password: 8b3128fb39ca699b

2021-04-15 12:24:05 [scrapy.middleware] INFO: Enabled extensions:

['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']

2021-04-15 12:24:05 [fbs] INFO: Reading start URLs from redis key 'sunQueue' (batch size: 2, encoding: utf-8)

2021-04-15 12:24:07 [scrapy.middleware] INFO: Enabled downloader middlewares:

['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']

2021-04-15 12:24:07 [scrapy.middleware] INFO: Enabled spider middlewares:

['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']

2021-04-15 12:24:07 [scrapy.middleware] INFO: Enabled item pipelines:

['scrapy_redis.pipelines.RedisPipeline']

2021-04-15 12:24:07 [scrapy.core.engine] INFO: Spider opened

2021-04-15 12:24:07 [scrapy.core.engine] INFO: Closing spider (shutdown)

2021-04-15 12:24:07 [scrapy.core.engine] ERROR: Scraper close failure

Traceback (most recent call last):
  File "E:\anaconda\lib\site-packages\scrapy\crawler.py", line 89, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
AttributeError: type object 'RFPDupeFilter' has no attribute 'from_spider'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\anaconda\lib\site-packages\twisted\internet\defer.py", line 662, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "E:\anaconda\lib\site-packages\scrapy\core\engine.py", line 325, in
    dfd.addBoth(lambda _: self.scraper.close_spider(spider))
  File "E:\anaconda\lib\site-packages\scrapy\core\scraper.py", line 86, in close_spider
    slot.closing = defer.Deferred()
AttributeError: 'NoneType' object has no attribute 'closing'

2021-04-15 12:24:07 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method CoreStats.spider_closed of <scrapy.extensions.corestats.CoreStats object at 0x000001A786B38520>>

Traceback (most recent call last):
  File "E:\anaconda\lib\site-packages\scrapy\crawler.py", line 89, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
AttributeError: type object 'RFPDupeFilter' has no attribute 'from_spider'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\anaconda\lib\site-packages\scrapy\utils\defer.py", line 157, in maybeDeferred_coro
    result = f(*args, **kw)
  File "E:\anaconda\lib\site-packages\pydispatch\robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "E:\anaconda\lib\site-packages\scrapy\extensions\corestats.py", line 31, in spider_closed
    elapsed_time = finish_time - self.start_time
TypeError: unsupported operand type(s) for -: 'datetime.datetime' and 'NoneType'

2021-04-15 12:24:07 [scrapy.statscollectors] INFO: Dumping Scrapy stats:

{'log_count/ERROR': 2, 'log_count/INFO': 9}

2021-04-15 12:24:07 [scrapy.core.engine] INFO: Spider closed (shutdown)

Unhandled error in Deferred:

2021-04-15 12:24:07 [twisted] CRITICAL: Unhandled error in Deferred:

Traceback (most recent call last):
  File "E:\anaconda\lib\site-packages\scrapy\crawler.py", line 192, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "E:\anaconda\lib\site-packages\scrapy\crawler.py", line 196, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "E:\anaconda\lib\site-packages\twisted\internet\defer.py", line 1656, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "E:\anaconda\lib\site-packages\twisted\internet\defer.py", line 1571, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "E:\anaconda\lib\site-packages\twisted\internet\defer.py", line 1445, in _inlineCallbacks
    result = current_context.run(g.send, result)
  File "E:\anaconda\lib\site-packages\scrapy\crawler.py", line 89, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
builtins.AttributeError: type object 'RFPDupeFilter' has no attribute 'from_spider'

2021-04-15 12:24:07 [twisted] CRITICAL:

Traceback (most recent call last):
  File "E:\anaconda\lib\site-packages\twisted\internet\defer.py", line 1445, in _inlineCallbacks
    result = current_context.run(g.send, result)
  File "E:\anaconda\lib\site-packages\scrapy\crawler.py", line 89, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
AttributeError: type object 'RFPDupeFilter' has no attribute 'from_spider'
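For context on what the repeated AttributeError means: the traceback suggests the scheduler tries to build its dupefilter by calling a `from_spider` classmethod on the configured class, and the installed RFPDupeFilter simply does not define that hook (typically a version mismatch between scrapy and scrapy-redis). A minimal sketch of that duck-typed contract follows; all class and function names here are illustrative stand-ins, not scrapy-redis's real internals:

```python
# Sketch of the contract behind the traceback (illustrative names only).

class OldDupeFilter:
    """Stands in for a dupefilter class that predates the `from_spider` hook."""
    def __init__(self, key):
        self.key = key

class NewDupeFilter(OldDupeFilter):
    """Stands in for a dupefilter class that provides the hook."""
    @classmethod
    def from_spider(cls, spider_name):
        # Build the filter from the spider, e.g. deriving its Redis key.
        return cls(key=f"{spider_name}:dupefilter")

def open_spider(dupefilter_cls, spider_name):
    # Mirrors the failing call: the scheduler assumes the hook exists.
    # With a class lacking `from_spider`, this raises AttributeError,
    # exactly like the log above.
    return dupefilter_cls.from_spider(spider_name)

print(hasattr(OldDupeFilter, "from_spider"))  # False -> the AttributeError path
print(open_spider(NewDupeFilter, "fbs").key)
```

If the hook is the problem, aligning the scrapy and scrapy-redis versions (so both sides agree on the dupefilter API) is the usual direction to investigate.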