fluffybeing / newsler

A complete automated financial news crawler built on top of the Scrapy framework.

exceptions.AttributeError: 'NewsSpider' object has no attribute '_rules' #1

Open ghost opened 8 years ago

ghost commented 8 years ago

scrapy crawl NewsSpider -a src_json=sources/sample.json

exceptions.AttributeError: 'NewsSpider' object has no attribute '_rules'

How can I fix this?

fluffybeing commented 8 years ago

@codepython Can you share the whole traceback?

ghost commented 8 years ago

@rahulrrixe My command is scrapy crawl NewsSpider -a src_json=sources/forbes.json, and the traceback is below:

    2015-11-16 12:14:52+0800 [scrapy] INFO: Scrapy 0.24.4 started (bot: scrapybot)
    2015-11-16 12:14:52+0800 [scrapy] INFO: Optional features available: ssl, http11
    2015-11-16 12:14:52+0800 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'newscrawler.spiders', 'SPIDER_MODULES': ['newscrawler.spiders'], 'LOG_LEVEL': 'INFO', 'DOWNLOAD_DELAY': 0.25}
    2015-11-16 12:14:52+0800 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
    2015-11-16 12:14:52+0800 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
    2015-11-16 12:14:52+0800 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, RotateUserAgentMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
    2015-11-16 12:14:52+0800 [scrapy] INFO: Enabled item pipelines: DuplicatesPipeline, MongoDBPipeline
    2015-11-16 12:14:52+0800 [NewsSpider] INFO: Spider opened
    2015-11-16 12:14:52+0800 [NewsSpider] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
    2015-11-16 12:14:53+0800 [NewsSpider] ERROR: Spider error processing <GET http://urlsearch.commoncrawl.org/?q=forbes.com>
    Traceback (most recent call last):
      File "/Users/git/python/Financial-News-Crawler/env/lib/python2.7/site-packages/twisted/internet/base.py", line 824, in runUntilCurrent
        call.func(*call.args, **call.kw)
      File "/Users/git/python/Financial-News-Crawler/env/lib/python2.7/site-packages/twisted/internet/task.py", line 638, in _tick
        taskObj._oneWorkUnit()
      File "/Users/git/python/Financial-News-Crawler/env/lib/python2.7/site-packages/twisted/internet/task.py", line 484, in _oneWorkUnit
        result = next(self._iterator)
      File "/Users/peng/git/python/Financial-News-Crawler/env/lib/python2.7/site-packages/scrapy/utils/defer.py", line 57, in <genexpr>
        work = (callable(elem, *args, **named) for elem in iterable)
    --- <exception caught here> ---
      File "/Users/git/python/Financial-News-Crawler/env/lib/python2.7/site-packages/scrapy/utils/defer.py", line 96, in iter_errback
        yield next(it)
      File "/Users/peng/git/python/Financial-News-Crawler/env/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/offsite.py", line 26, in process_spider_output
        for x in result:
      File "/Users//git/python/Financial-News-Crawler/env/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/referer.py", line 22, in <genexpr>
        return (_set_referer(r) for r in result or ())
      File "/Users/git/python/Financial-News-Crawler/env/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/urllength.py", line 33, in <genexpr>
        return (r for r in result or () if _filter(r))
      File "/Users/peng/git/python/Financial-News-Crawler/env/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/depth.py", line 50, in <genexpr>
        return (r for r in result or () if _filter(r))
      File "/Users//git/python/Financial-News-Crawler/env/lib/python2.7/site-packages/scrapy/contrib/spiders/crawl.py", line 73, in _parse_response
        for request_or_item in self._requests_to_follow(response):
      File "/Users/git/python/Financial-News-Crawler/env/lib/python2.7/site-packages/scrapy/contrib/spiders/crawl.py", line 51, in _requests_to_follow
        for n, rule in enumerate(self._rules):
    exceptions.AttributeError: 'NewsSpider' object has no attribute '_rules'

    2015-11-16 12:14:53+0800 [NewsSpider] INFO: Closing spider (finished)
    2015-11-16 12:14:53+0800 [NewsSpider] INFO: Dumping Scrapy stats:
        {'downloader/request_bytes': 237,
         'downloader/request_count': 1,
         'downloader/request_method_count/GET': 1,
         'downloader/response_bytes': 2236,
         'downloader/response_count': 1,
         'downloader/response_status_count/200': 1,
         'finish_reason': 'finished',
         'finish_time': datetime.datetime(2015, 11, 16, 4, 14, 53, 947818),
         'log_count/ERROR': 1,
         'log_count/INFO': 7,
         'response_received_count': 1,
         'scheduler/dequeued': 1,
         'scheduler/dequeued/memory': 1,
         'scheduler/enqueued': 1,
         'scheduler/enqueued/memory': 1,
         'spider_exceptions/AttributeError': 1,
         'start_time': datetime.datetime(2015, 11, 16, 4, 14, 52, 381394)}
    2015-11-16 12:14:53+0800 [NewsSpider] INFO: Spider closed (finished)

ghost commented 8 years ago

If I run scrapy crawl gooseSpider -a src_json=sources/forbes.json, it reports: KeyError: 'Spider not found: gooseSpider'

I cannot figure out what is wrong.

zhangruiskyline commented 7 years ago

I have a similar problem. Where are these _rules defined in Scrapy? Or should we manage them ourselves?

vdabravolski commented 7 years ago

I had the same issue. Has anyone resolved it?

fluffybeing commented 7 years ago

I need to resolve this issue, as there have been a lot of changes in Scrapy. It will take a week.

vdabravolski commented 7 years ago

The following worked for me (adding a "super" call in the spider's __init__):

    def __init__(self, *a, **kw):
        super(NewsSpider, self).__init__(*a, **kw)
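
For anyone wondering why the "super" call matters: the traceback above fails in crawl.py where CrawlSpider iterates over self._rules, and that attribute is only created when the parent class's __init__ runs. Below is a minimal sketch of that pattern without Scrapy installed. BaseCrawlSpider, BrokenSpider, FixedSpider, and try_crawl are hypothetical stand-ins written for illustration only. They are not Scrapy's actual classes, and this is not necessarily how NewsSpider is written in this repo.

```python
import copy


class BaseCrawlSpider(object):
    """Hypothetical stand-in for a CrawlSpider-like base class (not Scrapy code)."""

    rules = ()

    def __init__(self, *a, **kw):
        # The parent initializer is the only place `_rules` gets created.
        self._rules = [copy.copy(r) for r in self.rules]

    def requests_to_follow(self):
        # Reads `_rules`, like the failing frame in the traceback.
        return [(n, rule) for n, rule in enumerate(self._rules)]


class BrokenSpider(BaseCrawlSpider):
    rules = ("follow-articles",)

    def __init__(self, src_json=None):
        # No super() call here, so `_rules` is never set.
        self.src_json = src_json


class FixedSpider(BaseCrawlSpider):
    rules = ("follow-articles",)

    def __init__(self, src_json=None, *a, **kw):
        # Let the parent build `_rules` before doing our own setup.
        super(FixedSpider, self).__init__(*a, **kw)
        self.src_json = src_json


def try_crawl(spider):
    """Return the followed rules, or the error message if `_rules` is missing."""
    try:
        return spider.requests_to_follow()
    except AttributeError as exc:
        return str(exc)


print(try_crawl(BrokenSpider("sources/sample.json")))
print(try_crawl(FixedSpider("sources/sample.json")))
```

The first call reports the missing _rules attribute and the second returns the compiled rules, which is the same difference the one-line "super" fix makes in a real spider.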

heranly commented 5 years ago

Check your code: you may have left out the "self" argument when calling the parent class.

    def __init__(self, *a, **kw):
        super(NewsSpider, self).__init__(*a, **kw)

I had the same problem for this reason, so I hope this helps.