gnemoug / distribute_crawler

A distributed web crawler built with Scrapy, Redis, MongoDB, and Graphite: a MongoDB cluster provides the underlying storage, Redis handles the distribution, and Graphite displays the crawler's status.

ImportError: Error loading object 'woaidu_crawler.pipelines.bookfile.WoaiduBookFile': cannot import name MediaPipeline #8

Open lifehack opened 10 years ago

lifehack commented 10 years ago

I set up the single-MongoDB environment on Ubuntu 12.04 following the instructions, but running the crawler fails with the following error:

```
/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:12: ScrapyDeprecationWarning: woaidu_crawler.spiders.woaidu_detail_spider.WoaiduSpider inherits from deprecated class scrapy.spider.BaseSpider, please inherit from scrapy.spider.Spider. (warning only on first subclass, there may be others)
  class WoaiduSpider(BaseSpider):
/usr/local/lib/python2.7/dist-packages/scrapy/contrib/pipeline/__init__.py:21: ScrapyDeprecationWarning: ITEM_PIPELINES defined as a list or a set is deprecated, switch to a dict
  category=ScrapyDeprecationWarning, stacklevel=1)
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 4, in <module>
    execute()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/crawl.py", line 50, in run
    self.crawler_process.start()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 92, in start
    if self.start_crawling():
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 124, in start_crawling
    return self._start_crawler() is not None
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 139, in _start_crawler
    crawler.configure()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 47, in configure
    self.engine = ExecutionEngine(self, self._spider_closed)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 64, in __init__
    self.scraper = Scraper(crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/scraper.py", line 66, in __init__
    self.itemproc = itemproc_cls.from_crawler(crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 50, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 29, in from_settings
    mwcls = load_object(clspath)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 42, in load_object
    raise ImportError("Error loading object '%s': %s" % (path, e))
ImportError: Error loading object 'woaidu_crawler.pipelines.bookfile.WoaiduBookFile': cannot import name MediaPipeline
```

My Scrapy version is 0.22.2. Since I'm not familiar with Python or the Scrapy toolchain, could you advise how to fix this?

Update: after removing the pipeline entries 'woaidu_crawler.pipelines.bookfile.WoaiduBookFile' and 'woaidu_crawler.pipelines.drop_none_download.DropNoneBookFile', the crawl runs normally.
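For reference, the ScrapyDeprecationWarning about ITEM_PIPELINES being a list points at the same setting; a minimal sketch of this workaround in settings.py, assuming only the two pipeline paths named above (the priority numbers are illustrative, not taken from the project):

```python
# woaidu_crawler/settings.py -- sketch of the workaround described above.
# Pipeline paths are the two named in this thread; priority values are illustrative.
ITEM_PIPELINES = {
    # ... keep the project's other pipelines here, each mapped to a priority ...
    # The two entries below fail to import under Scrapy 0.22.2; commenting them
    # out (or removing them) is what lets the crawl run:
    # 'woaidu_crawler.pipelines.bookfile.WoaiduBookFile': 300,
    # 'woaidu_crawler.pipelines.drop_none_download.DropNoneBookFile': 400,
}
```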

gnemoug commented 10 years ago

I looked into it: this is a Scrapy version issue. See https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/pipeline/media.py: the class that used to be importable from images.py was moved into media.py. Try changing the import path accordingly.
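If the link doesn't open for you, a quick way to see which module your installed Scrapy actually exposes MediaPipeline from is a small self-contained check; only the two import paths mentioned in this thread are assumed:

```python
# Run with the same Python interpreter that runs Scrapy; for each of the two
# candidate module paths, prints whether it imports and whether it has MediaPipeline.
from __future__ import print_function
import importlib

for path in ("scrapy.contrib.pipeline.media", "scrapy.contrib.pipeline.images"):
    try:
        mod = importlib.import_module(path)
        has_it = hasattr(mod, "MediaPipeline")
        print(path, "->", "has MediaPipeline" if has_it else "no MediaPipeline")
    except ImportError as exc:
        print(path, "-> import failed:", exc)
```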

imlusion commented 9 years ago

So how exactly do you solve this? That link won't open.

TylerzhangZC commented 9 years ago

Change line 18 of distribute_crawler/woaidu_crawler/woaidu_crawler/pipelines/file.py from `from scrapy.contrib.pipeline.images import MediaPipeline` to `from scrapy.contrib.pipeline.media import MediaPipeline`, then run it again and it works.
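If you want the project to keep working on older Scrapy releases as well, a version-tolerant variant of that import is possible; a sketch, assuming only the module path changed between releases and the class name stayed the same:

```python
# pipelines/file.py (and any other module with the same import) -- sketch of a
# fallback import; assumes MediaPipeline only moved from images.py to media.py.
try:
    # Newer Scrapy (e.g. 0.22.x): MediaPipeline lives in media.py
    from scrapy.contrib.pipeline.media import MediaPipeline
except ImportError:
    # Older releases where the project's original import path still works
    from scrapy.contrib.pipeline.images import MediaPipeline
```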