TylerJackk / LagouSpider

A Scrapy spider for Lagou (拉勾网)

Suggestions... #1

Open Homeless-Xu opened 7 years ago

Homeless-Xu commented 7 years ago

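I suggest making the README more detailed...

Dependencies

Install the dependencies listed in requirements.txt:
  1. pip install requirements // install requirements first
  2. pip install -r requirements.txt // automatically installs every dependency listed in the requirements file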

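Configuration file:

First set up the MySQL environment and create an account and password. Log in to the database and create the Spider database, then use the commands in creat_table.sql to create the data tables.

Then fill in the configuration file with your own database IP, database name, database account, database password, and database port:
  MYSQL_HOST = '127.0.0.1'
  MYSQL_DBNAME = 'Spider'
  MYSQL_USER = 'root'
  MYSQL_PASSWD = 'root'
  MYSQL_PORT = 3306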
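Before starting the crawl it can help to confirm that these values actually reach the database. The following is only a sketch: it assumes a driver such as PyMySQL is installed (this thread does not say which driver the project's pipeline uses), and `mysql_connect_kwargs` and `check_connection` are hypothetical helper names, not part of LagouSpider.

```python
# Values copied from the settings listed above.
SETTINGS = {
    "MYSQL_HOST": "127.0.0.1",
    "MYSQL_DBNAME": "Spider",
    "MYSQL_USER": "root",
    "MYSQL_PASSWD": "root",
    "MYSQL_PORT": 3306,
}


def mysql_connect_kwargs(settings):
    """Map the MYSQL_* settings to driver keyword arguments (hypothetical helper)."""
    return {
        "host": settings["MYSQL_HOST"],
        "database": settings["MYSQL_DBNAME"],
        "user": settings["MYSQL_USER"],
        "password": settings["MYSQL_PASSWD"],
        "port": int(settings["MYSQL_PORT"]),
    }


def check_connection(settings):
    """Open a connection and run a trivial query; raises if the settings are wrong."""
    # Assumption: PyMySQL is available; the project itself may use another driver.
    import pymysql

    conn = pymysql.connect(**mysql_connect_kwargs(settings))
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
            return cur.fetchone()
    finally:
        conn.close()
```

Calling `check_connection(SETTINGS)` with the MySQL server running should return a row rather than raise; an exception here would point at the host, credentials, port, or a missing Spider database instead of at the spider.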

Finally, run scrapy crawl lagou

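❤️ One question: the run looks perfectly normal to me, so why is there no data in the database? ❤️ I created the tables with your commands as well. To exit the spider I just pressed Ctrl+C twice.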

✘✘∙𝒗 Spider scrapy crawl lagou
2017-03-24 16:41:27 [scrapy.utils.log] INFO: Scrapy 1.3.2 started (bot: LagouSpider)
2017-03-24 16:41:27 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'LagouSpider.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['LagouSpider.spiders'], 'BOT_NAME': 'LagouSpider', 'AUTOTHROTTLE_ENABLED': True, 'DOWNLOAD_DELAY': 3}
2017-03-24 16:41:27 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.logstats.LogStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.throttle.AutoThrottle']
2017-03-24 16:41:27 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware', 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-03-24 16:41:27 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-03-24 16:41:27 [scrapy.middleware] INFO: Enabled item pipelines: ['LagouSpider.pipelines.LagouspiderPipeline']
2017-03-24 16:41:27 [scrapy.core.engine] INFO: Spider opened
2017-03-24 16:41:27 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-03-24 16:41:27 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-03-24 16:41:27 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://www.lagou.com/robots.txt> from <GET https://www.lagou.com/robots.txt>
2017-03-24 16:41:31 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.lagou.com/robots.txt> (referer: None)
2017-03-24 16:41:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.lagou.com/zhaopin/Java/1/> (referer: None)
2017-03-24 16:41:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.lagou.com/zhaopin/Java/2/> (referer: None)
^C2017-03-24 16:41:40 [scrapy.crawler] INFO: Received SIGINT, shutting down gracefully. Send again to force
2017-03-24 16:41:40 [scrapy.core.engine] INFO: Closing spider (shutdown)
^C2017-03-24 16:41:41 [scrapy.crawler] INFO: Received SIGINT twice, forcing unclean shutdown
✘✘∙𝒗 Spider

TylerJackk commented 7 years ago

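Thanks for the suggestion, I'll update the README as soon as I can. Judging from your log, you exited too early:

2017-03-24 16:41:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.lagou.com/zhaopin/Java/2/> (referer: None)

That line only means the spider walked through the job links on that URL; the individual job pages had not been crawled yet. Wait a little longer and the data will show up.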
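To make the timing concrete, here is a small standard-library simulation of that two-stage flow. It is not the project's spider and does not use Scrapy: the callback names, the three links per list page, and the first-in-first-out queue are all assumptions for illustration, and the real request ordering depends on Scrapy's scheduler. The only value taken from the log is DOWNLOAD_DELAY = 3.

```python
from collections import deque

DOWNLOAD_DELAY = 3  # seconds, as in the overridden settings in the log


def parse_list(url):
    """List-page callback: yields only follow-up requests, never items."""
    for n in range(3):  # hypothetical: three job links per list page
        yield ("request", f"{url}job-{n}", parse_job)


def parse_job(url):
    """Detail-page callback: the only place an item is produced."""
    yield ("item", {"url": url})


def crawl(start_urls, time_budget):
    """Handle one request per DOWNLOAD_DELAY until the time budget runs out."""
    queue = deque((url, parse_list) for url in start_urls)
    items, elapsed = [], 0
    while queue and elapsed + DOWNLOAD_DELAY <= time_budget:
        url, callback = queue.popleft()
        elapsed += DOWNLOAD_DELAY
        for kind, payload, *rest in callback(url):
            if kind == "request":
                queue.append((payload, rest[0]))  # schedule the detail page
            else:
                items.append(payload)  # this is what would reach the pipeline
    return items
```

With two list pages as start URLs, `crawl(urls, 6)` has time for the list pages only and returns no items, which mirrors stopping right after the second list page was crawled; a larger budget such as `crawl(urls, 30)` also reaches the detail pages and returns items.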

ant9469 commented 6 years ago

Hi, when I run it I get 'Spider not found: lagou'. Why can't it find the spider?