LiuXingMing / SinaSpider

Sina Weibo crawler (Scrapy, Redis)

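MongoDB only created the Fans and Follows collections, then the crawl keeps showing 302 and scrapes no data, even though login and cookie retrieval both report success. Can anyone help? Many thanks! #65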

Open pythonmanGo opened 6 years ago

pythonmanGo commented 6 years ago

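MongoDB only created the Fans and Follows collections. After that the crawl keeps showing 302 and no data is scraped, even though the login reports success and the cookie is obtained. Can anyone help? Many thanks!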

Login output:
2017-11-07 10:45:58 [Sina_spider1.cookies] WARNING: Get Cookie Success!( Account:[redacted] )
2017-11-07 10:45:58 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): login.sina.com.cn
2017-11-07 10:45:58 [urllib3.connectionpool] DEBUG: https://login.sina.com.cn:443 "POST /sso/login.php?client=ssologin.js(v1.4.18) HTTP/1.1" 200 None
2017-11-07 10:45:58 [Sina_spider1.cookies] WARNING: Get Cookie Success!( Account:[redacted] )

Output while crawling:
2017-11-07 10:46:00 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://weibo.cn/5235640836/follow> from <GET http://weibo.cn/5235640836/follow>
2017-11-07 10:46:40 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://weibo.cn/5235640836/fans> from <GET http://weibo.cn/5235640836/fans>
2017-11-07 10:46:59 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)

zhanghanbin3159 commented 6 years ago

You just need to change every http in the spider to https.
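A minimal sketch of that change, assuming the spider builds its request URLs as plain strings. The helper name `to_https` is hypothetical and not part of SinaSpider; editing the `http://` literals in the spider by hand would have the same effect.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url):
    """Return url with an http scheme upgraded to https; other URLs pass through unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


# URLs of the kind that appear in the 302 redirect log earlier in this thread.
urls = [
    "http://weibo.cn/5235640836/follow",
    "https://weibo.cn/5235640836/fans",
]
for u in urls:
    print(to_https(u))
```

Requesting the https URL directly means the first response no longer has to be the 302 redirect from http to https that the log shows.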

pythonmanGo commented 6 years ago

1. Changed every http in the spider to https.
2. Modified the getcookie function: browser = webdriver.PhantomJS(executable_path=r"D:\java\Python27\[path redacted]\phantomjs.exe")
3. Cookie retrieval works: Get Cookies Finish!( Num:1)
4. Environment: Windows 7 64-bit, 4 GB RAM.

But after the cookie is obtained, a system error dialog pops up: "python.exe has stopped working". The details are:

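Problem signature:
Problem Event Name: BEX
Application Name: python.exe
Application Version: 0.0.0.0
Application Timestamp: 4c303241
Fault Module Name: MSVCR90.dll
Fault Module Version: 9.0.30729.6161
Fault Module Timestamp: 4dace5b9
Exception Offset: 00066d03
Exception Code: c0000417
Exception Data: 00000000
OS Version: 6.1.7601.2.1.0.256.1
Locale ID: 2052
Additional Information 1: abf7
Additional Information 2: abf7f34af3b04ddccc0d33fe401c1c02
Additional Information 3: 79a5
Additional Information 4: 79a5afb460eb4649151b9562e857bf2f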

The program itself does not report any error. How should I deal with this? Any advice is appreciated.

zhanghanbin3159 commented 6 years ago

I am not using browser = webdriver.PhantomJS; I use the Firefox driver instead.
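A rough sketch of what swapping PhantomJS for Firefox could look like, assuming Selenium and geckodriver are installed. The function names `cookies_to_dict` and `get_cookies_with_firefox` are hypothetical, and the actual Weibo login steps from the project's getcookie function are deliberately left out; this is not the repository's code.

```python
def cookies_to_dict(cookie_list):
    """Flatten Selenium's list of cookie dicts into a {name: value} mapping."""
    return {c["name"]: c["value"] for c in cookie_list}


def get_cookies_with_firefox(url):
    """Open url in Firefox and return the browser's cookies as a dict."""
    # Imported lazily so cookies_to_dict can be used without Selenium installed.
    from selenium import webdriver

    browser = webdriver.Firefox()  # assumes geckodriver is on PATH
    try:
        browser.get(url)
        # The login steps (filling in account and password) would go here.
        return cookies_to_dict(browser.get_cookies())
    finally:
        browser.quit()


# Example with made-up cookie data in the shape returned by browser.get_cookies().
print(cookies_to_dict([{"name": "a", "value": "1"}, {"name": "b", "value": "2"}]))
```

Whether this avoids the MSVCR90.dll crash is not confirmed in the thread; it only mirrors the setup described in the reply above.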