awolfly9 / IPProxyTool

Python IP proxy tool built on Scrapy. Crawls a large number of free proxy IPs and extracts the valid ones for use.
MIT License
1.98k stars 411 forks

Ran runSpider.py and it stops doing anything after a while.. #1

Closed gccdChen closed 7 years ago

gccdChen commented 7 years ago

2017-02-14 11:29:18 [10], msg:sql helper execute command:CREATE TABLE IF NOT EXISTS free_ipproxy (id INT(8) NOT NULL AUTO_INCREMENT,ip CHAR(25) NOT NULL UNIQUE,port INT(4) NOT NULL,country TEXT DEFAULT NULL,anonymity INT(2) DEFAULT NULL,https CHAR(4) DEFAULT NULL ,speed FLOAT DEFAULT NULL,source CHAR(20) DEFAULT NULL,save_time TIMESTAMP NOT NULL,PRIMARY KEY(id)) ENGINE=InnoDB
2017-02-14 11:29:19 [10], msg:***run spider waiting...**


awolfly9 commented 7 years ago

Hello, you can first check which spiders runspider.py is set to run:

items = scrapydo.run_spider(XiCiDaiLiSpider)
items = scrapydo.run_spider(SixSixIpSpider)
items = scrapydo.run_spider(IpOneEightOneSpider)
items = scrapydo.run_spider(KuaiDaiLiSpider)
items = scrapydo.run_spider(GatherproxySpider)

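If those calls are there, check the log file log/proxy.log to see the output. The reason it ends up sitting at **run spider waiting...* is that it is waiting for the next crawl: it has called time.sleep().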
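The behaviour described above (run each spider, print the waiting message, then sleep until the next round) can be sketched roughly as follows. This is a minimal illustration, not the project's actual runspider.py: `crawl_loop`, its parameters, and the per-spider log line are hypothetical names, the 300-second interval is assumed from the "every 5 minutes" remark later in this thread, and in the real script `run_spider` would be `scrapydo.run_spider`.

```python
import time


def crawl_loop(spiders, run_spider, wait_seconds=300, sleep=time.sleep,
               max_rounds=None, log=print):
    """Run every spider, then sleep until the next round.

    With max_rounds=None the loop never returns, which is why the
    process looks idle after printing the waiting message.
    All names here are hypothetical, for illustration only.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for spider in spiders:
            # In the project this call would be scrapydo.run_spider(spider).
            items = run_spider(spider)
            log("%s returned %d items" % (spider, len(items)))
        log("run spider waiting...")
        # The process is idle here, not hung; assumed 5-minute interval.
        sleep(wait_seconds)
        rounds += 1
    return rounds
```

Passing `max_rounds=1` and a short `wait_seconds` is a quick way to confirm that the spiders themselves finish and that the pause is only the sleep.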

If you run into any problems, feel free to reply.

Best wishes


gccdChen commented 7 years ago

Oh.. so it refreshes every 5 minutes.. But the 5 sites yield very few IPs, only 155, and only 2 of them passed validation against douban..

gccdChen commented 7 years ago

Thanks~

awolfly9 commented 7 years ago

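Only a few sites are crawled at the moment; more will be added later. The number of validated IPs will grow over time, and IPs that work are continually retained.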
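For readers wondering what "passing validation" against a site such as douban might involve, here is a rough sketch using only the Python standard library. It is an assumption about the general technique (fetch a test URL through the proxy and time the response), not the tool's actual validator: `check_proxy`, its parameters, and the injectable `open_url` hook are hypothetical, and the test URL is left to the caller.

```python
import time
import urllib.request


def check_proxy(ip, port, test_url, timeout=5.0, open_url=None):
    """Return the response time in seconds if test_url was fetched
    through the proxy with HTTP 200, otherwise None.

    Hypothetical helper for illustration; not part of IPProxyTool.
    """
    proxy = "http://%s:%s" % (ip, port)
    if open_url is None:
        # Default: route the request through the proxy with urllib.
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        opener = urllib.request.build_opener(handler)

        def open_url(url):
            return opener.open(url, timeout=timeout)

    start = time.monotonic()
    try:
        response = open_url(test_url)
        ok = response.status == 200
    except Exception:
        # Timeouts, refused connections and bad responses all count as failure.
        return None
    return time.monotonic() - start if ok else None
```

Free proxies fail this kind of check very often, so a low pass rate such as 2 out of 155 is not surprising; re-checking on every crawl round and keeping the ones that pass is how the pool of working IPs grows over time.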