Closed gccdChen closed 7 years ago
Hello, you can first check which spiders runspider.py is set to run:

```python
items = scrapydo.run_spider(XiCiDaiLiSpider)
items = scrapydo.run_spider(SixSixIpSpider)
items = scrapydo.run_spider(IpOneEightOneSpider)
items = scrapydo.run_spider(KuaiDaiLiSpider)
items = scrapydo.run_spider(GatherproxySpider)
```
If those are present, check the output in log/proxy.log. The reason the final `run spider waiting...` message appears to hang is that the program is waiting for the next crawl round; it is calling time.sleep().
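For reference, the crawl-then-wait cycle behind that log message can be sketched like this (run_once, the interval, and max_rounds are illustrative assumptions, not the project's exact code):

```python
import time

def run_once():
    # Hypothetical stand-in for the real crawl, i.e. the
    # scrapydo.run_spider(...) calls in runspider.py.
    return []

def crawl_loop(interval_seconds=300, max_rounds=None):
    """Run the crawl, then sleep until the next round.

    max_rounds=None loops forever, which is why the process looks
    "stuck" after printing the waiting message.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        run_once()
        rounds += 1
        print("run spider waiting...")  # the message that seems to hang
        if max_rounds is not None and rounds >= max_rounds:
            break
        time.sleep(interval_seconds)
    return rounds
```

With the default 300-second interval this matches the "updates every 5 minutes" behavior described in the thread.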
Feel free to reply if you have any questions.
Best regards
2017-02-14 11:31 GMT+08:00 chen notifications@github.com:
2017-02-14 11:29:18 [10], msg:sql helper execute command:CREATE TABLE IF NOT EXISTS free_ipproxy (id INT(8) NOT NULL AUTO_INCREMENT, ip CHAR(25) NOT NULL UNIQUE, port INT(4) NOT NULL, country TEXT DEFAULT NULL, anonymity INT(2) DEFAULT NULL, https CHAR(4) DEFAULT NULL, speed FLOAT DEFAULT NULL, source CHAR(20) DEFAULT NULL, save_time TIMESTAMP NOT NULL, PRIMARY KEY(id)) ENGINE=InnoDB
2017-02-14 11:29:19 [10], msg: run spider waiting...
Oh, so it updates every 5 minutes. But 5 sites yield very few IPs, only 155, and only 2 of them passed the douban validation.
Thanks!
Currently only a few sites are crawled; more will be added later. The number of validated IPs will grow over time, and useful IPs are retained continuously.
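As a rough illustration of how a proxy can be checked against a target site like douban (the function name, test URL, and timeout are assumptions; the project's actual validator may differ):

```python
import urllib.request
import urllib.error

def validate_proxy(proxy, test_url, timeout=5):
    """Return True if test_url is reachable through the given HTTP proxy.

    `proxy` is a host:port string, e.g. "1.2.3.4:8080" (hypothetical).
    Any network error, refused connection, or timeout counts as invalid.
    """
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Proxies that repeatedly pass such a check would be kept; ones that fail would be dropped, which is why the validated count grows slowly at first.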