Running multiple crawlers at the same time

CrawlScript / WebCollector

WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.

https://github.com/CrawlScript/WebCollector

GNU General Public License v3.0


Running multiple crawlers at the same time #74

Closed: ljc930611 closed this issue 6 years ago

ljc930611 commented 7 years ago

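I currently have two crawlers. One of them calls Thread.sleep() because of the target site's restrictions, so it never finishes running, and the other crawler never runs... How can I get the other crawler to start?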

ljc930611 commented 7 years ago

Why isn't crawler.start(2); executed asynchronously? Does this crawler really have to finish crawling before another crawler can start?

hujunxianligong commented 7 years ago

Can't you just write two threads yourself and run the two crawlers in them separately....

On Thu, Nov 2, 2017 at 11:53 AM, ljc930611 notifications@github.com wrote:

Why isn't crawler.start(2); executed asynchronously? Does this crawler really have to finish crawling before another crawler can start?
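
The suggestion above, running each crawler on its own thread, might look roughly like the sketch below. It does not use WebCollector itself: `BlockingCrawler` is a hypothetical stand-in for any crawler whose `start(depth)` call blocks until the crawl ends (which is how `crawler.start(2)` behaves according to this thread), and `runConcurrently` is an illustrative helper name, not part of the library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TwoCrawlers {

    /** Hypothetical stand-in for a crawler whose start(depth) blocks until it is done. */
    interface BlockingCrawler {
        void start(int depth) throws Exception;
    }

    /**
     * Runs every crawler on its own thread, so a slow one cannot hold up the others.
     * Expects at least one crawler. Returns true if all finished within the timeout.
     */
    static boolean runConcurrently(int depth, long timeoutMillis, BlockingCrawler... crawlers)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(crawlers.length);
        for (BlockingCrawler crawler : crawlers) {
            pool.submit(() -> {
                try {
                    crawler.start(depth); // blocks only this worker thread
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown(); // accept no new tasks; already submitted ones keep running
        return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        List<String> finished = Collections.synchronizedList(new ArrayList<>());
        // Simulates the crawler that sleeps because of the site's restrictions.
        BlockingCrawler throttled = depth -> { Thread.sleep(500); finished.add("throttled"); };
        BlockingCrawler quick = depth -> finished.add("quick");

        boolean allDone = runConcurrently(2, 5_000, throttled, quick);
        System.out.println("all done: " + allDone + ", finish order: " + finished);
    }
}
```

With a real crawler, the lambda bodies would be replaced by the blocking `crawler.start(2)` call on each crawler instance.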

ljc930611 commented 7 years ago

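My two pieces of code are not in the same thread. One timer crawls news once every minute; another timer syncs data from a different website at a few fixed times each day. The sync takes a long time, though, so no news gets crawled while it is running. My current workaround is to write my own thread pool and network requests for the sync part and not use WebCollector for it. I don't know how to solve this kind of problem with WebCollector.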
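
One way to keep the two schedules from blocking each other is to give each job its own scheduler thread. The sketch below uses only the JDK's `ScheduledExecutorService`; the class and method names (`IndependentSchedules`, `startBoth`, `guard`) are illustrative, and the daily fixed-time sync is simplified to a fixed period, which is an assumption and not the setup described above. In practice each `Runnable` would wrap the blocking `crawler.start(2)` call of the corresponding crawler.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class IndependentSchedules {

    /**
     * Starts each job on its own single-threaded scheduler, so a long run
     * of the data sync never delays the next news crawl.
     */
    static ScheduledExecutorService[] startBoth(Runnable newsCrawl, long newsPeriodMs,
                                                Runnable dataSync, long syncPeriodMs) {
        ScheduledExecutorService news = Executors.newSingleThreadScheduledExecutor();
        ScheduledExecutorService sync = Executors.newSingleThreadScheduledExecutor();
        news.scheduleAtFixedRate(guard(newsCrawl), 0, newsPeriodMs, TimeUnit.MILLISECONDS);
        sync.scheduleAtFixedRate(guard(dataSync), 0, syncPeriodMs, TimeUnit.MILLISECONDS);
        return new ScheduledExecutorService[] { news, sync };
    }

    /** Logs runtime exceptions; an uncaught one would cancel all later runs of the job. */
    static Runnable guard(Runnable job) {
        return () -> {
            try {
                job.run();
            } catch (RuntimeException e) {
                e.printStackTrace();
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger newsRuns = new AtomicInteger();
        Runnable newsCrawl = newsRuns::incrementAndGet; // placeholder for the news crawler
        Runnable dataSync = () -> {                     // placeholder for the slow sync
            try {
                Thread.sleep(2_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        ScheduledExecutorService[] schedulers = startBoth(newsCrawl, 200, dataSync, 60_000);
        Thread.sleep(1_000); // the sync is still sleeping during this whole window
        for (ScheduledExecutorService s : schedulers) {
            s.shutdownNow();
        }
        System.out.println("news runs while the sync was busy: " + newsRuns.get());
    }
}
```

A single scheduler with two threads would also work; separate schedulers just make the isolation between the two jobs explicit.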

yxssfxwzy commented 6 years ago

Could you write two programs? Run the two programs at the same time.

zhoubinghong159261 commented 6 years ago

Have you ever crawled 国际在线 (CRI Online) news? Once you have managed to crawl CRI Online news, you will understand!!!