CrawlScript / WebCollector

WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.
https://github.com/CrawlScript/WebCollector
GNU General Public License v3.0
3.07k stars 1.45k forks

Depth crawling: BerkeleyDB storage error, memory not released after crawl completes #43

Closed 123yxp123 closed 6 years ago

123yxp123 commented 8 years ago

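Using the example \WebCollector-master\WebCollector\src\main\java\cn\edu\hfut\dmic\webcollector\example\DemoDepthCrawler.java, I get a storage error, and memory is not released after the crawl completes, which leads to an out-of-memory condition and high system CPU and memory usage. Details are here: cloud drive: http://pan.baidu.com/s/1nvpk9Vb password: mnqz. Please help.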

Rayn-liuwei commented 6 years ago

I ran into this problem too. I am using cn.edu.hfut.dmic.webcollector.plugin.berkeley.BerkeleyDBManager. Did the original poster ever solve it?

hujunxianligong commented 6 years ago

The project has now moved to a RocksDB core.