CrawlScript / WebCollector

WebCollector is an open source web crawler framework based on Java. It provides simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.
https://github.com/CrawlScript/WebCollector
GNU General Public License v3.0
3.07k stars 1.45k forks

RocksDBException when running the CSDN crawling example code: Failed to create a directory: C:\code\weibocrawler\crawl\crawldb: The system cannot find the specified... #117

Open jack13163 opened 4 years ago

jack13163 commented 4 years ago

Exception in thread "main" org.rocksdb.RocksDBException: Failed to create a directory: C:\code\weibocrawler\crawl\crawldb: The system cannot find the specified...
    at org.rocksdb.RocksDB.open(Native Method)
    at org.rocksdb.RocksDB.open(RocksDB.java:231)
    at cn.edu.hfut.dmic.webcollector.plugin.rocks.RocksDBUtils.open(RocksDBUtils.java:94)
    at cn.edu.hfut.dmic.webcollector.plugin.rocks.RocksDBUtils.openCrawldbDatabase(RocksDBUtils.java:60)
    at cn.edu.hfut.dmic.webcollector.plugin.rocks.RocksDBManager.inject(RocksDBManager.java:87)
    at cn.edu.hfut.dmic.webcollector.crawldb.DBManager.inject(DBManager.java:66)
    at cn.edu.hfut.dmic.webcollector.crawler.Crawler.inject(Crawler.java:73)
    at cn.edu.hfut.dmic.webcollector.crawler.Crawler.start(Crawler.java:114)
    at cn.edu.hfut.dmic.webcollector.crawler.AutoParseCrawler.start(AutoParseCrawler.java:62)
    at cn.edu.hfut.dmic.webcollector.example.TutorialCrawler.main(TutorialCrawler.java:90)

The dependency has already been added:


```xml
<dependency>
    <groupId>org.rocksdb</groupId>
    <artifactId>rocksdbjni</artifactId>
    <version>5.17.2</version>
</dependency>
```

hujunxianligong commented 4 years ago

Use BreadthCrawler for now.

jack04072 notifications@github.com wrote on Friday, November 29, 2019 at 9:57 AM:


jack13163 commented 4 years ago

Thanks for your reply. I found the problem: it was caused by the package I imported.

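I was previously using the RocksDB plugin, which produced the error above. The import was:

import cn.edu.hfut.dmic.webcollector.plugin.rocks.BreadthCrawler;

After checking an earlier working example, I found that the Berkeley plugin works without problems:

import cn.edu.hfut.dmic.webcollector.plugin.berkeley.BreadthCrawler;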
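
For readers who want to stay on the RocksDB plugin: the error says RocksDB could not create `crawl\crawldb`. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that the parent `crawl` directory did not exist yet and only the last path component gets created. A minimal sketch under that assumption creates the whole directory tree with the JDK before the crawler starts. `CrawlDirs` and `ensureCrawlDir` are hypothetical names for illustration, not part of WebCollector.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CrawlDirs {

    // Hypothetical helper (not a WebCollector API): creates crawlPath and
    // any missing parent directories. It does nothing if the directory
    // already exists.
    public static Path ensureCrawlDir(String crawlPath) throws IOException {
        Path dir = Paths.get(crawlPath).toAbsolutePath().normalize();
        return Files.createDirectories(dir);
    }

    public static void main(String[] args) throws IOException {
        // "crawl" is the relative crawl path that appears in the error message.
        Path dir = ensureCrawlDir("crawl");
        System.out.println("Crawl directory ready: " + dir);
        // Construct and start the crawler with the same crawl path after this point.
    }
}
```

If the exception persists after the directory exists, the assumption above does not hold, and switching to the Berkeley import as described is the fix reported in this thread.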

Edward1428 commented 3 years ago

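I'm getting the same error. How do I use BreadthCrawler with DemoSeleniumCrawler? Any help would be appreciated; I need to fetch the data after JavaScript rendering.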