CrawlScript / WebCollector

WebCollector is an open-source web crawler framework based on Java. It provides simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.
https://github.com/CrawlScript/WebCollector
GNU General Public License v3.0
3.07k stars, 1.45k forks

After crawling for a while, a RocksDBException is always thrown, and I don't know why. #103

Open tanwubo opened 5 years ago

tanwubo commented 5 years ago

org.rocksdb.RocksDBException: Failed to create a NewWriteableFile: E:\developer\code\idea code\crawler\csdnCrawler\link/000007.sst: 拒绝访问。 (Access is denied.)

at org.rocksdb.RocksDB.open(Native Method)
at org.rocksdb.RocksDB.open(RocksDB.java:231)
at cn.edu.hfut.dmic.webcollector.plugin.rocks.RocksDBUtils.open(RocksDBUtils.java:94)
at cn.edu.hfut.dmic.webcollector.plugin.rocks.RocksDBUtils.openLinkDatabase(RocksDBUtils.java:68)
at cn.edu.hfut.dmic.webcollector.plugin.rocks.RocksDBManager.merge(RocksDBManager.java:176)
at cn.edu.hfut.dmic.webcollector.fetcher.Fetcher.fetchAll(Fetcher.java:310)
at cn.edu.hfut.dmic.webcollector.crawler.Crawler.start(Crawler.java:136)
at cn.edu.hfut.dmic.webcollector.crawler.AutoParseCrawler.start(AutoParseCrawler.java:63)
at tanwubo.CrawlerApplicationTests.csdnCrawlerTest(CrawlerApplicationTests.java:32)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.springframework.test.context.junit4.statements.RunBeforeTestExecutionCallbacks.evaluate(RunBeforeTestExecutionCallbacks.java:74)
at org.springframework.test.context.junit4.statements.RunAfterTestExecutionCallbacks.evaluate(RunAfterTestExecutionCallbacks.java:84)
at org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:75)
at org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:86)
at org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:84)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:251)
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:97)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:190)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
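
The message above reports that RocksDB could not create a new .sst file in the crawl path's link directory because access was denied. The thread does not identify the cause, so the following is only a minimal sketch for ruling out one possibility: it checks whether the current process can create and delete a file in that directory before the crawl starts. The class name `CrawlDirCheck` and the method `canCreateFiles` are hypothetical helpers, not part of WebCollector or RocksDB, and the check cannot detect a file that another process locks later during the crawl.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of WebCollector or RocksDB.
public class CrawlDirCheck {

    /**
     * Returns true if the current process can create and then delete
     * a file inside dir. The directory is created first if it is missing.
     */
    static boolean canCreateFiles(Path dir) {
        try {
            Files.createDirectories(dir);
            Path probe = Files.createTempFile(dir, "probe", ".tmp");
            Files.delete(probe);
            return true;
        } catch (IOException | SecurityException e) {
            // Report the underlying reason, e.g. an access-denied error.
            System.err.println("Cannot write to " + dir + ": " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java CrawlDirCheck <crawl-path>");
            return;
        }
        // The stack trace shows the failing file under <crawl-path>/link.
        Path linkDir = Paths.get(args[0], "link");
        System.out.println(linkDir + " writable: " + canCreateFiles(linkDir));
    }
}
```

Run it with the crawl path passed to the crawler's constructor as the only argument. If it reports the directory as writable, the failure is more likely intermittent (for example, something else holding the file at the time), which this check does not cover.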

LinanYaooo commented 3 years ago

I ran into this too. How did you solve it?

hujunxianligong commented 3 years ago

You can switch to the Berkeley DB-based BreadthCrawler: https://github.com/CrawlScript/WebCollector/blob/master/src/main/java/cn/edu/hfut/dmic/webcollector/plugin/berkeley/BerkeleyCrawler.java
