Norconex / crawlers

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
https://opensource.norconex.com/crawlers
Apache License 2.0

Error when trying to crawl same job again #264

Closed doaa-khaled closed 8 years ago

doaa-khaled commented 8 years ago

I tried to crawl with the same configuration a second time, and when I do that I get this exception:

```
Execution failed for job: Crawler HTTP Collector
com.norconex.jef4.job.JobException: 2 out of 2 jobs failed in async group "Crawler HTTP Collector"
    at com.norconex.jef4.job.group.AsyncJobGroup.executeGroup(AsyncJobGroup.java:105)
    at com.norconex.jef4.job.group.AbstractJobGroup.execute(AbstractJobGroup.java:80)
    at com.norconex.jef4.suite.JobSuite.runJob(JobSuite.java:350)
    at com.norconex.jef4.suite.JobSuite.doExecute(JobSuite.java:300)
    at com.norconex.jef4.suite.JobSuite.execute(JobSuite.java:172)
    at com.norconex.collector.core.AbstractCollector.start(AbstractCollector.java:120)
```

essiembre commented 8 years ago

Do you have more information in the logs? The above does not say much, unfortunately. Are you running it on its own, or are you re-running it in the same JVM instance? You have to make sure you are not reusing the same instances. The best approach, as suggested before, is to launch your crawl code as an external process. If it is already a standalone process, feel free to attach your config to help with the troubleshooting.
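For reference, here is a minimal sketch of launching the collector as its own process from Java, assuming the standard `collector-http.bat` launch script shipped with the collector and its `-a`/`-c` options; the install directory and config path below are illustrative, not taken from this issue:

```java
import java.io.File;
import java.io.IOException;

// Sketch: run the crawl in a separate JVM so that any file locks it takes
// (such as the MVStore crawl store lock) are released when that process exits.
public class ExternalCrawlLauncher {

    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "cmd", "/c", "collector-http.bat",                    // assumed Windows launch script
                "-a", "start",                                        // action: start a new crawl
                "-c", "E:\\norconex-collector-http\\myconfig.xml");   // hypothetical config path
        pb.directory(new File("E:\\norconex-collector-http"));        // illustrative install directory
        pb.inheritIO();                                               // stream the collector's console output

        Process process = pb.start();
        int exitCode = process.waitFor(); // each run gets its own JVM and its own file locks
        System.out.println("Collector exited with code " + exitCode);
    }
}
```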

doaa-khaled commented 8 years ago

This is the error in the log file:

```
java.lang.IllegalStateException: The file is locked: nio:E:\norconex-collector-http\crawler-output/crawlstore/mvstore/WikiPedia//mvstore [1.4.191/7]
    at org.h2.mvstore.DataUtils.newIllegalStateException(DataUtils.java:773)
    at org.h2.mvstore.FileStore.open(FileStore.java:167)
    at org.h2.mvstore.MVStore.<init>(MVStore.java:342)
    at org.h2.mvstore.MVStore.open(MVStore.java:390)
    at com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore.<init>(MVStoreCrawlDataStore.java:57)
    at com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStoreFactory.createCrawlDataStore(MVStoreCrawlDataStoreFactory.java:48)
    at com.norconex.collector.core.crawler.AbstractCrawler.doExecute(AbstractCrawler.java:188)
    at com.norconex.collector.core.crawler.AbstractCrawler.startExecution(AbstractCrawler.java:174)
    at com.norconex.jef4.job.AbstractResumableJob.execute(AbstractResumableJob.java:49)
    at com.norconex.jef4.suite.JobSuite.runJob(JobSuite.java:350)
    at com.norconex.jef4.job.group.AsyncJobGroup.runJob(AsyncJobGroup.java:119)
    at com.norconex.jef4.job.group.AsyncJobGroup.access$000(AsyncJobGroup.java:44)
    at com.norconex.jef4.job.group.AsyncJobGroup$1.run(AsyncJobGroup.java:86)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.nio.channels.OverlappingFileLockException
    at sun.nio.ch.SharedFileLockTable.checkList(Unknown Source)
    at sun.nio.ch.SharedFileLockTable.add(Unknown Source)
    at sun.nio.ch.FileChannelImpl.tryLock(Unknown Source)
    at org.h2.store.fs.FileNio.tryLock(FilePathNio.java:121)
    at org.h2.mvstore.cache.FilePathCache$FileCache.tryLock(FilePathCache.java:158)
    at java.nio.channels.FileChannel.tryLock(Unknown Source)
    at org.h2.mvstore.FileStore.open(FileStore.java:164)
    ... 14 more

Alliance Memory: 2016-06-22 14:54:38 ERROR - Execution failed for job: Alliance Memory
java.lang.IllegalStateException: The file is locked: nio:E:\norconex-collector-http\crawler-output/crawlstore/mvstore/Alliance_32_Memory//mvstore [1.4.191/7]
    at org.h2.mvstore.DataUtils.newIllegalStateException(DataUtils.java:773)
    at org.h2.mvstore.FileStore.open(FileStore.java:167)
    at org.h2.mvstore.MVStore.<init>(MVStore.java:342)
    at org.h2.mvstore.MVStore.open(MVStore.java:390)
    at com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore.<init>(MVStoreCrawlDataStore.java:57)
    at com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStoreFactory.createCrawlDataStore(MVStoreCrawlDataStoreFactory.java:48)
    at com.norconex.collector.core.crawler.AbstractCrawler.doExecute(AbstractCrawler.java:188)
    at com.norconex.collector.core.crawler.AbstractCrawler.startExecution(AbstractCrawler.java:174)
    at com.norconex.jef4.job.AbstractResumableJob.execute(AbstractResumableJob.java:49)
    at com.norconex.jef4.suite.JobSuite.runJob(JobSuite.java:350)
    at com.norconex.jef4.job.group.AsyncJobGroup.runJob(AsyncJobGroup.java:119)
    at com.norconex.jef4.job.group.AsyncJobGroup.access$000(AsyncJobGroup.java:44)
    at com.norconex.jef4.job.group.AsyncJobGroup$1.run(AsyncJobGroup.java:86)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.nio.channels.OverlappingFileLockException
    at sun.nio.ch.SharedFileLockTable.checkList(Unknown Source)
    at sun.nio.ch.SharedFileLockTable.add(Unknown Source)
    at sun.nio.ch.FileChannelImpl.tryLock(Unknown Source)
    at org.h2.store.fs.FileNio.tryLock(FilePathNio.java:121)
    at org.h2.mvstore.cache.FilePathCache$FileCache.tryLock(FilePathCache.java:158)
    at java.nio.channels.FileChannel.tryLock(Unknown Source)
    at org.h2.mvstore.FileStore.open(FileStore.java:164)
    ... 14 more
```
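The `OverlappingFileLockException` at the bottom is the key detail: H2's MVStore holds an exclusive lock on its store file for as long as the owning instance is open, so a second open of the same file from the same JVM fails immediately. A small standalone sketch (not from this issue; the path is illustrative) reproduces the same error:

```java
import org.h2.mvstore.MVStore;

// Sketch: opening the same MVStore file twice from one JVM triggers the
// "The file is locked" error seen in the crawl log above.
public class MvStoreLockDemo {
    public static void main(String[] args) {
        String file = "E:/norconex-collector-http/crawler-output/crawlstore/mvstore/WikiPedia/mvstore"; // illustrative path

        MVStore first = MVStore.open(file); // takes the exclusive file lock
        try {
            MVStore second = MVStore.open(file); // fails while the first instance is still open
            second.close();
        } catch (IllegalStateException e) {
            System.out.println("Same failure as in the crawl log: " + e.getMessage());
        } finally {
            first.close(); // a clean shutdown (or the process exiting) releases the lock for the next run
        }
    }
}
```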

And this is my configuration:

```
<?xml version="1.0" encoding="UTF-8"?>
E:\norconex-collector-http\crawler-output\progress
E:\norconex-collector-http\crawler-output\logs
http://www.alliancememory.com/
E:\norconex-collector-http\crawler-output
15 2 -1 false
E:\norconex-collector-http\crawler-output
jpg,gif,png,ico,css,js
200 404
title,keywords,description,document.reference
E:\norconex-collector-http\crawler-output\crawledFiles
https://en.wikipedia.org/
E:\norconex-collector-http\crawler-output
15 2 -1 false
E:\norconex-collector-http\crawler-output
jpg,gif,png,ico,css,js
200 404
title,keywords,description,document.reference
E:\norconex-collector-http\crawler-output\crawledFiles
```
essiembre commented 8 years ago

The file lock seems to indicate the problem is what I was asking you about: there is still a previous instance of the collector/crawler in the JVM with a hold on that file. Have you tried launching it as an external process instead? If you have to launch it through your web app, you may have to find ways around the lock issue. This thread may help you: #249.
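If it has to stay embedded in a web app, one possible workaround is to make sure every crawl request builds a brand-new collector from the configuration and that no earlier run is still alive in the JVM. The sketch below assumes the Collector Core 2.x Java API (`CollectorConfigLoader`, `HttpCollectorConfig`, `HttpCollector.start(boolean)`); treat the loader call and its arguments as assumptions rather than confirmed usage:

```java
import java.io.File;

import com.norconex.collector.core.CollectorConfigLoader;
import com.norconex.collector.http.HttpCollector;
import com.norconex.collector.http.HttpCollectorConfig;

// Sketch: build a fresh collector per run instead of caching one across requests.
// Reusing an instance whose previous run still holds the MVStore file is what
// produces the "The file is locked" error shown earlier.
public class EmbeddedCrawlRunner {

    public void runOnce(File configFile) throws Exception {
        // Load the XML configuration into a new config object each time
        // (loader signature assumed from the 2.x Collector Core API).
        HttpCollectorConfig config = (HttpCollectorConfig)
                new CollectorConfigLoader(HttpCollectorConfig.class)
                        .loadCollectorConfig(configFile, null);

        HttpCollector collector = new HttpCollector(config);

        // start(false) runs a fresh (non-resumed) crawl and blocks until it
        // completes, at which point the crawl store and its lock are released.
        collector.start(false);
    }
}
```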

doaa-khaled commented 8 years ago

Yes, it works as an external process now. Thanks a lot!