Norconex / committer-core

Norconex Committer is a Java library and command-line application used to route content to local or remote target repositories, such as a search engine index.
http://www.norconex.com/collectors/committer-core
Apache License 2.0

com.norconex.committer.core.CommitterException: Could not obtain content stream for ... #14

Closed by ronjakoi 6 years ago

ronjakoi commented 6 years ago

What is this error? I see it intermittently in my logs and can't really see any rhyme or reason to it.

intranet-sv: 2018-11-25 13:14:05 ERROR - intranet-sv: Could not process document: https://intranet.helsinki.fi/[REDACTED] (Could not obtain content stream for /opt/norconex/intranet/weekly/work/solr-queue/2017/11-25/01/03/03/1511607783363000000-add.cntnt)
com.norconex.committer.core.CommitterException: Could not obtain content stream for /opt/norconex/intranet/weekly/work/solr-queue/2017/11-25/01/03/03/1511607783363000000-add.cntnt
        at com.norconex.committer.core.FileAddOperation.getContentStream(FileAddOperation.java:129)
        at com.norconex.committer.core.AbstractMappedCommitter.prepareCommitAddition(AbstractMappedCommitter.java:291)
        at com.norconex.committer.core.AbstractFileQueueCommitter.commit(AbstractFileQueueCommitter.java:221)
        at com.norconex.committer.core.AbstractCommitter.commitIfReady(AbstractCommitter.java:146)
        at com.norconex.committer.core.AbstractCommitter.add(AbstractCommitter.java:97)
        at com.norconex.collector.core.pipeline.committer.CommitModuleStage.execute(CommitModuleStage.java:34)
        at com.norconex.collector.core.pipeline.committer.CommitModuleStage.execute(CommitModuleStage.java:27)
        at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91)
        at com.norconex.collector.http.crawler.HttpCrawler.executeCommitterPipeline(HttpCrawler.java:377)
        at com.norconex.collector.core.crawler.AbstractCrawler.processImportResponse(AbstractCrawler.java:567)
        at com.norconex.collector.core.crawler.AbstractCrawler.processNextQueuedCrawlData(AbstractCrawler.java:524)
        at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:407)
        at com.norconex.collector.core.crawler.AbstractCrawler$ProcessReferencesRunnable.run(AbstractCrawler.java:789)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: /opt/norconex/intranet/weekly/work/solr-queue/2017/11-25/01/03/03/1511607783363000000-add.cntnt (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at com.norconex.committer.core.FileAddOperation.getContentStream(FileAddOperation.java:127)
        ... 15 more

I am still on HTTP Collector 2.7.1 (Committer Core 2.1.1, Committer Solr 2.3.0).

essiembre commented 6 years ago

What does your config look like? By any chance, do you have multiple Committers defined that point to the same solr-queue location? That's one cause I can think of: some files may already have been sent, and so removed from the queue, by another Committer.
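To illustrate the suspected failure mode, here is a minimal, self-contained Java sketch. It does not use the Norconex API: the class `SharedQueueDemo` and its methods are hypothetical, and it assumes a committer lists the queued files, sends each one, and then deletes it. Two such "committers" working from the same directory are simulated sequentially, so the second one finds that every file it listed is already gone.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo, not Norconex code: two committers sharing one queue dir.
class SharedQueueDemo {

    // Take a snapshot of the files currently sitting in the queue directory.
    static List<Path> listQueued(Path queueDir) throws IOException {
        try (Stream<Path> files = Files.list(queueDir)) {
            return files.sorted().collect(Collectors.toList());
        }
    }

    // "Send" each file by opening it, then delete it from the queue.
    // Returns how many files from the snapshot had already disappeared.
    static int commit(List<Path> batch) throws IOException {
        int missing = 0;
        for (Path file : batch) {
            try (InputStream in = new FileInputStream(file.toFile())) {
                in.read(); // stand-in for sending the content to the target
            } catch (FileNotFoundException e) {
                missing++; // same root cause as in the stack trace above
                continue;
            }
            Files.delete(file);
        }
        return missing;
    }

    // Both committers snapshot the same directory before either one commits.
    static int[] simulate(int fileCount) throws IOException {
        Path queueDir = Files.createTempDirectory("solr-queue");
        for (int i = 0; i < fileCount; i++) {
            Files.write(queueDir.resolve(i + "-add.cntnt"), new byte[] {1});
        }
        List<Path> batchA = listQueued(queueDir);
        List<Path> batchB = listQueued(queueDir);
        int missingA = commit(batchA); // sends and deletes every file
        int missingB = commit(batchB); // finds nothing left to open
        Files.delete(queueDir);
        return new int[] {missingA, missingB};
    }

    public static void main(String[] args) throws IOException {
        int[] missing = simulate(3);
        System.out.println("A missing: " + missing[0] + ", B missing: " + missing[1]);
    }
}
```

In a real crawl the committers run concurrently, so which files go missing depends on timing, which would be consistent with the error showing up only intermittently.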

ronjakoi commented 6 years ago

Ah, it looks like I do. I have one queue dir per Collector, but each Collector has several Crawlers, and all of those use the same queue dir. I will give each Crawler its own queue and see if that fixes the error.

Good to hear that the error isn't anything that's actually broken, per se :)

essiembre commented 6 years ago

In fact, if all your Committers are submitting documents to the same collection, the error should not impact what ends up in Solr and you should get everything. It is still a good idea to get rid of it though! :-)

ronjakoi commented 6 years ago

This is indeed the case. I gave them each their own solr-queue and that seems to have fixed it. Closing.
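For anyone checking a larger setup for the same problem, a rough sketch of a sanity check follows. It is not part of Norconex: the class `QueueDirCheck`, the crawler ids, and the idea of collecting queue directories into a map by hand are all assumptions for illustration. It normalizes each configured path and reports any queue directory that more than one crawler points to.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not Norconex code: find queue dirs used more than once.
class QueueDirCheck {

    // Input: crawler id -> configured queue directory (as written in the config).
    // Output: normalized directory -> ids of the crawlers sharing it (2 or more).
    static Map<Path, List<String>> sharedQueueDirs(Map<String, String> queueDirByCrawler) {
        Map<Path, List<String>> byDir = new TreeMap<>();
        for (Map.Entry<String, String> entry : queueDirByCrawler.entrySet()) {
            // Normalize so that "a/../a/queue" and "a/queue" compare equal.
            Path dir = Paths.get(entry.getValue()).toAbsolutePath().normalize();
            byDir.computeIfAbsent(dir, k -> new ArrayList<>()).add(entry.getKey());
        }
        // Keep only directories claimed by more than one crawler.
        byDir.values().removeIf(ids -> ids.size() < 2);
        return byDir;
    }

    public static void main(String[] args) {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("crawler-a", "/opt/norconex/intranet/weekly/work/solr-queue");
        config.put("crawler-b", "/opt/norconex/intranet/weekly/work/../work/solr-queue");
        config.put("crawler-c", "/opt/norconex/intranet/weekly/work/solr-queue-c");
        sharedQueueDirs(config).forEach((dir, ids) ->
                System.out.println(dir + " is shared by " + ids));
    }
}
```

An empty result means every crawler has its own queue directory, which is the setup that made the error go away here.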