Closed by jmrichardson 7 years ago.
I reproduced the error by first committing all of my documents with the filesystem committer, then running the Elasticsearch committer against the queue directory that the filesystem committer created. It fails with the same error as above, but I don't know how to trace the cause. I made sure I have the latest versions of all components. Here is the error log:
INFO [AbstractCollectorConfig] Configuration loaded: id=Text Files; logsDir=c:\Elastic\ingest\norconex\workdir\logs; progressDir=c:\Elastic\ingest\norconex\workdir\progress
INFO [JobSuite] JEF work directory is: c:\Elastic\ingest\norconex\workdir\progress
INFO [JobSuite] JEF log manager is : FileLogManager
INFO [JobSuite] JEF job status store is : FileJobStatusStore
INFO [AbstractCollector] Suite of 1 crawler jobs created.
INFO [JobSuite] Initialization...
INFO [JobSuite] Previous execution detected.
INFO [JobSuite] Backing up previous execution status and log files.
INFO [JobSuite] Starting execution.
INFO [AbstractCollector] Version: Norconex Filesystem Collector 2.7.2-SNAPSHOT (Norconex Inc.)
INFO [AbstractCollector] Version: Norconex Collector Core 1.9.0-SNAPSHOT (Norconex Inc.)
INFO [AbstractCollector] Version: Norconex Importer 2.8.0-SNAPSHOT (Norconex Inc.)
INFO [AbstractCollector] Version: Norconex JEF 4.1.0 (Norconex Inc.)
INFO [AbstractCollector] Version: Norconex Committer Core 2.1.2-SNAPSHOT (Norconex Inc.)
INFO [AbstractCollector] Version: Norconex Committer Elasticsearch 4.0.0 (Norconex Inc.)
INFO [JobSuite] Running WM Search Commit: BEGIN (Fri Sep 22 18:36:35 EDT 2017)
INFO [FilesystemCrawler] 0 start paths identified.
INFO [CrawlerEventManager] CRAWLER_STARTED
INFO [AbstractCrawler] WM Search Commit: Crawling references...
INFO [AbstractCrawler] WM Search Commit: Reprocessing any cached/orphan references...
INFO [AbstractCrawler] WM Search Commit: Crawler finishing: committing documents.
INFO [AbstractFileQueueCommitter] Committing 11224 files
INFO [ElasticsearchCommitter] Sending 100 commit operations to Elasticsearch.
INFO [AbstractCrawler] WM Search Commit: Crawler executed in 7 seconds.
FATAL [JobSuite] Fatal error occured in job: WM Search Commit
INFO [JobSuite] Running WM Search Commit: END (Fri Sep 22 18:36:35 EDT 2017)
FATAL [JobSuite] Job suite execution failed: WM Search Commit
java.lang.NoSuchMethodError: org.json.JSONArray.iterator()Ljava/util/Iterator;
at com.norconex.committer.elasticsearch.ElasticsearchCommitter.extractResponseErrors(ElasticsearchCommitter.java:493)
at com.norconex.committer.elasticsearch.ElasticsearchCommitter.handleResponse(ElasticsearchCommitter.java:469)
at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commitBatch(ElasticsearchCommitter.java:442)
at com.norconex.committer.core.AbstractBatchCommitter.commitAndCleanBatch(AbstractBatchCommitter.java:179)
at com.norconex.committer.core.AbstractBatchCommitter.cacheOperationAndCommitIfReady(AbstractBatchCommitter.java:208)
at com.norconex.committer.core.AbstractBatchCommitter.commitAddition(AbstractBatchCommitter.java:143)
at com.norconex.committer.core.AbstractFileQueueCommitter.commit(AbstractFileQueueCommitter.java:222)
at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commit(ElasticsearchCommitter.java:387)
at com.norconex.collector.core.crawler.AbstractCrawler.execute(AbstractCrawler.java:273)
at com.norconex.collector.core.crawler.AbstractCrawler.doExecute(AbstractCrawler.java:227)
at com.norconex.collector.core.crawler.AbstractCrawler.startExecution(AbstractCrawler.java:183)
at com.norconex.jef4.job.AbstractResumableJob.execute(AbstractResumableJob.java:49)
at com.norconex.jef4.suite.JobSuite.runJob(JobSuite.java:355)
at com.norconex.jef4.suite.JobSuite.doExecute(JobSuite.java:296)
at com.norconex.jef4.suite.JobSuite.execute(JobSuite.java:168)
at com.norconex.collector.core.AbstractCollector.start(AbstractCollector.java:132)
at com.norconex.collector.core.AbstractCollectorLauncher.launch(AbstractCollectorLauncher.java:95)
at com.norconex.collector.fs.FilesystemCollector.main(FilesystemCollector.java:76)
Again, the log above comes from running only the committer, with the following command and XML configuration:
collector-fs.bat -a start -c c:\Elastic\ingest\norconex\config\elastic.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml>
<fscollector id="Text Files">
  <logsDir>c:\Elastic\ingest\norconex\workdir\logs</logsDir>
  <progressDir>c:\Elastic\ingest\norconex\workdir\progress</progressDir>
  <crawlers>
    <crawler id="WM Search Commit">
      <committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
        <nodes>http://localhost:9200</nodes>
        <indexName>wmsearch</indexName>
        <queueDir>c:\commit</queueDir>
        <ignoreResponseErrors>true</ignoreResponseErrors>
        <typeName>doc</typeName>
        <queueSize>9999999</queueSize>
        <commitBatchSize>100</commitBatchSize>
      </committer>
    </crawler>
  </crawlers>
</fscollector>
This was caused by a conflict between two library dependencies. It was fixed as part of #16. Please confirm.
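For anyone hitting the same NoSuchMethodError: it means the org.json.JSONArray class actually loaded at runtime lacks the iterator() method the committer was compiled against, which usually points to a different copy of that class winning on the classpath. The minimal Java sketch below reports which jar a class was loaded from and whether it exposes a given public no-arg method. The class name ClasspathProbe is hypothetical (not part of any Norconex API), and it assumes you run it with the same classpath as the collector, for example `java -cp "lib\*;." ClasspathProbe` from the collector folder, assuming its jars live in a lib subfolder.

```java
import java.security.CodeSource;

/**
 * Hypothetical diagnostic helper: shows where a class is loaded from and
 * whether it has a given public no-arg method. Not a Norconex API.
 */
public class ClasspathProbe {

    /** Returns the jar/folder a class came from, "bootstrap/JDK", or "not found". */
    public static String locate(String className) {
        try {
            // Load without running static initializers.
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no code source.
            if (source == null || source.getLocation() == null) {
                return "bootstrap/JDK";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found";
        }
    }

    /** True if the class exists and has a public method with that name and no parameters. */
    public static boolean hasPublicNoArgMethod(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            cls.getMethod(methodName); // throws NoSuchMethodException if absent
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String target = "org.json.JSONArray";
        System.out.println(target + " loaded from: " + locate(target));
        System.out.println("has iterator(): " + hasPublicNoArgMethod(target, "iterator"));
    }
}
```

If the reported location is not the jar you expect, that jar is the likely source of the conflict.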
Once you have installed the new committer snapshot, the faulty jar may still be present within the Filesystem Collector. A new release of the Filesystem Collector should be made soon; in the meantime, if you still have the issue after installing the committer, delete this file: json-20160810.jar
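To check whether more than one jar in an installation bundles the same org.json class, a sketch like the following scans a folder for jars containing a given class entry. The class name DuplicateClassFinder and the default "lib" folder are assumptions for illustration; pass the actual folder holding the collector's jars as the first argument.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.JarFile;

/** Hypothetical helper: lists the jars in a folder that contain a given class. */
public class DuplicateClassFinder {

    public static List<String> jarsContaining(File dir, String className) throws IOException {
        // e.g. org.json.JSONArray -> org/json/JSONArray.class
        String entryName = className.replace('.', '/') + ".class";
        List<String> hits = new ArrayList<>();
        File[] jars = dir.listFiles((d, name) -> name.toLowerCase().endsWith(".jar"));
        if (jars == null) {
            return hits; // not a directory, or not readable
        }
        Arrays.sort(jars); // stable, alphabetical output
        for (File jar : jars) {
            try (JarFile jf = new JarFile(jar)) {
                if (jf.getEntry(entryName) != null) {
                    hits.add(jar.getName());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        File libDir = new File(args.length > 0 ? args[0] : "lib"); // assumed default folder
        List<String> hits = jarsContaining(libDir, "org.json.JSONArray");
        System.out.println("Jars providing org.json.JSONArray: " + hits);
        if (hits.size() > 1) {
            System.out.println("Multiple copies found; which one is loaded depends on classpath order.");
        }
    }
}
```

If more than one jar is listed, only one copy is actually loaded, and with a `lib/*` wildcard classpath the JVM does not guarantee the order in which those jars are searched, so removing the extra copy is the dependable fix.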
FYI, a new snapshot of the Filesystem Collector was just released without that conflicting dependency.
Hello, when running the crawler with multiple threads, I get the following error:
When I run it with a single thread, it completes successfully:
In both cases, I started clean by deleting the Elasticsearch index, the committer queue, and the workdir files, to make sure nothing was left over from previous runs. Here is my environment:
And my config file:
I am not sure what is causing this issue. Please advise. Thanks in advance!
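The clean-start steps described in this report (deleting the Elasticsearch index, the committer queue, and the workdir) can be scripted so every test run starts from the same state. This is a minimal, hypothetical sketch: the class name is made up, and the node URL, index name, and paths are placeholders borrowed from the first configuration in this thread, so replace them with your own. It sends the standard Elasticsearch `DELETE /{index}` request and deletes folders recursively, so double-check the paths before running it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical reset helper for repeatable crawler test runs. */
public class CleanStart {

    /** Recursively deletes a directory tree; does nothing if it does not exist. */
    public static void deleteTree(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order visits children before their parent folders.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    /** Sends DELETE {node}/{index} and returns the HTTP status (404 if the index is absent). */
    public static int deleteIndex(String node, String index) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(node + "/" + index).toURL().openConnection();
        conn.setRequestMethod("DELETE");
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder values from the first config in this thread; adjust to yours.
        System.out.println("DELETE index: HTTP " + deleteIndex("http://localhost:9200", "wmsearch"));
        deleteTree(Paths.get("c:\\commit"));
        deleteTree(Paths.get("c:\\Elastic\\ingest\\norconex\\workdir"));
    }
}
```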