Norconex / committer-elasticsearch

Implementation of Norconex Committer for Elasticsearch.
https://opensource.norconex.com/committers/elasticsearch/
Apache License 2.0

Limit of total fields [1000] in index has been exceeded #18

Closed · jmrichardson closed this issue 7 years ago

jmrichardson commented 7 years ago

The ES committer continues to run (it doesn't exit on error), but the logs show the failure. Here is a snippet:

WM Search: 2017-09-24 18:23:23 INFO - Sending 100 commit operations to Elasticsearch.
WM Search: 2017-09-24 18:23:25 INFO - Elasticsearch RestClient closed.
WM Search: 2017-09-24 18:23:25 INFO - Elasticsearch RestClient closed.
WM Search: 2017-09-24 18:23:25 INFO -            REJECTED_ERROR: file:///c:/xxx.doc (com.norconex.committer.core.CommitterException: Elasticsearch returned one or more errors:
[{
    "_index": "wmsearch",
    "_type": "doc",
    "_id": "file:///c:xxxx",
    "status": 400,
    "error": {
        "type": "illegal_argument_exception",
        "reason": "Limit of total fields [1000] in index [wmsearch] has been exceeded"
    }
},
...
}]
    at com.norconex.committer.elasticsearch.ElasticsearchCommitter.handleResponse(ElasticsearchCommitter.java:514)
    at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commitBatch(ElasticsearchCommitter.java:482)
    at com.norconex.committer.core.AbstractBatchCommitter.commitAndCleanBatch(AbstractBatchCommitter.java:179)
    at com.norconex.committer.core.AbstractBatchCommitter.cacheOperationAndCommitIfReady(AbstractBatchCommitter.java:208)
    at com.norconex.committer.core.AbstractBatchCommitter.commitAddition(AbstractBatchCommitter.java:143)
    at com.norconex.committer.core.AbstractFileQueueCommitter.commit(AbstractFileQueueCommitter.java:222)
    at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commit(ElasticsearchCommitter.java:427)
    at com.norconex.committer.core.AbstractCommitter.commitIfReady(AbstractCommitter.java:146)
    at com.norconex.committer.core.AbstractCommitter.add(AbstractCommitter.java:97)
    at com.norconex.collector.core.pipeline.committer.CommitModuleStage.execute(CommitModuleStage.java:34)
    at com.norconex.collector.core.pipeline.committer.CommitModuleStage.execute(CommitModuleStage.java:27)
    at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91)
    at com.norconex.collector.fs.crawler.FilesystemCrawler.executeCommitterPipeline(FilesystemCrawler.java:243)
    at com.norconex.collector.core.crawler.AbstractCrawler.processImportResponse(AbstractCrawler.java:586)
    at com.norconex.collector.core.crawler.AbstractCrawler.processNextQueuedCrawlData(AbstractCrawler.java:543)
    at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:418)
    at com.norconex.collector.core.crawler.AbstractCrawler$ProcessReferencesRunnable.run(AbstractCrawler.java:803)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
WM Search: 2017-09-24 18:23:25 INFO - DOCUMENT_METADATA_FETCHED: file:///c:...

I updated the field limits to 2000:

"index.mapping.total_fields.limit": 2000

This resolved the issue, but I suggest the committer exit on failure, or that a similar option be added to that effect.

essiembre commented 7 years ago

Is it possible you have ignoreResponseErrors set to true?
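
For reference, a minimal sketch of how that option would appear in the committer configuration (I am assuming the XML tag name matches the option name):

      <committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
        <!-- When true, Elasticsearch response errors are logged but do not
             fail the batch; the default is false. -->
        <ignoreResponseErrors>true</ignoreResponseErrors>
        <!-- other committer settings -->
      </committer>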

jmrichardson commented 7 years ago

I don't have it configured in my XML, as it defaults to false. I just re-ran the job with the field limit in ES set back down to 1000. It logged the error as above but did not stop (it continued). Here is my XML snippet:

      <committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
        <nodes>http://localhost:9200</nodes>
        <indexName>wmsearch</indexName>
        <queueDir>c:\commit</queueDir>
        <jsonFieldsPattern>scope</jsonFieldsPattern>
        <typeName>doc</typeName>
        <commitBatchSize>100</commitBatchSize>
      </committer>
essiembre commented 7 years ago

It was made like this by design, in case errors affect only a few documents and you still want the rest to be processed. I agree that in some cases it may be preferable to stop. The latest snapshot releases of the HTTP and Filesystem Collectors now support specifying exceptions that should cause the crawler to stop. In your case, it would go like this (put it in your <crawler ...> section):

<stopOnExceptions>
    <exception>com.norconex.committer.core.CommitterException</exception>
</stopOnExceptions>
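
For context, a sketch of where this element lands relative to the committer (the crawler id and the omitted settings are placeholders):

<crawler id="wm-search">
  <!-- other crawler settings -->
  <stopOnExceptions>
    <exception>com.norconex.committer.core.CommitterException</exception>
  </stopOnExceptions>
  <committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
    <!-- committer settings as shown earlier -->
  </committer>
</crawler>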

Please confirm if you can.

jmrichardson commented 7 years ago

Confirmed: the latest snapshot with the XML update works. Thank you.