Is it possible you have ignoreResponseErrors set to true?
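For reference, if it were set explicitly, it would go inside the committer block along these lines (a sketch; the option name comes from this exchange, but its exact placement is my assumption):

<committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
  <!-- Assumed behavior: when true, Elasticsearch response errors are logged
       but do not raise a CommitterException. -->
  <ignoreResponseErrors>true</ignoreResponseErrors>
</committer>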
I don't have it configured in my XML, as it defaults to false. I just re-ran the job and set the field limit in ES back down to 1000. It did log the error as above but did not stop (it continued). Here is my XML snippet:
<committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
  <nodes>http://localhost:9200</nodes>
  <indexName>wmsearch</indexName>
  <queueDir>c:\commit</queueDir>
  <jsonFieldsPattern>scope</jsonFieldsPattern>
  <typeName>doc</typeName>
  <commitBatchSize>100</commitBatchSize>
</committer>
It was made like this by design, in case errors affect only a few documents and you still want the rest to be processed. I agree that in some cases it may be preferable to stop. The latest snapshot releases of the HTTP and Filesystem Collectors now support specifying which exceptions should cause the crawler to stop. In your case, it would go like this (put it in your <crawler ...> section):
<stopOnExceptions>
  <exception>com.norconex.committer.core.CommitterException</exception>
</stopOnExceptions>
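In context, the snippet sits alongside the other crawler settings, roughly like this (a sketch; the crawler id is a placeholder and the ordering of elements is my assumption):

<crawler id="wmsearch-crawler">
  <!-- Stop the whole crawl when the committer reports a failure. -->
  <stopOnExceptions>
    <exception>com.norconex.committer.core.CommitterException</exception>
  </stopOnExceptions>
  <committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
    ...
  </committer>
</crawler>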
Please confirm if you can.
Confirmed that the latest snapshot and the updated XML work. Thank you.
The ES committer continues to run (it doesn't exit on error), but the logs show the failure. Here is a snippet:
I updated the field limit to 2000:
"index.mapping.total_fields.limit": 2000
This resolved the issue, but I suggest the committer exit on failure, or that an option to that effect be added.