Norconex / crawlers

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
https://opensource.norconex.com/crawlers
Apache License 2.0

Solr not deleting documents because of maxWarmingSearchers=2 #125

Closed: OkkeKlein closed this issue 9 years ago

OkkeKlein commented 9 years ago

While testing, the collector issued a lot of deletion commands to Solr in a short period of time, so Solr didn't have enough time to warm a new searcher.

website: 2015-07-09 15:37:17 ERROR - website: Could not process document: http://www.XXX (Cannot index document batch to Solr.) com.norconex.committer.core.CommitterException: Cannot index document batch to Solr.

Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://XXX: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.

Under normal circumstances this would not happen, so this issue is just for your information.
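To see why a burst of commits trips this limit, here is a toy Java model. It is an assumption made for illustration, not Solr's actual code: each commit opens a new searcher that warms for a fixed time, and a commit is rejected when maxWarmingSearchers searchers are already warming. The class and method names are hypothetical, and the 1-second warm-up time is made up.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not Solr's implementation): every commit opens a searcher that
 * warms for warmMs; a commit is rejected if maxWarming searchers are already warming.
 */
public class WarmingSearcherModel {

    // Returns the indices of the commits that would be rejected.
    static List<Integer> rejectedCommits(long[] commitTimesMs, long warmMs, int maxWarming) {
        List<Long> warmingUntil = new ArrayList<>(); // end times of searchers still warming
        List<Integer> rejected = new ArrayList<>();
        for (int i = 0; i < commitTimesMs.length; i++) {
            long now = commitTimesMs[i];
            // Forget searchers that have finished warming by "now".
            warmingUntil.removeIf(end -> end <= now);
            if (warmingUntil.size() >= maxWarming) {
                rejected.add(i); // analogous to "exceeded limit of maxWarmingSearchers"
            } else {
                warmingUntil.add(now + warmMs);
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // Five commits 100 ms apart, each searcher warming for 1 second.
        long[] burst = {0, 100, 200, 300, 400};
        System.out.println("Burst rejected: " + rejectedCommits(burst, 1000, 2));
        // The same five commits spaced 1.5 seconds apart.
        long[] spaced = {0, 1500, 3000, 4500, 6000};
        System.out.println("Spaced rejected: " + rejectedCommits(spaced, 1000, 2));
    }
}
```

In this model, the burst gets commits 3 to 5 (indices 2 to 4) rejected, while the spaced-out commits all succeed.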

essiembre commented 9 years ago

Have you tried configuring your Solr Committer to retry?

<committer class="com.norconex.committer.solr.SolrCommitter">
  ... 
  <maxRetries>10</maxRetries>
  <maxRetryWait>60000</maxRetryWait>
</committer>

The above will retry up to 10 times upon failure, waiting 1 minute (60000 ms) between attempts. This should give Solr enough time to warm up.
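A minimal Java sketch of the retry behaviour described above, assuming one initial attempt followed by up to maxRetries retries with a fixed wait after each failure. RetrySketch and callWithRetry are hypothetical names; this is not the Norconex committer's implementation, and treating the wait as fixed rather than as an upper bound is an assumption.

```java
import java.util.concurrent.Callable;

public class RetrySketch {

    // Hypothetical helper: one initial attempt, then up to maxRetries retries,
    // sleeping retryWaitMs after each failed attempt except the last one.
    static <T> T callWithRetry(Callable<T> task, int maxRetries, long retryWaitMs)
            throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxRetries) {
                    Thread.sleep(retryWaitMs); // give Solr time to finish warming searchers
                }
            }
        }
        throw last; // every attempt failed: surface the last error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated commit that fails twice with a warming-limit error, then succeeds.
        String result = callWithRetry(() -> {
            calls[0]++;
            if (calls[0] <= 2) {
                throw new IllegalStateException("exceeded limit of maxWarmingSearchers=2");
            }
            return "committed";
        }, 10, 10);
        System.out.println(result + " after " + calls[0] + " attempts");

        // Worst case for one batch with maxRetries=10 and a 60000 ms wait.
        long worstCaseMs = 10L * 60_000L;
        System.out.println("Worst-case wait per batch: " + worstCaseMs / 60_000L + " minutes");
    }
}
```

Under these assumptions a single batch could wait up to 10 × 60000 ms = 10 minutes before giving up, which is worth keeping in mind when many batches are queued.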

martinfou commented 9 years ago

You might want to check the commitDisabled parameter in the Solr Committer configuration. It disables the sending of commit commands to the Solr server.

It might solve the problem, since the Solr server, rather than the SolrJ client, would then be in charge of the commits.

Here is a link to the feature request: https://github.com/Norconex/committer-solr/issues/4

Here is a link to the documentation for the commitDisabled tag: http://www.norconex.com/collectors/committer-solr/configuration

OkkeKlein commented 9 years ago

Will maxRetryWait work if I have 100+ batches that need deleting? It could be an option, but it might also get messy.

Letting Solr handle the commits is an option, but there are scenarios where you want to control when a new searcher is opened.

IMO, the easiest way to prevent this is to add a delay before committing the next batch.

essiembre commented 9 years ago

Good idea. I am marking this as a feature request to add a flag for a "minimum delay between commits".
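A rough Java sketch of what such a "minimum delay between commits" flag might do. CommitThrottle and minDelayMs are hypothetical names, not an actual Norconex option. In this sketch the delay is measured from the end of the previous commit, so Solr gets the full interval to warm the new searcher before the next commit arrives.

```java
/**
 * Hypothetical "minimum delay between commits" throttle (not a Norconex API):
 * before each commit, wait until minDelayMs has passed since the previous one finished.
 */
public class CommitThrottle {
    private final long minDelayNanos;
    private long lastCommitNanos;          // System.nanoTime() taken after the previous commit
    private boolean hasCommitted = false;

    CommitThrottle(long minDelayMs) {
        this.minDelayNanos = minDelayMs * 1_000_000L;
    }

    // Waits out the remaining delay, then runs the commit.
    // synchronized so concurrent callers are serialized (the lock is held while sleeping).
    synchronized void commit(Runnable sendCommit) throws InterruptedException {
        if (hasCommitted) {
            long deadline = lastCommitNanos + minDelayNanos;
            long remaining;
            // Loop in case sleep returns slightly early.
            while ((remaining = deadline - System.nanoTime()) > 0) {
                Thread.sleep((remaining + 999_999L) / 1_000_000L); // round up to whole ms
            }
        }
        sendCommit.run();
        lastCommitNanos = System.nanoTime();
        hasCommitted = true;
    }

    public static void main(String[] args) throws InterruptedException {
        CommitThrottle throttle = new CommitThrottle(200);
        long start = System.nanoTime();
        for (int batch = 1; batch <= 3; batch++) {
            int b = batch;
            throttle.commit(() -> System.out.println("commit batch " + b));
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        System.out.println("elapsed >= 400 ms: " + (elapsedMs >= 400));
    }
}
```

Three commits with a 200 ms minimum delay take at least 400 ms in total, since only the second and third commits have to wait.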

essiembre commented 9 years ago

Created a new feature request here: https://github.com/Norconex/committer-solr/issues/6. Closing this one.