Norconex / committer-solr

Solr implementation of Norconex Committer. Should also work with any Solr-based products, such as LucidWorks.
https://opensource.norconex.com/committers/solr/
Apache License 2.0

Two collectors making commits at the same time #4

Closed: csaezl closed this issue 8 years ago

csaezl commented 9 years ago

I'm running two collectors at the same time, both sending documents to the same Solr repository, and I'm getting "PERFORMANCE WARNING: Overlapping onDeckSearchers=2" errors. After reading up on the subject, I suppose the best way of committing without getting these errors is to let Solr AutoCommit by time (leaving the soft commit matter aside) or by number of documents, as HTTP Collector does. Taking for granted that HTTP Collector commits are hard commits, how can I stop HTTP Collector from making commits? Is this the right solution?

csaezl commented 9 years ago

I'd appreciate an answer.

There is another reason for sending documents to Solr without a commit, if I'm right. Sometimes, when the crawler sends a group of n documents to Solr, Solr answers with an error for one of the documents (Could not commit batched operation .... writing Id XX to the index ...... possible analysis error). I think the whole group is discarded, not just the offending document. So the bigger the group, the more documents are lost to a single error.

martinfou commented 9 years ago

Greetings,

To help me get a feel for what your problem could be, can you tell me your settings for commit frequency, and for autowarmCount in solrconfig.xml?

Did you take a look at this explanation from the Solr FAQ?

What does "PERFORMANCE WARNING: Overlapping onDeckSearchers=X" mean in my logs?

This warning means that at least one searcher hadn't yet finished warming in the background, when a commit was issued and another searcher started warming. This can not only eat up a lot of RAM (as multiple on deck searchers warm caches simultaneously) but it can create a feedback cycle, since the more searchers warming in parallel means each searcher might take longer to warm.

Typically the way to avoid this error is to either reduce the frequency of commits, or reduce the amount of warming a searcher does while it's on deck (by reducing the work in newSearcher listeners, and/or reducing the autowarmCount on your caches)
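To make the FAQ's point concrete for the case of two collectors, here is a minimal sketch in Python that counts how many searchers would be warming at once for a given set of commit times. It is only a toy model: the function name `max_warming_searchers`, the commit timestamps and the 10-second warm time are made up for illustration, and it ignores any limit Solr itself places on concurrent warming searchers.

```python
def max_warming_searchers(commit_times, warm_seconds):
    """Return the peak number of searchers warming at the same time.

    Toy model: every commit opens a new searcher that warms for
    warm_seconds. All values are hypothetical, in seconds.
    """
    events = []
    for t in commit_times:
        events.append((t, 1))                  # searcher starts warming
        events.append((t + warm_seconds, -1))  # searcher finishes warming
    # At equal timestamps, process "finish" (-1) before "start" (+1).
    events.sort(key=lambda e: (e[0], e[1]))

    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Hypothetical schedules: each collector commits once a minute,
# and the second collector is offset by 5 seconds.
collector_a = [0, 60, 120]
collector_b = [5, 65, 125]

print(max_warming_searchers(collector_a, warm_seconds=10))
# prints 1
print(max_warming_searchers(sorted(collector_a + collector_b), warm_seconds=10))
# prints 2
```

In this model a single collector never overlaps with itself, but two independent collectors do as soon as their commits fall within one warm time of each other, which matches the "onDeckSearchers=2" figure in the warning.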

csaezl commented 9 years ago

Thank you very much for the information. Yes, this was the first text I read when searching for a solution.

This overlap occurs because I need two collectors running at the same time and committing documents to Solr at the same time, so chances are that their commits overlap and the error (with its consequences) arises. I don't need to reduce the error, I need to avoid it.

To avoid it, the solution is for the crawler to send the documents without a commit (hard commit) and let AutoCommit do the job based on its parameters as defined in the solrconfig.xml file.

The question is: what is the parameter that instructs the crawler not to send the commit (hard commit) to Solr? If the crawler always sends the commit (hard commit), then AutoCommit never gets a chance to work.
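For illustration only, this is roughly what "sending without a commit" looks like at the HTTP level. Solr's update handler accepts an optional `commit=true` request parameter (and `commitWithin` in milliseconds); leaving both out means the documents are indexed but not committed until Solr's own AutoCommit settings kick in. The sketch below is in Python and only builds the request. The helper name `build_update_request`, the host and the core name `mycore` are assumptions, and this is not how the Norconex committer is implemented.

```python
import json
import urllib.parse
import urllib.request


def build_update_request(solr_update_url, docs, commit=False, commit_within_ms=None):
    """Build (but do not send) a JSON update request for Solr.

    Hypothetical helper. With commit=False and no commit_within_ms,
    no commit-related parameter is added to the URL at all.
    """
    params = {}
    if commit:
        params["commit"] = "true"  # explicit hard commit
    if commit_within_ms is not None:
        params["commitWithin"] = str(commit_within_ms)

    url = solr_update_url
    if params:
        url += "?" + urllib.parse.urlencode(params)

    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed URL and documents, for illustration only.
update_url = "http://localhost:8983/solr/mycore/update"
docs = [{"id": "1", "title": "example"}]

req = build_update_request(update_url, docs)
print(req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

With two collectors both posting this way, neither one triggers a commit, so the commit frequency is decided in one place (solrconfig.xml) rather than by each crawler.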

martinfou commented 9 years ago

OK, now I get what you are trying to accomplish. Short answer: at the moment there is no parameter to turn the commit operation on or off. Let me create a feature request for this one and see what I can do.

csaezl commented 9 years ago

Thank you, Martin

csaezl commented 9 years ago

Do you plan to add this feature in the near future? I'm stuck until the crawler allows AutoCommit to work.

csaezl commented 9 years ago

Any news?

martinfou commented 9 years ago

I started to work on it and then got pulled away. Let me see if I can work on it this weekend.

csaezl commented 9 years ago

Anything new after a month since the last post?

martinfou commented 9 years ago

I'll have something for you today. Let me finish running a few more tests and document the changes.

csaezl commented 9 years ago

Thank you. Will you deliver a snapshot?

essiembre commented 9 years ago

A snapshot is now available with this new feature from @martinfou. You can download it here.

csaezl commented 9 years ago

Thank you very much

csaezl commented 8 years ago

As for me, this issue can be closed.