amateescu / search_api_solr

11 stars 14 forks source link

delete should not commit #25

Closed Berdir closed 9 years ago

Berdir commented 10 years ago

As discussed, delete should not force-commit to prevent timeouts, solr will purge automatically after 2m.

However, I assume this will not work with the tests, as they expect that deletions happen immediately?

drunken-monkey commented 10 years ago

I don't think that's a good idea. Not committing when indexing is one thing – then some items will just not be in search results. However, when deleting without committing, deleted items will be listed in search results, which could lead to all kinds of weird issues. It was a deliberate decision to have a commit there. Could you please summarize what problem you see with this?

Berdir commented 10 years ago

The problem is that I'm seeing 15s+ delays/timeouts on node deletions, which is messing up my tests.

This is with 3.6. Solr 4 might be better, but I currently went with the older version as it's easier to install on debian/ubuntu and will possibly also be what we will have on production.

drunken-monkey commented 10 years ago

Looking at the code, you should probably make the commit a separate call instead of including it in the update. At least then you would know if the delete itself succeeded.

And 15s+ seems incredibly high. How many documents do you have in that index? Anyways, if it's really that bad, I'd still suggest switching to Solr 4. It has to be correct first, then fast, so there definitely should be a commit there. (Or we could make a "never commit manually" advanced option for the backend, maybe. But generally we should just encourage people to switch to Solr 4, I think.)

And why do you say Solr 3 is easier to install? Seems to me both are exactly the same.

Berdir commented 10 years ago

That's the fun part, there is pretty much nothing in there.

In fact, the specific things that I am deleting did in fact not exist, because I do not have index immediately enabled and those are nodes that are created in behat tests and then immediately deleted again.

nickveenhof commented 10 years ago

Would be interesting to see what exact calls you are seeing in your solr log. Can you paste them here? The timings of the calls are also shown there normally so it would be quite easy to see where things go wrong.

It could also be that if you are doing commits at the same time, and more than 2 simultaneously it hits the waitSearcher limit. In your commit you can put this attribute to false so it deletes it but does not wait for the searcher to come back. However, this is not a best practice. See documentation here:

https://wiki.apache.org/solr/UpdateXmlMessages "waitSearcher = "true" | "false" — default is true — block until a new searcher is opened and registered as the main query searcher, making the changes visible."

Even better would be something like this:

curl http://localhost:8983/solr/update?commitWithin=500 -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">testdoc</field></doc></add>'

This tells solr to delete it, return without blocking and within 500 msec it will execute the commit. This doesn't block your drupal site and you know for sure the commit will happen soon.

Deletes in Apache Solr Drupal module also did not have an explicit commit, since explicit commits are really painful in Solr 3. We should really try to avoid that. Also, if you have a master/slave setup, the item will still be there for at least 2 minutes, so having it for another 0.5 seconds is really not that bad. When working with Solr 3, this is what you get. With Solr 4 the item will be deleted from your master instantly with softcommits but the slave will still have it. with SolrCloud and Solr 4 it will be gone instantly using the softcommit and zookeeper that distributes it around.

nickveenhof commented 10 years ago

I'm testing this out. It won't work below solr 3.6 or above since they only added this for delete queries in solr 3.6. See https://issues.apache.org/jira/browse/SOLR-2280

nickveenhof commented 10 years ago

https://github.com/amateescu/search_api_solr/pull/26/files I implemented what I suggested. Now all commits that need to happen will happen in 1 second after the actual commit was made. This is valid for the deletes but also the adds. The additional benefit we get from this is that it is automatically a softcommit where possible.

drunken-monkey commented 10 years ago

The code to add the commits looks a bit aweful in Solarium, but of course that can't be helped. We should move it to a dedicated method, though, instead of copying it in three places. Other than that, it's a great solution, I'd say. I would probably have picked more like 500 ms instead, but that's of course a minor detail, a second will also be fine. However, was there any specific reason why the previous method of committing at the end of the page request didn't work anymore? I thought that was also a good solution – although I guess this one will perform a bit better.

Berdir commented 10 years ago

https://github.com/amateescu/search_api_solr/issues/22

Edit: That issue explains why the shutdown function was IMHO a bad match.