Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [LUCENE-1316]

asfimport commented 16 years ago

The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).

In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.

By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.

I modified line 67 in the MatchAllDocsQuery FROM: if (!reader.isDeleted(id)) { TO: if (![reader.hasDeletions() ](reader.hasDeletions() )reader.isDeleted(id)) {

In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.

Please consider making this adjustment for a future bug fix release.

Migrated from LUCENE-1316 by Todd Feak, 1 vote, resolved Jan 25 2009 Environment:

All

Attachments: LUCENE_1316.patch (versions: 3), LUCENE-1316.patch (versions: 3), MatchAllDocsQuery.java

asfimport commented 16 years ago

Todd Feak (migrated from JIRA)

My version of MatchAlldocsQuery.java which has the modification in it.

asfimport commented 16 years ago

Todd Feak (migrated from JIRA)

Further investigation indicates that the ValueSourceQuery$ValueSourceScorer may suffer from the same issue and benefit from a similar optimization.

asfimport commented 16 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

Although this doesn't solve the general problem, this probably still makes sense to do now for the no-deletes case. Todd, can you produce a patch? See http://wiki.apache.org/lucene-java/HowToContribute

asfimport commented 16 years ago

Chris M. Hostetter (@hossman) (migrated from JIRA)

rather then attempting localized optimizations of individual Query classes, a more generalized improvements would probably be to change SegmentReader.isDeleted so that instead of the entire method being synchronized, it first checks if the segment has any deletions, and if not then enters a synchronized block to check deletedDocs.get(n).

asfimport commented 16 years ago

Todd Feak (migrated from JIRA)

I like Hoss' suggestion better. I'll try that fix locally and if it provides the same improvement, I will submit a patch for you.

asfimport commented 16 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

> a more generalized improvements would probably be to change SegmentReader.isDeleted so that instead of the entire method being synchronized

Right, but that's not totally back compatible. Code that depended on deletes being instantly visible across threads would no longer be guaranteed.

asfimport commented 16 years ago

Chris M. Hostetter (@hossman) (migrated from JIRA)

Code that depended on deletes being instantly visible across threads would no longer be guaranteed.

you lost me there ... why would deletes be stop being instantly visible if we changed this...

  public synchronized boolean isDeleted(int n) {
    return (deletedDocs != null && deletedDocs.get(n));
  }

...to this...

  public boolean isDeleted(int n) {
    if (null == deletedDocs) return false;
    synchronized (this) { return (deletedDocs.get(n)); }
  }

?

asfimport commented 16 years ago

robert engels (migrated from JIRA)

According to the java memory model, hasDeletions() would need to be synchronized as well , since if another thread did perform a deletion, it would need to be up to date.

This might work in later JVMs by declaring the deletedDocs variable volatile, but no guarantees.

Seems better to ALLOW this behavior, that a reader might not see up to date deletions made during a query, and do a single synchronized check of deletions at the start.

asfimport commented 16 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

why would deletes be stop being instantly visible

It's minor, but before, if thread A deleted a document, and then thread B checked if it was deleted, thread B was guaranteed to see that it was in fact deleted.

If the check for deletedDocs == null were moved outside of the synchronized, there's no guarantee when thread B will see (if ever) that deletedDocs has been set (no memory barrier).

It's not a major issue since client code shouldn't be written that way IMO, but it's worth factoring into the decision.

asfimport commented 16 years ago

robert engels (migrated from JIRA)

According to

http://www.ibm.com/developerworks/java/library/j-jtp06197.html

declaring the deletedDocs volatile should do the trick.

asfimport commented 16 years ago

robert engels (migrated from JIRA)

The Pattern#5 referenced (cheap read-write lock) is exactly what is trying to be accomplished.

asfimport commented 16 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

declaring the deletedDocs volatile should do the trick.

Right... that would be cheaper when no docs were deleted. But would it be more expensive when there were deleted docs (a volatile + a synchronized?) I don't know if lock coarsening could do anything with this case...

asfimport commented 16 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

If I remember correctly, volatile does not work correctly until java 1.5. At best I think it was implementation dependent with the old memory model.

edit

maybe its ok under certain circumstances:

http://www.ibm.com/developerworks/library/j-jtp02244.html

Problem #2: Reordering volatile and nonvolatile stores

asfimport commented 16 years ago

Chris M. Hostetter (@hossman) (migrated from JIRA)

if thread A deleted a document, and then thread B checked if it was deleted, thread B was guaranteed to see that it was in fact deleted.

Hmmm.... i'll take your word for it, but i don't follow the rational: the current synchronization just ensured that either the isDeleted() call will complete before the delete() call started or vice versa – but you have no guarantee that thread B would run after thread A and get true. .... unless... is your point that without synchronization on the null check there's no garuntee that B will ever see the change to deletedDocs even if it does execute after delete() ?

either way: robert's point about hasDeletions() needing to be synchronized seems like a bigger issue – isn't that a bug in the current implementation? assuming we fix that then it seems like the original issue is back to square one: synchro bottlenecks when there are no deletions.

asfimport commented 16 years ago

robert engels (migrated from JIRA)

Hoss, that is indeed the case, another thread would see deletedDocs as null, even though another thread has set it

hasDeletions does not need to be synchronized if deletedDocs is volatile

asfimport commented 16 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

is your point that without synchronization on the null check there's no garuntee that B will ever see the change to deletedDocs even if it does execute after delete()

Right... it's about the memory barrier.

The reality is that there is normally a need for higher level synchronization anyway. That's why it was always silly for things like StringBuffer to be synchronized.

assuming we fix that then it seems like the original issue is back to square one: synchro bottlenecks when there are no deletions.

A scorer could just check once when initialized... there's never been any guarantees about in-flight queries immediately seeing deleted docs changes - now that really doesn't make sense. TermScorer grabs the whole bit vector at the start and never checks again.

asfimport commented 16 years ago

Todd Feak (migrated from JIRA)

I wanted to share my micro load test results with you, to make sure you all understand scale of the bottleneck as we are experiencing it.

For an optimized index with 4700+ documents (ie small), a NOT query varies by a factor of 35 under heavy load. Using 2.3.0 release I got 20 tps. With the volatile/synchronized fix suggested, I got 700 tps. The limiting factor on the 700 tps was the CPU on the computer throwing load, so this may be even worse. Furthermore, the more documents that exist in the index, the worse this may get, as it synchonizes on every single iteration through the loop.

An argument can be made that this is just a necessary evil, and that we must synchronize on this for the possibility of updates during reads. I have 2 questions regarding that.

What is the cost of a dirty read in that case? Is it stale data, incorrect data, or a corrupted system?
What is more prevalent in a production system. Indexes with no deletes, indexes with some deletes, or indexes with frequent deletes?

Do we need to have 1 class that does it all, or should we consider 2 different implementation for 2 different uses. What about a read-only SegmentReader for read-only slaves in production environments?

asfimport commented 16 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

20tps (queries per second?) for 4700 docs is very slow... how many threads were you testing with?

asfimport commented 16 years ago

Todd Feak (migrated from JIRA)

10 threads in JMeter throwing load at the Tomcat as fast as possible. The Tomcat was on a separate machine with more then 10 worker threads, though only 10 were in use at any one time.

To eliminate any differences, the tests were run back to back. The only difference was the lucene-core JAR and a Tomcat bounce between the tests. Otherwise, the same exact test is run in both cases. What you have is threads all trying to synchronize on isDeleted() 4700+ times per request. Lock contention goes through the roof. At any point during the test I can take a thread stack snapshot and they are all blocked waiting for isDeleted().

asfimport commented 16 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

Attaching prototype patch (needs javadoc + tests if approach is acceptable) that avoids all synchronization when iterating over all documents.

If IndexReader.termDocs(Term term) is called with null, a TermDocs implementation is returned that matches all documents. This is the same approach used by TermScorer via SegmentTermDocs to avoid synchronization by grabbing the BitVector of deleted docs at instantiation.

This patch also updates MatchAllDocuments to use this TermDocs to iterate over all documents.

Advantages:

adds no new methods or interfaces, simply adds extra semantics to an existing method
works from the bottom-up... no need to instantiate a big BitVector
exposes the functionality to expert users for use in custom queries
avoids a binary search to find the correct IndexReader in a MultiReader for each call (it leverages all the TermDocs code in all IndexReader implementations such as MultiTermDocs).

Disadvantages:

TermDocs instance returned cannot be used to seek to a different term. However, this is minor and not a back compatibility concern since "null" was not previously a supported value.

On balance, I think it's 10% hack, 90% useful. Thoughts?

asfimport commented 16 years ago

Todd Feak (migrated from JIRA)

I applied that patch locally to a 2.3.0 build. Test results show this solution performs equally as the other solution did. I'd be happy with it for that reason alone. I cannot argue as to the quality of the proposed solution. I will leave that to those more knowledgeable on Lucene itself. Thank you for the time you guys have put into this issue for me.

asfimport commented 16 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

Test results show this solution performs equally as the other solution did.

That's good news (as I assume this was for an optimized index). Would it be possible for you to try the same test on a non-optimized index (multi-segment) with a couple deletions?

asfimport commented 16 years ago

Todd Feak (migrated from JIRA)

I don't think that patch provides correct functionality. I went to run the load tests this morning against an un-optimized index and the query results do not match what an unpatched version does. Simply swapping the JAR and restarting returns different results for the same query. Specifically, empty (incorrect) results.

asfimport commented 16 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

I'll look into it (that's why it was marked as "prototype")

asfimport commented 16 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

I saw and fixed a bug that would affect multireaders.... patch attached. I've not yet written tests, so no guarantees. Edit: I reproduced... it still doesn't work, so hold off using this.

asfimport commented 16 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

Sigh... the problem is that things are done in a two step process by default. A TermDocs is created, and then seek is called (by which time the impl is set already).

  public TermDocs termDocs(Term term) throws IOException {
    ensureOpen();
    TermDocs termDocs = termDocs();
    termDocs.seek(term);
    return termDocs;
  }

  public abstract TermDocs termDocs() throws IOException;

So a full solution down this road would be slightly more involved (overriding termDocs(Term) in all the sublcasses rather than just termDocs())

asfimport commented 16 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

Here's an updated patch that should work correctly.

asfimport commented 16 years ago

Todd Feak (migrated from JIRA)

I applied the patch to the 2.3.0 file. I ran against an optimized and non-optimized (12 segment) index with 4700 entries.

2.3.0 non-optimized index 104 tps 2.3.0 patched non-optimized index 482 tps

2.3.0 optimized index 21 tps 2.3.0 patched optimized index 718 tps

The patch provided improvements in both optimized and unoptimized indexes. Thanks again Yonik.

asfimport commented 16 years ago

Chris M. Hostetter (@hossman) (migrated from JIRA)

TermDocs instance returned cannot be used to seek to a different term. However, this is minor and not a back compatibility concern since "null" was not previously a supported value.

so essentially this approach only improves MatchAllDocsQuery correct? .... Other use cases where lots of termDoc enumeration take place (RangeFilter and PrefixFilter type code) wouldn't' benefit from this at all.

Assuming the genuinely eliminating the synchronization is infeasible, the other approach that occurred to me along the lines of a "read only" IndexReader is that if we started providing a public method for getting the list of all deleted docs (either as a BitVector or as a DocIdSet or something) then it would be easy to implement a SnapshotFilteredIndexReader that on construction cached the current list of all deleted docs in the IndexReader it's wrapping, used it for all isDeleted() methods, and proxied all other methods to the underlying reader.

then things like MatchAllDocs, RangeFilter, and PrefixFilter could (optionally?) construct one of those prior to their big iteration loops, and use it in place of the original IndexReader. Trade the syncro bottleneck for deletion data as of the moment the query was started (a fair trade for most people)

asfimport commented 16 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

so essentially this approach only improves MatchAllDocsQuery correct?

It would essentially simulate a term indexed for every document, and improve any query that could use that (i.e. that needed to iterate over all documents). For things like MatchAllDocuments and FunctionQuery on a MultiReader, it should actually be superior to the elimination of all synchronization on isDeleted() since it also removes the binary search to find the correct reader for a document.

Other use cases where lots of termDoc enumeration take place (RangeFilter and PrefixFilter type code) wouldn't' benefit from this at all.

Right, but using TermDocs, they already have the same style of optimization and hence suffer no synchronization overhead.

the other approach that occurred to me along the lines of a "read only" IndexReader is that if we started providing a public method for getting the list of all deleted docs

Right... that could be more useful for someone that needs random access to isDeleted(), at the cost of greater setup cost, and greater memory usage. However the TermDocs approach solves the use-cases we have today in lucene-core.

asfimport commented 16 years ago

Chris M. Hostetter (@hossman) (migrated from JIRA)

Right, but using TermDocs, they already have the same style of optimization and hence suffer no synchronization overhead.

Arggg.... sorry, my bad: i was thinking reader.isDeleted was used in those cases as well – I was totally missing that SegmentTermDocs takes care of it directly.

but ... thinking about how SegmentTermDocs are constructed for a moment, isn't the (unsynchronized) usage of deletedDocs in SegmentTermDocs.next prone to the same concern you had about removing (or reducing) the synchronization in SegmentReader.isDeleted ... "deletes aren't instantly visible across threads" ... are they?

Is SegmentTermDocs.next too lax in how it deals with deletedDocs, or is SegmentReader.isDeleted to paranoid?

asfimport commented 16 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

but ... thinking about how SegmentTermDocs are constructed for a moment, isn't the (unsynchronized) usage of deletedDocs in SegmentTermDocs.next prone to the same concern you had about removing (or reducing) the synchronization in SegmentReader.isDeleted ... "deletes aren't instantly visible across threads" ... are they?

No, deletes aren't instantly visible across threads (when one thread has started a query and the other thread does a delete). AFAIK it's always been that way, so I think it's probably acceptable. On a practical level, seeking on a TermDocs crosses synchronization points, so deletes won't go unrecognized by other threads forever either.

But there is a thread-safety issue I've been mulling over since I wrote this patch. SegmentTermDocs.next() is fine... unsynchronized reads look benign on the BitVector class since writes just change a byte at a time. "count" could be off, but that's OK too... it will simply be a stale value since updates to it are atomic.

The issue is the possibility of grabbing a partially constructed BitVector object to begin with. Notice how I synchronize to avoid this in AllTermDocs:

  protected AllTermDocs(SegmentReader parent) {
    synchronized (parent) {
      this.deletedDocs = parent.deletedDocs;
    }
    this.maxDoc = parent.maxDoc();
  }

That should probably be done in SegmentTermDocs too. Without it, a null pointer exception or an array out of bounds exception could result when checking the BitVector.

asfimport commented 16 years ago

Robert Muir (@rmuir) (migrated from JIRA)

we use MatchAllDocs query also. In addition to what is described here we got very nice gains by overriding the Scorer.score() methods that take a HitCollector.

This seems pretty dumb but I guess since it has to score every doc the method call overhead actually matters?

asfimport commented 16 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

The new ReadOnly IndexReader option resolves this issue, correct?

asfimport commented 15 years ago

Jason Rutherglen (migrated from JIRA)

I looked at 2.4 and am surprised this patch did not make it in. Read only IndexReader shouldn't be necessary to solve this issue. Were there any problems in the unit tests for this patch or something?

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Yonik is your patch here (creating an AllTermDocs) ready to commit, once you fix SegmentTermDocs to also synchronize on parent when grabbing the deletedDocs?

asfimport commented 15 years ago

Jason Rutherglen (migrated from JIRA)

Added synchronized on parent when obtaining the deleted docs in SegmentTermDocs.

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I think at least AllTermDocs.java is missing from the latest patch here. When you apply a patch, make a change, and then "svn diff" in order to post a new patch, you have to [frustratingly] remember to first "svn add" any new files in the first patch.

I think the Subversion team is working on an "svn patch" command that would do this (and apply properties, and maybe deal with a 3-way merge, etc) but last I checked it's not done yet. So until then we have to workaround...

asfimport commented 15 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

In a way, this is a mirror image of Jason's request in #2550 for a getDeletedDocs() that returned a DocIdSet... provided it also worked on a MultiReader, etc. MatchAllDocs could be efficiently implemented with that.

It does seem like having some sort of iterator over existing docs is useful to avoid the binary search cost of associating ids with segments, but there was never any feedback on what the API should be. Instead of adding new functionality to termDocs(), we could also add a getAllDocs() that returns either a DocIdSet or an interator.

asfimport commented 15 years ago

Jason Rutherglen (migrated from JIRA)

Patch includes AllTermDocs.

asfimport commented 15 years ago

Jason Rutherglen (migrated from JIRA)

In a way, this is a mirror image of Jason's request in

2550 for a getDeletedDocs() that returned a DocIdSet...

provided it also worked on a MultiReader, etc. MatchAllDocs could be efficiently implemented with that.

I think the API of reader.termDocs(null) is fine. If we move to a DocIdSet for deleteddocs we can change the all docs implementation if needed.

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK, new patch attached:

Added more tests of *Reader.termDocs(null)
Changed IndexReader.termDocs javadoc to explain term=null case
Fixed MemoryIndex.termDocs(null) to return freq=1
Added CHANGES.txt entry

I think it's ready to commit. I'll wait a day or two.

asfimport commented 15 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

+1 to the latest patch. Much thanks for picking this up! (I've been completely swamped.)

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK, no problem Yonik – thanks for reviewing. I'll commit shortly.

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Committed revision 737513. Thanks everyone!

apache / lucene

Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [LUCENE-1316] #2393

2550 for a getDeletedDocs() that returned a DocIdSet...