Open jpountz opened 1 month ago
This is a draft as I need to do more work on tests and on making sure that this new method cannot corrupt the state of the `SegmentTermsEnum`.
Without the change: P50: 286 P90: 403 P99: 532
With the change: P50: 148 P90: 246 P99: 368
I iterated a bit on this change:
- `TermsEnum#prepareSeekExact` is introduced, which only prefetches data that `TermsEnum#seekExact` is later going to need.
- `TermStates#build` no longer runs on the `IndexSearcher` threadpool, but in the current thread, leveraging `TermsEnum#prepareSeekExact` to parallelize I/O across all terms and segments.
- `TermQuery` and `SynonymQuery` call `TermsEnum#prepareSeekExact` in `Weight#scorerSupplier` so that the I/O associated with terms dictionary lookups is parallelized across clauses of the same `BooleanQuery`.

I created a benchmark that starts to look like running a Lucene query, and the results are encouraging.
Was this with a forced-cold index?
It creates a 50GB terms dictionary while my machine only has ~28GB of RAM for the page cache, so many terms dictionary lookups result in page faults.
Results still look good.
Before the change: P50: 282 P90: 387 P99: 537
After the change: P50: 161 P90: 253 P99: 379
I pushed a new approach. Instead of `prepareSeekExact` returning `void`, it now returns a `Supplier` and forbids calling any other method on `TermsEnum` until the `Supplier` has been consumed. There are two benefits:
- [...] `null`. In turn, this saves creating scorers on other required clauses of the same query.
- [...] `TermsEnum` instance, so you need multiple `TermsEnum` instances if you want more I/O concurrency than that. This looks like a better trade-off to me; it only makes things like `PKLookup` a bit more sophisticated if they want to do I/O concurrency.

The benchmark still reports similar numbers:
Without the change: P50: 307 P90: 423 P99: 585
With the change: P50: 162 P90: 258 P99: 405
I will merge soon if there are no objections.
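For readers following the thread, the contract described above (prepare returns a handle, and nothing else may be called on the enum until that handle is consumed) can be sketched with a toy class. This is not Lucene code: `ToyTermsEnum`, its `TreeMap` dictionary and `postingsOffset` are hypothetical, `java.util.function.BooleanSupplier` stands in for the `Supplier` mentioned above, and returning `null` for a term that is known to be absent is an assumption modeled here with a simple min/max range check.

```java
import java.util.TreeMap;
import java.util.function.BooleanSupplier;

/** Toy stand-in for a terms enum (hypothetical, not Lucene's TermsEnum). */
final class ToyTermsEnum {
  private final TreeMap<String, Long> dictionary; // term -> postings offset
  private boolean pending;                        // true while a supplier is outstanding
  private String current;                         // term the enum is positioned on, if any

  ToyTermsEnum(TreeMap<String, Long> dictionary) {
    this.dictionary = dictionary;
  }

  /**
   * Phase 1: start whatever I/O the lookup needs and return a handle for phase 2.
   * Returns null when the term can be ruled out without any I/O (assumption:
   * modeled as a range check against the smallest and largest terms).
   */
  BooleanSupplier prepareSeekExact(String term) {
    ensureNoPending();
    if (dictionary.isEmpty()
        || term.compareTo(dictionary.firstKey()) < 0
        || term.compareTo(dictionary.lastKey()) > 0) {
      return null;
    }
    // A real implementation would trigger a prefetch of the relevant block here.
    pending = true;
    return () -> {
      // Phase 2: the actual lookup, which positions the enum if the term exists.
      pending = false;
      boolean found = dictionary.containsKey(term);
      current = found ? term : null;
      return found;
    };
  }

  /** Any other method is rejected while a supplier is still pending. */
  long postingsOffset() {
    ensureNoPending();
    if (current == null) {
      throw new IllegalStateException("enum is not positioned on a term");
    }
    return dictionary.get(current);
  }

  private void ensureNoPending() {
    if (pending) {
      throw new IllegalStateException("consume the pending supplier first");
    }
  }
}
```

With this shape, a caller that wants several lookups in flight at once needs one enum per lookup, which matches the trade-off noted above.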
This introduces `TermsEnum#prepareSeekExact`, which essentially calls `IndexInput#prefetch` at the right offset for the given term. Then it takes advantage of the fact that `BooleanQuery` already calls `Weight#scorerSupplier` on all clauses, before later calling `ScorerSupplier#get` on all clauses. So `TermQuery` now calls `TermsEnum#prepareSeekExact` on `Weight#scorerSupplier` (if scores are not needed), which in turn means that the I/O of all terms dictionary lookups gets parallelized across all term queries of a `BooleanQuery` on a given segment (intra-segment parallelism).
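The two-pass ordering described above (start the I/O for every clause first, then consume the results) can be illustrated with a self-contained toy model. None of this is Lucene code: `TwoPhaseLookupSketch`, `slowLookup` and `prepare` are hypothetical names, and the thread pool only simulates reads that proceed in the background, whereas a real prefetch would rely on the operating system rather than on extra threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class TwoPhaseLookupSketch {

  /** Simulated terms dictionary read that blocks for latencyMillis. */
  static long slowLookup(String term, long latencyMillis) {
    try {
      Thread.sleep(latencyMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    }
    return term.length(); // placeholder "result" of the lookup
  }

  /** Phase 1 (analogous to scorerSupplier): start the read, return a handle. */
  static Supplier<Long> prepare(String term, long latencyMillis, ExecutorService io) {
    CompletableFuture<Long> read =
        CompletableFuture.supplyAsync(() -> slowLookup(term, latencyMillis), io);
    return read::join; // Phase 2 (analogous to get): wait for the read to finish
  }

  /** Starts all reads before waiting on any of them, so their latencies overlap. */
  static List<Long> lookupAll(List<String> terms, long latencyMillis) {
    ExecutorService io = Executors.newFixedThreadPool(Math.max(1, terms.size()));
    try {
      List<Supplier<Long>> pending = new ArrayList<>();
      for (String term : terms) {
        pending.add(prepare(term, latencyMillis, io)); // first pass: issue I/O
      }
      List<Long> results = new ArrayList<>();
      for (Supplier<Long> handle : pending) {
        results.add(handle.get()); // second pass: consume results
      }
      return results;
    } finally {
      io.shutdown();
    }
  }
}
```

In this model, looking up three terms takes roughly one simulated read latency instead of three, because the second pass only starts once every read has been issued.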