jpountz closed this 1 month ago
Can we do better than blindly prefetching skip data? Currently, skip data is not used until we advance() past the first block's docs.
Should we instead try to preload skip data the first time advance() is actually called on the enum? This still happens before the skipper might get used, because we might fulfill a few advance() calls from the first block's docs, but I like that we then have evidence that the skipper will most likely be needed. E.g. a one-time invocation before this "if": https://github.com/apache/lucene/blob/83db8dfba30447dc154efbf98a46048ebe5743ef/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99PostingsReader.java#L502-L504
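A self-contained toy model of that idea (not actual Lucene99PostingsReader code; all names are illustrative): the prefetch hint is issued exactly once, on the first advance() call, rather than eagerly when the enum is pulled.

```java
// Toy model (not actual Lucene code) of lazily prefetching skip data
// the first time advance() is called, instead of eagerly at enum creation.
class LazyPrefetchEnum {
  private boolean prefetched = false;
  int prefetchCalls = 0; // observable for this example only
  private final int[] docs;
  private int idx = -1;

  LazyPrefetchEnum(int[] docs) {
    this.docs = docs;
  }

  private void prefetchSkipData() {
    // In Lucene this would be an IndexInput#prefetch call on the skip data.
    prefetchCalls++;
  }

  int advance(int target) {
    if (prefetched == false) {
      // One-time hint: we now have evidence the skipper will likely be needed.
      prefetched = true;
      prefetchSkipData();
    }
    while (++idx < docs.length) {
      if (docs[idx] >= target) {
        return docs[idx];
      }
    }
    return Integer.MAX_VALUE; // stands in for NO_MORE_DOCS
  }
}
```

Repeated advance() calls fulfilled from the first block would still only trigger the prefetch once.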
We might (eventually) consider an API change when pulling postings that expresses that the caller intends to call advance()? Or, maybe simpler, expressing that the caller will not use advance(); not sure.
This is cool! In the hot case, do we expect prefetch to be a no-op? So we are hoping for "first do no harm" in that case? (I haven't looked at MMapDirectory's impl yet.)

But in the cold case, this could cause multiple prefetch calls within one segment if the query has multiple terms. And if cross-segment concurrency is enabled, multiple such calls across slices (thread work units) too.

But in the cold case this may be a big win?
> This is cool! In the hot case, do we expect prefetch to be a no-op? So we are hoping for "first do no harm" in that case?
Yes, mostly. The benchmark I ran at https://github.com/apache/lucene/pull/13337#issuecomment-2095430556 suggests that there is a 1-2us overhead for prefetching when data fits in the page cache. (Possibly more under concurrent search, I'd need to benchmark this as well.) So with 50 segments, 4 clauses and assuming 3 prefetch calls per clause (one in the terms dict, one for postings, one for skip data), this would give a total overhead of 3*2*50*4 = 1200us = 1.2ms, which looks ok to me.
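As a sanity check on that arithmetic (a toy helper, not part of the patch): 3 prefetches per clause at the worst-case 2us each, across 4 clauses and 50 segments.

```java
// Back-of-the-envelope estimate (toy helper, not Lucene code) of the
// worst-case prefetch overhead when all data is already in the page cache.
class PrefetchCost {
  static int hotOverheadMicros(
      int segments, int clauses, int prefetchesPerClause, int microsPerCall) {
    return segments * clauses * prefetchesPerClause * microsPerCall;
  }
}
```

50 segments * 4 clauses * 3 prefetches * 2us each comes to 1200us, i.e. about 1.2ms of worst-case overhead in the fully hot case.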
> But in the cold case, this could cause multiple prefetch calls within one segment if the query has multiple terms. And if cross-segment concurrency is enabled, multiple such calls across slices (thread work units) too.
This is correct. We could also look into doing inter-segment I/O concurrency in the same thread in the future, e.g. by fetching all scorers first, and then evaluating these scorers against their segments, but I need to think more about the trade-off.
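A hypothetical sketch of that idea (not an actual Lucene API): instead of "create scorer, then score" one segment at a time, create all scorers up front, so each creation can issue its prefetch hints before any evaluation starts.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch (hypothetical, not an actual Lucene API) of single-threaded
// inter-segment I/O concurrency via two phases.
class TwoPhase {
  interface Segment { Scorer scorer(); } // creation is where prefetch hints go
  interface Scorer { int score(); }      // evaluation actually reads the data

  static List<Integer> searchAll(List<Segment> segments) {
    List<Scorer> scorers = new ArrayList<>();
    for (Segment s : segments) {
      // Phase 1: pull every segment's scorer, putting all prefetches in flight.
      scorers.add(s.scorer());
    }
    List<Integer> results = new ArrayList<>();
    for (Scorer sc : scorers) {
      // Phase 2: evaluate; by now earlier prefetches have had time to complete.
      results.add(sc.score());
    }
    return results;
  }
}
```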
> But in the cold case this may be a big win?
Yes! Hopefully with a few more changes I'll be able to confirm this by running a benchmark that runs actual Lucene queries. Early experiments suggest that there is a big win indeed.
This uses the IndexInput#prefetch API for postings. This relies on heuristics, as we don't know ahead of time what data we will need from a postings list:

- Positions, offsets and payloads are never prefetched.
Putting the IndexInput#prefetch call in TermsEnum#postings and TermsEnum#impacts works well because BooleanQuery will first create postings/impacts enums for all clauses before it starts unioning/intersecting them. This allows the prefetching logic to run in parallel across all clauses of the same query on the same segment.
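A toy illustration of why that placement helps (not actual Lucene code; newPostings stands in for TermsEnum#postings): every clause's prefetch hint is issued during enum creation, before the first intersection step touches any postings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (not actual Lucene code): creating postings enums for all
// clauses up front means all prefetch hints are issued, and their I/O can
// overlap, before unioning/intersecting begins.
class ConjunctionSketch {
  static int prefetchHints = 0;

  static int[] newPostings(int[] docs) {
    prefetchHints++; // stands in for IndexInput#prefetch in TermsEnum#postings
    return docs;
  }

  static List<Integer> intersect(List<int[]> clauses) {
    List<int[]> enums = new ArrayList<>();
    for (int[] c : clauses) {
      // All prefetch hints happen here, before intersection starts.
      enums.add(newPostings(c));
    }
    // Naive intersection of sorted doc-ID lists.
    List<Integer> result = new ArrayList<>();
    outer:
    for (int d : enums.get(0)) {
      for (int i = 1; i < enums.size(); i++) {
        if (Arrays.binarySearch(enums.get(i), d) < 0) {
          continue outer;
        }
      }
      result.add(d);
    }
    return result;
  }
}
```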