apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Move live docs to disk? [LUCENE-9388] #10428

Open asfimport opened 4 years ago

asfimport commented 4 years ago

Live docs are the last file format whose memory usage is a function of maxDoc. Let's look into moving them to disk?
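To make the maxDoc dependence concrete, here is a back-of-the-envelope sketch. It assumes live docs are held as one bit per document packed into a long[] (the representation mentioned later in this thread). The class name `LiveDocsHeapEstimate` is hypothetical, and object-header and array overhead are ignored.

```java
/**
 * Hypothetical helper: estimates the long[] payload needed to store live docs
 * as one bit per document. Assumes a long[]-backed bit set and ignores
 * object headers and array overhead.
 */
final class LiveDocsHeapEstimate {
  private LiveDocsHeapEstimate() {}

  /** Number of 64-bit words needed for maxDoc bits, i.e. ceil(maxDoc / 64). */
  static long words(int maxDoc) {
    if (maxDoc < 0) {
      throw new IllegalArgumentException("maxDoc must be >= 0: " + maxDoc);
    }
    // Widen to long before adding 63 so values near Integer.MAX_VALUE cannot overflow.
    return (maxDoc + 63L) / 64L;
  }

  /** Bytes of bit-set payload: grows linearly with maxDoc, independent of the deletion count. */
  static long bytes(int maxDoc) {
    return words(maxDoc) * Long.BYTES;
  }
}
```

For example, `LiveDocsHeapEstimate.bytes(100_000_000)` returns 12,500,000, roughly 12.5 MB for a single segment. Under this assumption the size depends only on maxDoc, not on how many documents are deleted.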


Migrated from LUCENE-9388 by Adrien Grand (@jpountz)

asfimport commented 4 years ago

David Smiley (@dsmiley) (migrated from JIRA)

This idea surprises me. Doesn't this need to be an efficient BitSet? Is the cost of accessing memory-mapped data negligible compared to on-heap access?

asfimport commented 4 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Memory-mapped IO is likely plenty fast, but there may indeed be some overhead compared to the long[] that backs our typical live docs implementations.
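A minimal sketch of the two access paths being compared, assuming a hypothetical `LiveBits` interface (similar in shape to Lucene's `Bits`, but not Lucene code). `HeapLiveBits` reads from a `long[]`, while `BufferLiveBits` reads the same bit layout through a `java.nio.LongBuffer`. In an on-disk design that buffer might come from `FileChannel.map(...)` followed by `asLongBuffer()`; to stay self-contained, the hypothetical helper `LiveBitsDemo.toDirectBuffer` copies the words into a direct (off-heap) buffer instead. This only shows that the bit arithmetic is identical; it says nothing about the actual cost difference, which would need a benchmark.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;

/** Hypothetical random-access view over live docs: get(doc) == true means "live". */
interface LiveBits {
  boolean get(int doc);

  int length();
}

/** On-heap variant: one bit per doc, packed into a long[]. */
final class HeapLiveBits implements LiveBits {
  private final long[] words;
  private final int numDocs;

  HeapLiveBits(long[] words, int numDocs) {
    this.words = words;
    this.numDocs = numDocs;
  }

  @Override
  public boolean get(int doc) {
    // doc >> 6 picks the word; a long shift uses only the low 6 bits of doc, so this tests bit (doc % 64).
    return (words[doc >> 6] & (1L << doc)) != 0;
  }

  @Override
  public int length() {
    return numDocs;
  }
}

/** Buffer-backed variant: same bit layout, read through a LongBuffer (possibly memory-mapped). */
final class BufferLiveBits implements LiveBits {
  private final LongBuffer words;
  private final int numDocs;

  BufferLiveBits(LongBuffer words, int numDocs) {
    this.words = words;
    this.numDocs = numDocs;
  }

  @Override
  public boolean get(int doc) {
    // Absolute get(index) leaves the buffer position untouched, so random access is safe.
    return (words.get(doc >> 6) & (1L << doc)) != 0;
  }

  @Override
  public int length() {
    return numDocs;
  }
}

final class LiveBitsDemo {
  private LiveBitsDemo() {}

  /** Builds words with every doc in [0, maxDoc) live except the given deletions (which must be in range). */
  static long[] allLiveExcept(int maxDoc, int... deleted) {
    long[] words = new long[(maxDoc + 63) >>> 6];
    for (int doc = 0; doc < maxDoc; doc++) {
      words[doc >> 6] |= 1L << doc;
    }
    for (int doc : deleted) {
      words[doc >> 6] &= ~(1L << doc);
    }
    return words;
  }

  /** Copies the words into an off-heap (direct) buffer; stands in for a memory-mapped file here. */
  static LongBuffer toDirectBuffer(long[] words) {
    LongBuffer buffer = ByteBuffer.allocateDirect(words.length * Long.BYTES)
        .order(ByteOrder.LITTLE_ENDIAN)
        .asLongBuffer();
    buffer.put(words);
    buffer.flip(); // position back to 0, limit = number of words written
    return buffer;
  }
}
```

Both `get` implementations compute `doc >> 6` to pick a word and test one bit; the only difference is whether that word is loaded from a Java array or through a buffer, which is where any overhead versus `long[]` would show up.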

Since live docs are mostly accessed forward-only, iterator-style (like doc values), maybe the first step is to switch the random-access BitSet we use today to an iterator over bits? We might also want to invert it (again!!), so that the iterator's .next() advances to the next deleted doc, since in most indices more documents are live than deleted.
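Here is a minimal sketch of the inverted, iterator-style idea, assuming deleted doc IDs are available as a strictly increasing sequence (backed here by an `int[]` for brevity; an on-disk version would decode the same sequence from a file). `DeletedDocsIterator`, its `NO_MORE_DOCS` sentinel, and `LiveDocsWalk.collectLive` are hypothetical names, loosely following the forward-only `nextDoc`/`advance` style of Lucene's doc ID iterators.

```java
import java.util.Arrays;

/**
 * Hypothetical forward-only iterator over deleted doc IDs. Backed by a strictly
 * increasing int[] here; an on-disk version would decode the same sequence.
 */
final class DeletedDocsIterator {
  /** Sentinel returned once no deleted docs remain. */
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] deleted;
  private int index = -1;
  private int doc = -1;

  DeletedDocsIterator(int[] sortedDeleted) {
    this.deleted = sortedDeleted;
  }

  /** Current deleted doc: -1 before the first call, NO_MORE_DOCS once exhausted. */
  int docID() {
    return doc;
  }

  /** Moves to the next deleted doc (stays at NO_MORE_DOCS once exhausted). */
  int nextDoc() {
    if (doc != NO_MORE_DOCS) {
      index++;
      doc = index < deleted.length ? deleted[index] : NO_MORE_DOCS;
    }
    return doc;
  }

  /** Moves forward (never backwards) to the first deleted doc >= target; linear scan for simplicity. */
  int advance(int target) {
    while (doc < target) {
      nextDoc();
    }
    return doc;
  }
}

final class LiveDocsWalk {
  private LiveDocsWalk() {}

  /**
   * Visits docs 0..maxDoc-1 in order and returns the live ones. The deleted-docs
   * iterator only moves when the walk reaches the deletion it currently points at.
   */
  static int[] collectLive(int maxDoc, DeletedDocsIterator deletes) {
    int[] live = new int[maxDoc];
    int count = 0;
    int nextDeleted = deletes.nextDoc();
    for (int doc = 0; doc < maxDoc; doc++) {
      if (doc == nextDeleted) {
        nextDeleted = deletes.nextDoc(); // skip this doc and look at the next deletion
      } else {
        live[count++] = doc;
      }
    }
    return Arrays.copyOf(live, count);
  }
}
```

Because the iterator only moves when the walk reaches a deleted doc, a segment with few deletions costs one comparison per document plus a handful of `nextDoc()` calls, and no per-document bit set has to be resident in memory.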

We should just try it and see :)