elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
69.91k stars 24.73k forks source link

Move to Lucene 9.12's new PostingsFormat. #115021

Open jpountz opened 1 day ago

jpountz commented 1 day ago

A while back, Lucene changed the way that it encodes doc IDs from PFOR-delta to FOR-delta, which is a bit faster but less space-efficient. In order to avoid introducing space-efficiency regressions (especially on dense postings lists, which are common on Logging datasets), @iverase moved Elasticsearch to a copy of the Lucene postings format that would still use PFOR-delta for compression. (#103601)

But Lucene 9.12 introduced a new postings format that has better skipping logic (in general). It would be nice to take advantage of it. I would suggest the following plan:

elasticsearchmachine commented 1 day ago

Pinging @elastic/es-storage-engine (Team:StorageEngine)