elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Should we restore compression on binary doc values #78266

Open jimczi opened 2 years ago

jimczi commented 2 years ago

Binary doc values are no longer compressed in Lucene 9. We added compression in Lucene 8.x to support the wildcard field, but it proved controversial, so the feature was removed from the library. We rely on that compression to offset the index size the wildcard field incurs. Without those savings, the storage cost of this field type would be prohibitive.

Should we maintain a custom doc values format on our end to bring the compression back? We use binary doc values in several places in ES (vector field, wildcard field, ...), so we should probably decide on a case-by-case basis.
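The following is a minimal Java sketch of the general idea behind block-compressed binary doc values. Values for consecutive documents are concatenated into blocks, and each block is compressed as a unit, so a lookup has to decompress the whole block that holds the document. It is not Lucene's codec API and not the Lucene 8.x implementation. The class and method names (`BlockCompressedBinaryValues`, `get`, `compressedBytes`) are hypothetical, the block size of 32 is an illustrative assumption, and `java.util.zip.Deflater` stands in for whatever compressor a real doc values format would use.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/**
 * Hypothetical sketch (not Lucene's API): per-document binary values are
 * concatenated into blocks of `blockSize` documents and each block is
 * compressed as a unit. Deflate is a stand-in for a real codec's compressor.
 */
final class BlockCompressedBinaryValues {
    private final int blockSize;
    private final List<byte[]> compressedBlocks = new ArrayList<>();
    private final List<int[]> endOffsets = new ArrayList<>();   // per block: end offset of each value
    private final List<Integer> rawLengths = new ArrayList<>(); // per block: uncompressed size

    BlockCompressedBinaryValues(List<byte[]> values, int blockSize) {
        this.blockSize = blockSize;
        for (int start = 0; start < values.size(); start += blockSize) {
            int end = Math.min(start + blockSize, values.size());
            ByteArrayOutputStream raw = new ByteArrayOutputStream();
            int[] ends = new int[end - start];
            for (int doc = start; doc < end; doc++) {
                byte[] v = values.get(doc);
                raw.write(v, 0, v.length);
                ends[doc - start] = raw.size();
            }
            byte[] rawBytes = raw.toByteArray();
            compressedBlocks.add(deflate(rawBytes));
            endOffsets.add(ends);
            rawLengths.add(rawBytes.length);
        }
    }

    /** Random access: decompresses the whole block that holds docId, then slices out its value. */
    byte[] get(int docId) {
        int block = docId / blockSize;
        int idx = docId % blockSize;
        byte[] raw = inflate(compressedBlocks.get(block), rawLengths.get(block));
        int[] ends = endOffsets.get(block);
        int from = idx == 0 ? 0 : ends[idx - 1];
        return Arrays.copyOfRange(raw, from, ends[idx]);
    }

    /** Total on-disk-equivalent size of all compressed blocks. */
    long compressedBytes() {
        long total = 0;
        for (byte[] b : compressedBlocks) total += b.length;
        return total;
    }

    private static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    private static byte[] inflate(byte[] input, int rawLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        byte[] out = new byte[rawLength];
        try {
            int off = 0;
            while (off < rawLength) {
                int n = inflater.inflate(out, off, rawLength - off);
                if (n == 0) throw new IllegalStateException("truncated block");
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
        return out;
    }

    public static void main(String[] args) {
        // Repetitive URL-like values, roughly the kind of data a wildcard field might hold.
        List<byte[]> values = new ArrayList<>();
        for (int doc = 0; doc < 1000; doc++) {
            String url = "https://example.com/logs/host-" + (doc % 10) + "/path/" + doc;
            values.add(url.getBytes(StandardCharsets.UTF_8));
        }
        long rawBytes = 0;
        for (byte[] v : values) rawBytes += v.length;
        BlockCompressedBinaryValues dv = new BlockCompressedBinaryValues(values, 32);
        System.out.println("raw=" + rawBytes + " compressed=" + dv.compressedBytes());
        System.out.println(new String(dv.get(123), StandardCharsets.UTF_8));
    }
}
```

Block size is the main knob in a scheme like this: larger blocks generally compress better, but every random-access read pays for decompressing the whole block.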

I marked this issue as a blocker for 8.0: even if we decide against restoring the compression, we should document the limitation in Elasticsearch 8.0.

elasticmachine commented 2 years ago

Pinging @elastic/es-search (Team:Search)

jpountz commented 2 years ago

I asked @qhoxie for his take on this one. While it would be nice to ensure a continuous experience with binary doc values when upgrading to 8.0, there is a lot on our plate already, and this problem looks like something that could wait until an 8.x release. So we were leaning towards not making this issue an 8.0 blocker, but keeping it in our backlog.

heipei commented 2 years ago

Just came across this and it bit me in the behind. I was frantically researching why the .dvd files on disk weren't compressed, going through tickets, wildcard field PRs, Lucene PRs, etc., before finding this ticket. I think this can be a big deal-breaker for some folks, especially as wildcard fields were advertised as being very space-efficient thanks to compression (e.g. in the blog post). I certainly won't be able to continue with what I had planned this way.

mayya-sharipova commented 8 months ago

We discussed this within the team today and decided the following:

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)

elasticsearchmachine commented 4 weeks ago

Pinging @elastic/es-storage-engine (Team:StorageEngine)