jimczi opened 2 years ago
Pinging @elastic/es-search (Team:Search)
I asked @qhoxie for his take on this one. While it would be nice to ensure a continuous experience with binary doc values when upgrading to 8.0, there is a lot on our plate already, and this problem looks like something that could wait until an 8.x release. So we were leaning towards not making this issue an 8.0 blocker, but keeping it in our backlog.
Just came across this and it bit me in the behind. I was frantically researching why the .dvd files on disk weren't compressed, going through tickets, wildcard field PRs, Lucene PRs, etc., before finding this ticket. I think this can be a deal-breaker for some folks, especially since wildcard fields were advertised as very space-efficient thanks to compression (e.g. in the blog post). I certainly won't be able to continue with what I had planned this way.
We discussed this within the team today and decided the following:
Pinging @elastic/es-search-foundations (Team:Search Foundations)
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Binary doc values are no longer compressed in Lucene 9. We added the compression in Lucene 8.x to support the `wildcard` field, but it proved controversial, so the feature was removed from the library. We rely on compression to reduce the index size that the `wildcard` field incurs. Without these savings, the storage cost of this field type would be prohibitive. Should we maintain a custom doc values format on our end to bring the compression back? We have multiple usages of binary doc values in ES (vector field, wildcard field, ...), so we should probably look at this on a case-by-case basis.
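To make the trade-off concrete, here is a minimal Python sketch of block-wise compression for binary values. It is not Lucene's codec, file format, or API, and it is not how the Lucene 8.x compression was implemented: the block layout, the use of `zlib` as a stand-in compressor, and the sample paths are all assumptions for illustration. The idea is that packing many documents' values into one block and compressing them together lets substrings shared across documents (common in machine-generated values such as log paths) be stored once.

```python
import struct
import zlib


def compress_block(values: list[bytes]) -> bytes:
    """Pack a block of per-document binary values and compress it as one unit.

    Hypothetical layout before compression:
    [count: u32][len_0 .. len_{n-1}: u32 each][all value bytes concatenated]
    """
    header = struct.pack(f"<I{len(values)}I", len(values), *(len(v) for v in values))
    # Level 1 favours speed; zlib stands in for whatever a custom format would use.
    return zlib.compress(header + b"".join(values), 1)


def decompress_block(block: bytes) -> list[bytes]:
    """Inflate a whole block and split it back into per-document values."""
    raw = zlib.decompress(block)
    (count,) = struct.unpack_from("<I", raw, 0)
    lengths = struct.unpack_from(f"<{count}I", raw, 4)
    values, offset = [], 4 + 4 * count
    for length in lengths:
        values.append(raw[offset:offset + length])
        offset += length
    return values


if __name__ == "__main__":
    # Hypothetical block of 32 documents whose values share long substrings.
    docs = [f"/var/log/app/service-{i % 4}/request.log".encode() for i in range(32)]
    block = compress_block(docs)
    assert decompress_block(block) == docs
    print("uncompressed bytes:", sum(len(d) for d in docs), "compressed bytes:", len(block))
```

In this layout, reading a single document's value requires inflating its whole block, so the block size trades storage savings against random-access cost.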
I marked this issue as a blocker for 8.0. Even if we decide against restoring the compression, we should document the limitation in Elasticsearch 8.0.