elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Explore introducing `routing_path` into LogsDB #109334

Open salvatore-campagna opened 1 month ago

salvatore-campagna commented 1 month ago

Description

PR https://github.com/elastic/elasticsearch/pull/108896 introduced LogsDB and the logs index mode. The routing path and the `_routing` field are currently not supported in this mode. Using a routing path could improve storage by enabling better data partitioning and clustering of documents with similar data patterns, which we anticipate would reduce storage usage through better compression. We should therefore explore options for introducing custom `_routing`.

One challenge is that, by default, the `_id` field is used as the routing value. When documents are indexed with a custom `_routing`, the uniqueness of the `_id` is no longer guaranteed across all shards in the index (documents with the same `_id` could end up on different shards), so making sure that the `_id` is unique is up to the user. The other option would be a solution similar to what we use in tsdb, which generates unique IDs while still allowing custom `_routing`.
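To make the uniqueness problem concrete, here is a minimal Python sketch, not Elasticsearch code. It assumes a simplified `hash(routing) % num_shards` placement with SHA-256 as a stand-in hash; the shard count, the `host-N` routing values, and the `derived_id` helper are all hypothetical. It shows that the same `_id` lands on one shard under default routing but can land on different shards under custom routing. It also shows one possible way to derive an ID from the routing value, so that equal IDs imply equal routing. That last part is only loosely inspired by the tsdb idea mentioned above and does not describe the actual tsdb scheme.

```python
import hashlib
from typing import Optional, Tuple

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_for(routing_value: str, num_shards: int = NUM_SHARDS) -> int:
    # Stand-in hash (SHA-256): an assumption, not Elasticsearch's real routing hash.
    digest = hashlib.sha256(routing_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def route(doc_id: str, routing: Optional[str] = None) -> int:
    # Without custom routing, the _id itself serves as the routing value.
    return shard_for(doc_id if routing is None else routing)


def routings_on_different_shards(limit: int = 1000) -> Tuple[str, str]:
    # Search hypothetical routing values for two that map to different shards.
    first = "host-0"
    for i in range(1, limit):
        candidate = f"host-{i}"
        if shard_for(candidate) != shard_for(first):
            return first, candidate
    raise RuntimeError("no differing pair found within the search limit")


def derived_id(routing: str, timestamp: str, body: str) -> str:
    # Hypothetical ID scheme: hash the routing value together with the
    # timestamp and body, so two documents with the same ID must share
    # the same routing value and therefore the same shard.
    raw = "\x00".join((routing, timestamp, body)).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:32]


if __name__ == "__main__":
    a, b = routings_on_different_shards()
    print("default routing:", route("doc-1"), route("doc-1"))
    print("custom routing: ", route("doc-1", a), route("doc-1", b))
    print("derived ids:    ", derived_id(a, "2024-06-01T00:00:00Z", "msg"),
          derived_id(b, "2024-06-01T00:00:00Z", "msg"))
```

With default routing, a second write of `doc-1` always reaches the shard that already holds it, so a duplicate can be detected there. With custom routing, the two copies can sit on different shards and neither shard sees the conflict, which is why uniqueness either falls to the user or has to be built into how the ID is generated.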

elasticsearchmachine commented 1 month ago

Pinging @elastic/es-storage-engine (Team:StorageEngine)