elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

LogsDB - Indexing overhead performance testing of `ignore_malformed` #110255

Open salvatore-campagna opened 3 days ago

salvatore-campagna commented 3 days ago

Description

LogsDB uses stored fields under the hood to deal with malformed field values and as a fallback mechanism in situations where doc values are not available. This behaviour favours user experience at the expense of performance, especially indexing performance (CPU/memory overhead). A significant factor is that supporting `ignore_malformed` requires copying several of the data structures involved in the parsing logic. These copy operations occur whenever `ignore_malformed` is enabled for complex field types, where the entire field content must be preserved to handle potential parsing issues with nested values. In such scenarios, the copied data structures are used to capture and store the malformed values.

We need to evaluate the performance penalty introduced by this logic and understand its impact, especially on indexing throughput. Ideally we would like to run two tests using the same index mode, one with `ignore_malformed` enabled and one without. Note also that standard log templates enable `ignore_malformed` by default in order to avoid data loss.
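
As a rough sketch of the two-test setup described above, the snippet below builds two index-creation bodies that differ only in the index-level `index.mapping.ignore_malformed` setting, plus a small helper to express the throughput difference as a percentage. The helper names (`build_index_body`, `overhead_pct`) are hypothetical, and the index mode value `logsdb` is an assumption that may need adjusting for the Elasticsearch version under test. Nothing here talks to a cluster; it only prepares the request bodies.

```python
import json


def build_index_body(ignore_malformed: bool) -> dict:
    """Build an index-creation body for one benchmark variant (hypothetical helper)."""
    return {
        "settings": {
            "index": {
                # Assumption: the LogsDB index mode name; adjust for your version.
                "mode": "logsdb",
                # The only setting that differs between the two variants.
                "mapping": {"ignore_malformed": ignore_malformed},
            }
        }
    }


def overhead_pct(baseline_docs_per_s: float, candidate_docs_per_s: float) -> float:
    """Relative indexing throughput loss of the candidate versus the baseline, in percent."""
    return 100.0 * (baseline_docs_per_s - candidate_docs_per_s) / baseline_docs_per_s


# Same index mode in both cases; only ignore_malformed changes.
baseline = build_index_body(ignore_malformed=False)
candidate = build_index_body(ignore_malformed=True)

if __name__ == "__main__":
    print(json.dumps(baseline, indent=2))
    print(json.dumps(candidate, indent=2))
```

Keeping everything except the one flag identical is what lets any throughput difference between the two runs be attributed to `ignore_malformed`.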

The outcome of this benchmarking activity is crucial for planning any necessary actions to mitigate performance issues in the event of unacceptable overhead.

elasticsearchmachine commented 3 days ago

Pinging @elastic/es-storage-engine (Team:StorageEngine)

kkrik-es commented 3 days ago

`ignore_above` does this as well, at least for flattened fields. We should include that in our testing too.
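
A minimal sketch of how the `ignore_above` variant could be set up for the same kind of comparison: two mappings for a flattened field, one with `ignore_above` and one without. The field name `labels`, the threshold `1024`, and the helper `flattened_mapping` are illustrative assumptions, not values taken from this issue.

```python
import json
from typing import Optional


def flattened_mapping(ignore_above: Optional[int] = None) -> dict:
    """Build a mapping with a single flattened field (hypothetical field name 'labels')."""
    field = {"type": "flattened"}
    if ignore_above is not None:
        # Leaf values longer than this limit are not indexed.
        field["ignore_above"] = ignore_above
    return {"mappings": {"properties": {"labels": field}}}


# Two variants that differ only in whether ignore_above is set.
without_limit = flattened_mapping()
with_limit = flattened_mapping(ignore_above=1024)  # 1024 is an arbitrary example value

if __name__ == "__main__":
    print(json.dumps(without_limit, indent=2))
    print(json.dumps(with_limit, indent=2))
```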