elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Account for TransportMultiSearchAction's response array in Circuit Breaker #32051

Open polyfractal opened 6 years ago

polyfractal commented 6 years ago

TransportMultiSearchAction has an AtomicArray that is used to collect individual search responses as they finish executing. When the last request finishes, the results in the array are packaged into a MultiSearchResponse and sent back to the client.
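For illustration, the collect-then-package pattern described above can be sketched roughly as follows. This is a minimal stand-alone Java sketch, not the actual TransportMultiSearchAction code: `ResponseCollector` and its method names are hypothetical, and the JDK's `AtomicReferenceArray` stands in for Elasticsearch's AtomicArray. The point it shows is that every finished response stays on the heap until the last one arrives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.Consumer;

// Hypothetical stand-in for the collection pattern; not Elasticsearch code.
final class ResponseCollector<T> {
    private final AtomicReferenceArray<T> slots;   // one slot per sub-request
    private final AtomicInteger remaining;         // sub-requests still outstanding
    private final Consumer<List<T>> onAllDone;     // invoked once, with all responses

    ResponseCollector(int expected, Consumer<List<T>> onAllDone) {
        this.slots = new AtomicReferenceArray<>(expected);
        this.remaining = new AtomicInteger(expected);
        this.onAllDone = onAllDone;
    }

    // Called once per slot, possibly from different threads.
    void onResponse(int slot, T response) {
        slots.set(slot, response); // held in memory until every slot is filled
        if (remaining.decrementAndGet() == 0) {
            List<T> all = new ArrayList<>(slots.length());
            for (int i = 0; i < slots.length(); i++) {
                all.add(slots.get(i));
            }
            onAllDone.accept(all); // package and hand back to the caller
        }
    }
}
```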

If the multi-search involves a large number of shards, or the responses are very large, or there are many multi-searches in parallel (or all three)... this seems like a prime candidate for causing an OOM on smaller heaps.

I don't believe we track this array in any circuit breaker explicitly, and since it is holding finished results it isn't subject to any breakers in place for the search phase. If the responses come back to the coordinating node in a staggered manner it is unlikely to trip the in-flight breaker either.

It would be nice if we could account for this array in the Request breaker somehow. I imagine the tricky bit is estimating how big the various SearchResponse objects are (or the Exception objects, in the case of failures). @dakrone suggested maybe selecting n responses and averaging their size to use as a heuristic.
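A rough sketch of that sampling heuristic, under stated assumptions: `SimpleBreaker` and `SampledAccounting` are hypothetical classes written for this example and do not reflect the real Request breaker API. The first `sampleSize` responses are measured exactly, and every later response is charged the average of those samples, so the expensive size calculation runs only a few times.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical minimal breaker; not the Elasticsearch circuit breaker API.
final class SimpleBreaker {
    private final long limitBytes;
    private final AtomicLong used = new AtomicLong();

    SimpleBreaker(long limitBytes) { this.limitBytes = limitBytes; }

    // Reserves bytes, or rolls back and throws if the limit would be exceeded.
    void addEstimateAndMaybeBreak(long bytes) {
        long newUsed = used.addAndGet(bytes);
        if (newUsed > limitBytes) {
            used.addAndGet(-bytes);
            throw new IllegalStateException(
                "would use " + newUsed + " bytes, limit is " + limitBytes);
        }
    }

    void release(long bytes) { used.addAndGet(-bytes); }

    long used() { return used.get(); }
}

// Charges the breaker for each collected response using a sampled average.
final class SampledAccounting {
    private final SimpleBreaker breaker;
    private final int sampleSize;
    private long sampledBytes; // total size of the measured samples
    private int sampled;       // number of responses measured so far
    private long reserved;     // total bytes charged to the breaker

    SampledAccounting(SimpleBreaker breaker, int sampleSize) {
        if (sampleSize < 1) throw new IllegalArgumentException("sampleSize must be >= 1");
        this.breaker = breaker;
        this.sampleSize = sampleSize;
    }

    // exactSize is only invoked while still sampling, since measuring may be costly.
    synchronized void onResponse(LongSupplier exactSize) {
        boolean sampling = sampled < sampleSize;
        long charge = sampling ? exactSize.getAsLong() : sampledBytes / sampled;
        breaker.addEstimateAndMaybeBreak(charge);
        reserved += charge;
        if (sampling) {
            sampledBytes += charge;
            sampled++;
        }
    }

    // Call once the combined response has been sent, to give the bytes back.
    synchronized void close() {
        breaker.release(reserved);
        reserved = 0;
    }
}
```

The trade-off in this sketch is accuracy versus cost: if the first few responses are unrepresentative (for example, early failures with small Exception payloads), the average will underestimate the rest.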

Tagging both Core/CB and Search because I'm indecisive...sorry! :)

elasticmachine commented 6 years ago

Pinging @elastic/es-core-infra

elasticmachine commented 6 years ago

Pinging @elastic/es-search-aggs

altinp commented 4 years ago

It's not just multi-search: even a single sizable search request (i.e. via TransportSearchAction) against many shards can cause an OOM. This is easily reproducible on versions from at least 5.6 through 7.4, and almost instantly so if logging to file/console is turned off.

#48910

rjernst commented 3 years ago

While this issue involves circuit breakers, it is really about one particular use of them, not the circuit breaker infrastructure itself. So I am removing the core/infra label and leaving it to the search team to implement/manage as they see fit.

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)