apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0
13.52k stars 3.71k forks

Add TopN Query Aggregator Memory Guardrails #17439

Open jtuglu-netflix opened 3 weeks ago

jtuglu-netflix commented 3 weeks ago

Adds a way to monitor aggregator map entry increases for TopN queries. This is in response to a problem we've seen in queries where a topN query is done with an expensive aggregator on a high-cardinality dimension. I've reproduced this problem locally, and confirmed that this fixes the issue in most cases. The other cases are outlined in Drawbacks.

Description

Approach

Use a configurable, fixed-size byte budget per (query, historical) that aggregators can consume. The byte count is maintained at the segment level during each query runner's pass through a segment (while producing the result sequence). I opted for this over a %-of-heap approach because, in high-traffic scenarios, multiple queries could race to allocate memory for aggregators, and each could observe the same, say, 5% of total heap as available (assuming 5% is the permitted allocation). If the heap is already 80-90% full, this could end badly. A fixed amount is a bit more cumbersome to configure, but it at least guarantees a realistic upper bound on how much memory N concurrent queries could theoretically use. Switching to either approach (or another) is easy enough. I found the fixed-size approach performed more consistently in local testing with artificially low heap sizes and parallel queries. Another alternative I considered is a shared buffer pool that all running queries "borrow" from, which is a bit like what GroupBy does.
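To make the fixed-budget idea concrete, here is a minimal sketch. It is not code from this PR: `AggregatorByteBudget`, `reserve`, and `worstCaseBytes` are hypothetical names, and the sketch assumes the caller can estimate the size of each new aggregator map entry in bytes.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical guard for illustration only; not a class from the Druid codebase. */
final class AggregatorByteBudget {
    private final long maxBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    AggregatorByteBudget(long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        this.maxBytes = maxBytes;
    }

    /**
     * Reserve space for a new aggregator map entry. If the reservation would
     * exceed the fixed cap, it is rolled back and the caller is told to fail.
     */
    void reserve(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative");
        }
        long updated = usedBytes.addAndGet(bytes);
        if (updated > maxBytes) {
            usedBytes.addAndGet(-bytes); // undo the failed reservation
            throw new IllegalStateException(
                "aggregator memory budget exceeded: " + updated + " > " + maxBytes);
        }
    }

    long usedBytes() {
        return usedBytes.get();
    }

    /** Upper bound on aggregator memory for N concurrent queries, each with the same cap. */
    static long worstCaseBytes(int concurrentQueries, long maxBytesPerQuery) {
        return Math.multiplyExact((long) concurrentQueries, maxBytesPerQuery);
    }
}
```

Because the cap is a constant rather than a reading of current heap usage, the worst case for N concurrent queries is simply N times the cap, regardless of how the queries interleave.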

Drawbacks

Release note


Key changed/added classes in this PR

This PR has:

samarthjain commented 1 week ago

Use a configurable (per-query, per-segment) fixed size byte amount that aggregators can take up.

This part of the PR description confused me a little, because the configuration is really at the query level and not at the segment level (for which it would be hard to set the right limit).

jtuglu-netflix commented 1 week ago

Since this is initialized per-runner (which can be across different segments on different historicals, etc.), this is technically unique per (query-id, segment-id), or more generally per SpecificSegmentQueryRunner.

Edit:

This is now tracked at the (query, historical) level.
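As a rough illustration of what tracking at the (query, historical) level could look like, the sketch below keeps one counter per query ID inside a single process, so every segment runner for that query on the same historical draws from one shared total. `PerQueryByteTracker`, `add`, and `release` are hypothetical names, not classes or methods from this PR.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-process tracker for illustration only; not from the Druid codebase. */
final class PerQueryByteTracker {
    private final ConcurrentHashMap<String, AtomicLong> usedByQuery = new ConcurrentHashMap<>();
    private final long maxBytesPerQuery;

    PerQueryByteTracker(long maxBytesPerQuery) {
        this.maxBytesPerQuery = maxBytesPerQuery;
    }

    /**
     * Called by any segment runner working on queryId in this process.
     * Returns the query's new running total, or throws once the cap is exceeded.
     */
    long add(String queryId, long bytes) {
        AtomicLong used = usedByQuery.computeIfAbsent(queryId, id -> new AtomicLong());
        long total = used.addAndGet(bytes);
        if (total > maxBytesPerQuery) {
            throw new IllegalStateException(
                "query " + queryId + " exceeded aggregator budget: " + total);
        }
        return total;
    }

    /** Drop the counter once the query finishes on this process. */
    void release(String queryId) {
        usedByQuery.remove(queryId);
    }
}
```

Keying by query ID alone, rather than by (query-id, segment-id), means the limit applies to the query's total aggregator footprint on the process instead of to each segment separately.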