elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Limit the max size of terms aggregation #112549

Open iverase opened 2 months ago

iverase commented 2 months ago

Users can limit the number of buckets generated by a terms aggregation, and therefore the memory it uses, with the size parameter. As the documentation says, the value of this parameter should not be larger than search.max_buckets; otherwise the aggregation will likely throw a TooManyBucketsException or, even worse, cause the coordinating node to run out of memory (OOM).

I wonder if we should cap the value of size at the value of search.max_buckets. I see two options:

1) Throw an exception if the user tries to set a value bigger than search.max_buckets. This prevents users from building trappy queries, but it would be a massively breaking change, as any query that uses a large value would stop working.

2) Silently override the value of size with search.max_buckets whenever it is bigger than that value. This gives those queries a chance to return an answer and partially limits the amount of heap used (e.g. by TopBucketBuilder). This would be a positive breaking change, as queries that might not have worked before will work after this change.
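
To make the difference between the two options concrete, here is a minimal sketch. It is not Elasticsearch code: the class TermsSizeLimitSketch and the methods validateSize and clampSize are hypothetical names made up for illustration, and the limit of 65,536 is just an example value for search.max_buckets.

```java
// Hypothetical sketch only: none of these names exist in Elasticsearch.
public final class TermsSizeLimitSketch {

    private TermsSizeLimitSketch() {}

    /** Option 1: reject any requested size above the bucket limit. */
    public static int validateSize(int requestedSize, int maxBuckets) {
        if (requestedSize > maxBuckets) {
            throw new IllegalArgumentException(
                "[size] must not exceed [search.max_buckets]: got "
                    + requestedSize + ", limit is " + maxBuckets);
        }
        return requestedSize;
    }

    /** Option 2: silently cap the requested size at the bucket limit. */
    public static int clampSize(int requestedSize, int maxBuckets) {
        return Math.min(requestedSize, maxBuckets);
    }

    public static void main(String[] args) {
        int maxBuckets = 65_536;   // example value for search.max_buckets
        int requested = 1_000_000; // oversized "size" sent by a user

        // Option 2 lets the query proceed with a smaller size.
        System.out.println("clamped size: " + clampSize(requested, maxBuckets));

        // Option 1 fails the request up front.
        try {
            validateSize(requested, maxBuckets);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With option 1 the oversized request fails before any buckets are built, while with option 2 the same request runs with a size equal to the limit, so the caller may get fewer buckets than requested without being told.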

P.S. This issue focuses on the terms aggregation, but it applies to any bucket aggregation that accepts a size parameter.

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-analytical-engine (Team:Analytics)