elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Fielddata circuit breaker doesn't seem to limit cache size #97075

Open DaveCTurner opened 1 year ago

DaveCTurner commented 1 year ago

A v8.6 user on the forums reported OutOfMemoryErrors (OOMEs). When they analysed the heap dump, they found that a large fraction of their 3GiB heap was used by the fielddata cache. The fielddata breaker stats from GET /_nodes/_all/stats/breaker?filter_path=nodes.*.breakers.fielddata agree:

{
  "nodes": {
    "K6V95L0pR36L-_99LIapdw": {
      "breakers": {
        "fielddata": {
          "limit_size_in_bytes": 1288490188,
          "limit_size": "1.1gb",
          "estimated_size_in_bytes": 2766456144,
          "estimated_size": "2.5gb",
          "overhead": 1.03,
          "tripped": 0
        }
      }
    }
  }
}
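A quick check of the numbers above: the reported limit of 1288490188 bytes is exactly 40% of a 3GiB heap, which matches the default indices.breaker.fielddata.limit of 40% of the JVM heap. The estimated size is more than twice that limit. The small Java program below just redoes this arithmetic. It assumes the heap is exactly 3GiB and that the breaker limit was left at its default. The class and method names are made up for illustration.

```java
// Hypothetical helper that re-checks the breaker stats reported above.
public class FielddataBreakerCheck {

    // Breaker limit for a given heap size and fraction (e.g. 0.40 for "40%").
    static long limitFor(long heapBytes, double fraction) {
        return (long) (heapBytes * fraction);
    }

    public static void main(String[] args) {
        long heapBytes = 3L * 1024 * 1024 * 1024; // the user's 3GiB heap
        long reportedLimit = 1_288_490_188L;      // limit_size_in_bytes
        long reportedEstimate = 2_766_456_144L;   // estimated_size_in_bytes

        // Assumption: indices.breaker.fielddata.limit was left at its 40% default.
        long expectedLimit = limitFor(heapBytes, 0.40);
        System.out.println("limit matches 40% of heap: " + (expectedLimit == reportedLimit));
        System.out.println("estimate / limit = " + (double) reportedEstimate / reportedLimit);
        System.out.println("estimate / heap  = " + (double) reportedEstimate / heapBytes);
    }
}
```

The ratios come out at roughly 2.15 and 0.86. In other words, the fielddata estimate alone accounts for about 86% of the heap, even though the breaker limit is 40%.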

They worked around the problem by setting indices.fielddata.cache.size: 1gb. However, I think it's a bug that, by default, the fielddata cache can grow without bound like this, well past the fielddata circuit breaker limit.
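The workaround works because indices.fielddata.cache.size is unbounded by default. Once it is set, entries are evicted to keep the cache under that size. The Java sketch below is a toy model of that idea and is not Elasticsearch's cache implementation. It assumes least-recently-used eviction and tracks sizes as plain byte counts. The class name BoundedFielddataCache and its methods are hypothetical.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model, NOT Elasticsearch's cache: a size-bounded LRU map that
// evicts least-recently-used entries once the total size exceeds maxBytes.
public class BoundedFielddataCache {
    private final long maxBytes; // analogous to indices.fielddata.cache.size; <= 0 means unbounded
    private long usedBytes = 0;
    // accessOrder=true: iteration goes from least- to most-recently used
    private final LinkedHashMap<String, Long> entries = new LinkedHashMap<>(16, 0.75f, true);

    public BoundedFielddataCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    public void put(String key, long sizeInBytes) {
        Long previous = entries.put(key, sizeInBytes);
        if (previous != null) {
            usedBytes -= previous;
        }
        usedBytes += sizeInBytes;
        evictIfNeeded();
    }

    // Returns whether the key is cached; a hit also marks it as recently used.
    public boolean access(String key) {
        return entries.get(key) != null;
    }

    public long usedBytes() {
        return usedBytes;
    }

    private void evictIfNeeded() {
        if (maxBytes <= 0) {
            return; // unbounded, like the default setting: nothing is ever evicted
        }
        // Drop least-recently-used entries until we are back under the limit.
        // (An entry larger than maxBytes on its own ends up evicted too.)
        Iterator<Map.Entry<String, Long>> it = entries.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            Map.Entry<String, Long> eldest = it.next();
            usedBytes -= eldest.getValue();
            it.remove();
        }
    }

    public static void main(String[] args) {
        BoundedFielddataCache unbounded = new BoundedFielddataCache(0);
        BoundedFielddataCache bounded = new BoundedFielddataCache(1L << 30); // 1GiB
        for (int i = 0; i < 5; i++) {
            unbounded.put("field-" + i, 512L << 20); // 512MiB of fielddata per field
            bounded.put("field-" + i, 512L << 20);
        }
        System.out.println("unbounded: " + unbounded.usedBytes() + " bytes");
        System.out.println("bounded:   " + bounded.usedBytes() + " bytes");
    }
}
```

In the demo, the unbounded cache ends up holding 2560MiB, while the bounded one stays at 1GiB by evicting the oldest fields. Bounding the cache in this way caps memory use, but it does nothing to make the breaker itself trip.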

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-search (Team:Search)

iverase commented 1 year ago

The issue probably comes from building global ordinals. There, we account for the memory usage but never trip the breaker:

https://github.com/elastic/elasticsearch/blob/6566bb40755c75090d66beecb22fe933c850e626/server/src/main/java/org/elasticsearch/index/fielddata/ordinals/GlobalOrdinalsBuilder.java#L55

This line of code has looked the same since 2014, so we have never tripped the circuit breaker here.
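Elasticsearch's CircuitBreaker interface has both a checking method, addEstimateBytesAndMaybeBreak, and a purely accounting method, addWithoutBreaking. The linked line presumably uses the accounting path, although that is not verified here. The Java sketch below is a hypothetical toy breaker, not Elasticsearch's class. It shows why accounting-only updates let the estimate climb past the limit, as in the stats above. Its methods reserveOrTrip and recordWithoutTripping loosely mirror those two calls. The sketch ignores the overhead multiplier and concurrency.

```java
// Hypothetical toy breaker, NOT Elasticsearch's CircuitBreaker: it only contrasts
// accounting that can trip with accounting that never trips.
public class ToyBreaker {
    private final long limitBytes;
    private long usedBytes = 0;

    public ToyBreaker(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Reserve bytes, but refuse (without recording anything) if the limit would be exceeded.
    public void reserveOrTrip(long bytes, String label) {
        long newUsed = usedBytes + bytes;
        if (newUsed > limitBytes) {
            throw new IllegalStateException(
                "[" + label + "] would use " + newUsed + " bytes, limit is " + limitBytes);
        }
        usedBytes = newUsed;
    }

    // Record bytes unconditionally: the total may exceed the limit and nothing trips.
    public void recordWithoutTripping(long bytes) {
        usedBytes += bytes;
    }

    public long usedBytes() {
        return usedBytes;
    }

    public static void main(String[] args) {
        long limit = 1_288_490_188L;  // the limit from the stats output above
        long ordinalsSize = 900L << 20; // hypothetical 900MiB of global ordinals per field

        ToyBreaker accountingOnly = new ToyBreaker(limit);
        for (int i = 0; i < 3; i++) {
            accountingOnly.recordWithoutTripping(ordinalsSize);
        }
        System.out.println("accounting only: " + accountingOnly.usedBytes() + " bytes, limit " + limit);

        ToyBreaker checking = new ToyBreaker(limit);
        try {
            for (int i = 0; i < 3; i++) {
                checking.reserveOrTrip(ordinalsSize, "global-ordinals-" + i);
            }
        } catch (IllegalStateException e) {
            System.out.println("tripped: " + e.getMessage());
        }
    }
}
```

With accounting only, the total reaches about 2.6GiB against a limit of about 1.2GiB. The checking variant trips on the second reservation instead.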

LukoJy3D commented 5 months ago

I can confirm that this is still a problem on 7.17.18. On larger text aggregations, the fielddata cache seems to grow uncontrollably, ignoring all default breaker limits (the heap is 32gb).

Setting indices.fielddata.cache.size does resolve these issues, but it does not seem like a proper fix. 🙏

elasticsearchmachine commented 1 month ago

Pinging @elastic/es-analytical-engine (Team:Analytics)