elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
69.6k stars 24.63k forks source link

DiskThresholdMonitor retains large heap objects unnecessarily #94914

Open DaveCTurner opened 1 year ago

DaveCTurner commented 1 year ago

The DiskThresholdMonitor enqueues a batched reroute if the watermarks situation in the cluster changes:

https://github.com/elastic/elasticsearch/blob/b4712a5c705c9aa44ca32e41e9e0ba513548ddad/server/src/main/java/org/elasticsearch/cluster/routing/allocation/DiskThresholdMonitor.java#L317-L363

The listener may be called quite some time later, but it retains the original ClusterInfo (potentially 10s of MiBs) for its computations. Could we discard that and use a more recent ClusterInfo?

That's not all that bad, because we only ever have one such task pending. The worst bit is that this listener takes the reroutedClusterState, which means the BatchedRerouteService captures the newState here ...

https://github.com/elastic/elasticsearch/blob/405a53b3c37699ac53f4eada855a5a3a14783bbe/server/src/main/java/org/elasticsearch/cluster/routing/BatchedRerouteService.java#L154-L156

... and keeps hold of it until the reroute completes (i.e. we converge on a new balance and start to reconcile it) which could be quite some time later. That means that every other batched reroute also retains the state until reconciliation starts, which could represent a rather large amount of heap.

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-distributed (Team:Distributed)