Executing ESQL queries normally requires that the coordinator node and data nodes communicate: the coordinator sends logical plans to the data nodes, the data nodes send pages with results back to the coordinator.
In both directions, the transport message size seems to be unbounded, and there also seems to be no circuit breaker; we've seen cases where particularly large logical plans caused gigabytes of data to be buffered in the NettyAllocator.
While some issues were addressed in https://github.com/elastic/elasticsearch/pull/112008, https://github.com/elastic/elasticsearch/pull/111447 and https://github.com/elastic/elasticsearch/pull/111973, we should find other situations where this can happen, test them, and fix them where needed.
This is similar to our HeapAttack tests, but distributed.