ES|QL query plans can be pretty large; in some cases they contain thousands of objects. E.g. the plan of a query like `from *` will likely contain many `FieldAttribute`s/`EsField`s.
The plan fragment will eventually be serialized, and the serialized plan will most likely have an even larger footprint.
Some information in the plan is redundant, e.g. `ExchangeSinkExec.output` will likely contain the same `FieldAttribute`s as the `Project` contained in the fragment; when serializing the plan, even though the two lists contain the same objects, they will likely be serialized twice.
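To illustrate the redundancy, here is a hedged sketch (the `Field` record and its writer are hypothetical stand-ins, not Elasticsearch's actual wire format): value-based serialization writes a shared object once per reference, so the same instance appearing in two lists roughly doubles its wire footprint even though it occupies memory only once.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;

// Toy stand-in for a shared attribute; serialized by value, like most wire formats.
record Field(String name, String type) {
    void writeTo(DataOutputStream out) throws IOException {
        out.writeUTF(name);
        out.writeUTF(type);
    }
}

public class DuplicateSerialization {
    // Serialize several lists of fields into one buffer and return its size.
    static int serializedSize(List<List<Field>> lists) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (List<Field> list : lists) {
            out.writeInt(list.size());
            for (Field f : list) {
                f.writeTo(out); // value-based: written once per reference, not per object
            }
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        Field shared = new Field("host.name", "keyword");
        // The same instance referenced both by the sink's output and the fragment's projection.
        List<Field> sinkOutput = List.of(shared);
        List<Field> projectOutput = List.of(shared);
        int once = serializedSize(List.of(sinkOutput));
        int twice = serializedSize(List.of(sinkOutput, projectOutput));
        // One object in memory, but its payload appears twice on the wire.
        System.out.println("one list: " + once + " bytes, two lists: " + twice + " bytes");
    }
}
```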
This issue is to track the effort to quantify how much memory we are actually consuming for plan building/optimization/coordination.
- [ ] measure how much memory a logical/physical plan occupies in memory
  - [ ] for small queries
  - [ ] for common queries
  - [ ] for queries with large schemas
  - [ ] for very large queries (i.e. many commands/expressions)
- [x] measure how much memory the serialization of the plan takes, compared to the memory footprint of the plan itself. See how much redundant information we send over the wire (e.g. do we serialize the same `FieldAttribute` multiple times?) https://github.com/elastic/elasticsearch/pull/112008
- [ ] check circuit breaker coverage (we know we don't track logical/physical plan memory; we'll have to double-check if/how much we track memory for serialization/deserialization)
- [ ] consider special cases that we know have specific memory needs, in particular LOOKUP and INLINESTATS
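On the redundant-serialization point above, one possible mitigation is deduplicating shared objects with an identity cache: the first reference writes the full payload, later references write only a small back-reference id. This is a hypothetical sketch, not Elasticsearch's actual serialization mechanism:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical writer that replaces repeat occurrences of the same instance
// with a compact back-reference instead of re-serializing the full payload.
public class DedupWriter {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(bytes);
    private final Map<Object, Integer> seen = new IdentityHashMap<>();

    void writeField(String name, String type, Object identity) throws IOException {
        Integer id = seen.get(identity);
        if (id != null) {
            out.writeBoolean(true);  // marker: back-reference follows
            out.writeInt(id);        // id assigned on first encounter
        } else {
            seen.put(identity, seen.size());
            out.writeBoolean(false); // marker: full payload follows
            out.writeUTF(name);
            out.writeUTF(type);
        }
    }

    int size() {
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        DedupWriter w = new DedupWriter();
        Object shared = new Object(); // stands in for a shared FieldAttribute instance
        w.writeField("host.name", "keyword", shared);
        int afterFirst = w.size();
        w.writeField("host.name", "keyword", shared); // second reference: marker + id only
        int delta = w.size() - afterFirst;
        System.out.println("first write: " + afterFirst + " bytes, repeat: " + delta + " bytes");
    }
}
```

The trade-off is that both sides must maintain the identity table, and deserialization has to resolve back-references in encounter order.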