elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

ESQL: Remove EsIndex from plan and serialization #112998

Open alex-spies opened 5 days ago

alex-spies commented 5 days ago

Relates https://github.com/elastic/elasticsearch/issues/111358

When executing an ESQL query, the coordinator node sends logical plans to data nodes. The plan node that corresponds to "fetch data from an index" is EsRelation. It currently contains the whole EsIndex, including its complete mapping, which can be quite large when the index has many fields, especially when fields are nested or when there are many type conflicts. (See this test where the serialized size for an EsIndex went up to 20MB for many nested fields.) This means that even queries asking for a single field, like FROM index | KEEP field, can have disproportionately large serialized logical plans.

But we don't need the whole EsIndex! All that's needed in EsRelation (and friends: EsSourceExec, EsQueryExec, EsStatsQueryExec) are the index name and the set of concrete index names it represents. (The physical plan nodes don't even seem to need the latter.)

Let's simplify our LogicalPlan nodes to not require the whole EsIndex, and thus shrink the serialized plan size when mappings are large or deeply nested.
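
To illustrate the size difference, here is a minimal, self-contained sketch. It does not use the real Elasticsearch classes or wire format: PlanSizeSketch, serializeFull, serializeSlim and syntheticMapping are hypothetical names, the mapping is reduced to a flat map of field name to type, and the length-prefixed encoding is an assumption made only for the comparison. The point is that the "slim" form (index name plus concrete index names) has a size that does not depend on the number of mapped fields, while the "full" form grows with the mapping.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not the actual ESQL plan serialization.
public class PlanSizeSketch {

    // Writes a string as a 4-byte length followed by its UTF-8 bytes.
    private static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Writes the part both forms share: index name and concrete index names.
    private static void writeNameAndIndices(DataOutputStream out, String name, Set<String> concreteIndices)
        throws IOException {
        writeString(out, name);
        out.writeInt(concreteIndices.size());
        for (String index : new TreeSet<>(concreteIndices)) { // sorted for a stable byte order
            writeString(out, index);
        }
    }

    // "Full" form: name, concrete indices, and every mapped field (stand-in for a whole EsIndex).
    public static byte[] serializeFull(String name, Set<String> concreteIndices, Map<String, String> mapping) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            writeNameAndIndices(out, name, concreteIndices);
            out.writeInt(mapping.size());
            for (Map.Entry<String, String> field : mapping.entrySet()) {
                writeString(out, field.getKey());
                writeString(out, field.getValue());
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // "Slim" form: only the name and the concrete indices, as proposed above.
    public static byte[] serializeSlim(String name, Set<String> concreteIndices) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            writeNameAndIndices(out, name, concreteIndices);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a synthetic mapping with the given number of (made-up) nested-looking fields.
    public static Map<String, String> syntheticMapping(int fieldCount) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (int i = 0; i < fieldCount; i++) {
            mapping.put("parent.child.field_" + i, "keyword");
        }
        return mapping;
    }

    public static void main(String[] args) {
        Set<String> concrete = Set.of("logs-000001", "logs-000002");
        byte[] full = serializeFull("logs-*", concrete, syntheticMapping(10_000));
        byte[] slim = serializeSlim("logs-*", concrete);
        System.out.println("full: " + full.length + " bytes, slim: " + slim.length + " bytes");
    }
}
```

The absolute numbers from this sketch say nothing about real plan sizes; only the scaling behavior (constant versus growing with the field count) carries over to the argument above.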

elasticsearchmachine commented 5 days ago

Pinging @elastic/es-analytical-engine (Team:Analytics)