elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

ES|QL Physical plan serialization can be reduced/removed #113809

Open craigtaverner opened 1 month ago

craigtaverner commented 1 month ago

In ES|QL, the query plan that is serialized and transmitted to the data nodes is a physical plan containing a FragmentExec, which in turn contains a logical plan. This means that only the higher-level nodes in the physical plan need to be serialized. We currently maintain a lot of serialization code, and unit tests for that code, but for plan nodes that are never transmitted it can be considered dead code.

This issue was first noticed while developing pushdown of sorting by distance to Lucene. At first we added serialization for the sorts in the EsQueryExec class, and dealt with the daily merge conflicts caused by the TransportVersions changes. We then realized that this data was never serialized, so we removed all support for serializing sorts, but in a way that did not require a transport version change (i.e. always serialize an empty list and, when deserializing, ignore the result).
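To make the "always serialize an empty list, ignore it when deserializing" trick concrete, here is a minimal sketch. It uses plain `java.io` streams as a stand-in and is not the actual Elasticsearch `StreamOutput`/`StreamInput` code; the class `SortsWireCompat`, the record `QueryNode`, and the wire layout (a string followed by a counted list of strings) are all assumptions made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;

// Hypothetical stand-in for a plan node whose sorts are no longer sent over the wire.
class SortsWireCompat {

    // Hypothetical node: an index name plus sorts that used to be serialized.
    record QueryNode(String index, List<String> sorts) {}

    static byte[] write(QueryNode node) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(node.index());
            // Always write an empty list (a zero count) where the sorts used to go,
            // so the wire layout is unchanged and no new version check is needed.
            out.writeInt(0);
        }
        return bytes.toByteArray();
    }

    static QueryNode read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String index = in.readUTF();
            // Consume whatever list the sender wrote (an older sender may have
            // written real entries), then discard it.
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                in.readUTF();
            }
            return new QueryNode(index, List.of());
        }
    }

    public static void main(String[] args) throws IOException {
        QueryNode original = new QueryNode("airports", List.of("distance ASC"));
        // The sorts are dropped on the wire; only the index name survives.
        System.out.println(read(write(original)));
    }
}
```

The point of the design is that both old and new nodes agree on the byte layout, so the field can be retired without touching the transport version, at the cost of one redundant integer per node.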

However, on further thought we realized that the entire EsQueryExec class itself might never be serialized. We could potentially remove a lot of dead serialization code if many classes are never serialized. We do need to verify the scope of this, and also take into account the pragma node_level_reduction, which controls how much of the plan is handled on the data node versus the coordinator node. It is possible that this pragma prevents us from removing any serialization, or that we would need to remove the pragma as well.

elasticsearchmachine commented 1 month ago

Pinging @elastic/es-analytical-engine (Team:Analytics)