elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
69.64k stars 24.64k forks source link

ESQL: Conflict between real and synthetic attribute names #110964

Open alex-spies opened 2 months ago

alex-spies commented 2 months ago

In several optimizer/analyzer rules, ESQL creates new attribute names in a way that is meant not to conflict with existing attributes. (The names are created by SubstituteSurrogates.rawTemporaryName.)

There is no mechanism in place that ensures these actually never conflict with real column names.

Conflicts can occur, as seen in this union types example:

curl -u elastic:password -H "Content-Type: application/json" "127.0.0.1:9200/testidx?pretty" -XPUT -d '
{"mappings": {"properties": {"client_ip": {"type":"ip"}}}}'

curl -u elastic:password -H "Content-Type: application/json" "127.0.0.1:9200/testidx2?pretty" -XPUT -d '
{"mappings": {"properties": {"client_ip": {"type":"keyword"}}}}'

from testidx* | eval `$$client_ip$converted_to$ip` = \"1.1.1.1\"::ip | eval x = client_ip::ip | eval y = `$$client_ip$converted_to$ip`

->
"illegal_state_exception","reason":"Found 1 problem\nline 1:70: Plan [Project[[client_ip{f}#80, $$client_ip$converted_to$ip{f}#81 AS x, $$client_ip$converted_to$ip{r}#73 AS y]]] optimized incorrectly due to missing references [$$client_ip$converted_to$ip{f}#81]"}]

from testidx* | eval x = client_ip::ip | eval y = `$$client_ip$converted_to$ip`
}'
   client_ip   |       x       |       y       
---------------+---------------+---------------
null           |127.0.0.1      |127.0.0.1      
null           |127.0.0.2      |127.0.0.2 

-> this one is not wrong, but user using the internal identifiers isn't great.

I see two options to fix this:

  1. Disallow using a specific identifier pattern that is reserved for internal use only (e.g. \$\$.*\$.*\$.* which we use right now); technically a breaking change
  2. Append a unique counter or id (uuid?) to generated names.
  3. Stop relying on names in logical plans and instead rely on attributes' name ids. (Bigger change overall)
elasticsearchmachine commented 2 months ago

Pinging @elastic/es-analytical-engine (Team:Analytics)