Open astefan opened 2 months ago
Pinging @elastic/es-analytical-engine (Team:Analytics)
Users will eventually need a way to provide their own name (alias) for the qualifier, as the index names can be quite long (.ds-metrics-prometheus.collector-mki-ksm-crd-scraper-2024.08.21-000645 for example). You wouldn't want to have to type that when referring to a column. It's another nice layer on top of what's proposed above, but I think we should think about it now to avoid making it harder later.
What happens if the main index resolves to more than one index? For example, if logs-* resolves to both logs-1 and logs-2, what would the columns be in this case?
We have at least two more options here:
LOOKUP benefits ON languages WITH benefits.salary = salary, bonus_percent
It's the same syntax we have in ENRICH, so in terms of consistency it seems desirable
LOOKUP benefits ON languages PREFIX `benefits.`
I think it's important to point out that any implicit behavior (e.g. an implicit prefix as the name of the lookup table) will be a breaking change and won't avoid collisions (e.g. employees could have a field called benefits.salary).
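To make the collision concern concrete, here is a minimal Python sketch that models the merge outside ES|QL. The function name `lookup_with_prefix`, the sample values, and the choice to leave the join key out of the table row are all assumptions made for illustration; this is not how LOOKUP is implemented.

```python
def lookup_with_prefix(row, table_row, prefix):
    """Merge table_row into row, qualifying each looked-up column with prefix.

    Hypothetical model: same-name columns follow "last column wins", so a
    qualified name that already exists in row is overwritten.
    """
    merged = dict(row)
    collisions = []
    for name, value in table_row.items():
        qualified = prefix + name
        if qualified in merged:
            # The prefix did not help: the source row already used this name.
            collisions.append(qualified)
        merged[qualified] = value
    return merged, collisions


# Hypothetical data: employees already has a field called benefits.salary.
employee = {"languages": 2, "salary": 50000, "benefits.salary": 1234}
benefits = {"salary": 60000, "bonus_percent": 10}

merged, collisions = lookup_with_prefix(employee, benefits, "benefits.")
print(collisions)
```

The plain salary column survives, but the pre-existing benefits.salary value is still lost, which is the collision an implicit prefix would not avoid.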
LOOKUP benefits ON languages WITH benefits.salary = salary, bonus_percent
It's the same syntax we have in ENRICH, so in terms of consistency it seems desirable
That'd be consistent, but the problem with the WITH syntax is that it requires explicit aliasing of each and every field. That applies even more so to ENRICH: if the enrich index has many fields, you may accidentally overwrite your fields, or you'll have to list each and every field you want. Qualifiers are one way to solve this, by avoiding name conflicts altogether.
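The trade-off can be sketched in Python as a rough model of the two modes: without a WITH list every enrich field is added and overwrites same-name columns, while with a WITH list only the named fields are added, under the names given. The function `enrich_with` and the sample data are hypothetical and only model the column bookkeeping.

```python
def enrich_with(row, enrich_row, with_clause=None):
    """Model of adding enrich fields to a row.

    with_clause is a list of (new_name, source_field) pairs, or None to add
    every enrich field under its own name (overwriting same-name columns).
    """
    merged = dict(row)
    if with_clause is None:
        merged.update(enrich_row)  # last column wins
        return merged
    for new_name, source_field in with_clause:
        merged[new_name] = enrich_row[source_field]
    return merged


employee = {"languages": 2, "salary": 50000}
benefits = {"salary": 60000, "bonus_percent": 10}

# No WITH: salary from employees is silently replaced.
overwritten = enrich_with(employee, benefits)

# WITH benefits.salary = salary, bonus_percent: safe, but every field is listed.
aliased = enrich_with(
    employee,
    benefits,
    [("benefits.salary", "salary"), ("bonus_percent", "bonus_percent")],
)
print(overwritten, aliased)
```

With two fields the explicit list is short; with a wide enrich index it grows with every field you want to keep.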
Description
The LOOKUP command is a powerful tool that helps users combine static tables of values with "live" data from Elasticsearch. It is consistent with the rest of the language features when it comes to same-name columns, in that the last column wins.
For example (notice the salary column that is present in the employees index and the salary column that is present in the static table benefits), a LOOKUP of benefits on employees will result in the salary column from employees being replaced by the salary column from the benefits table. One can argue, though, that both columns are useful and must be kept. There is a workaround here that requires a few changes to the query: "manually" copying the old values into a new column before the lookup.
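The "last column wins" behavior and the manual-copy workaround can be modeled with a short Python sketch. The function `lookup_last_wins`, the column name `employees_salary`, and all sample values are assumptions for illustration; rows without a match are simply left unchanged here, which may differ from what LOOKUP does.

```python
def lookup_last_wins(rows, table, key):
    """Join each row with the table row sharing the same key value.

    Same-name columns from the table replace those already in the row.
    """
    index = {table_row[key]: table_row for table_row in table}
    result = []
    for row in rows:
        merged = dict(row)
        match = index.get(row[key])
        if match is not None:
            merged.update(match)  # the later column overwrites the earlier one
        result.append(merged)
    return result


employees = [{"emp_no": 1, "languages": 2, "salary": 50000}]
benefits = [{"languages": 2, "salary": 60000, "bonus_percent": 10}]

# Plain lookup: the employees salary is lost.
plain = lookup_last_wins(employees, benefits, "languages")

# Workaround: copy the old value into a new column first, then look up.
copied = [{**row, "employees_salary": row["salary"]} for row in employees]
kept = lookup_last_wins(copied, benefits, "languages")
print(plain, kept)
```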
But we can do better and have these additional steps added automatically in the form of name qualifiers, so that both same-name columns are kept.
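One possible reading of automatic qualifiers, sketched in Python: when a column name exists on both sides, each copy is kept under a qualified name instead of one overwriting the other. The function `lookup_qualified`, the rule that only colliding names get qualified, and the `left_name`/`right_name` parameters are assumptions, not the proposed design.

```python
def lookup_qualified(rows, table, key, left_name, right_name):
    """Join rows with table on key, qualifying only the colliding columns."""
    index = {table_row[key]: table_row for table_row in table}
    result = []
    for row in rows:
        match = index.get(row[key], {})
        merged = {}
        for name, value in row.items():
            if name != key and name in match:
                merged[f"{left_name}.{name}"] = value  # collision: qualify
            else:
                merged[name] = value
        for name, value in match.items():
            if name == key:
                continue  # the join key is already present
            if name in row:
                merged[f"{right_name}.{name}"] = value  # collision: qualify
            else:
                merged[name] = value
        result.append(merged)
    return result


employees = [{"emp_no": 1, "languages": 2, "salary": 50000}]
benefits = [{"languages": 2, "salary": 60000, "bonus_percent": 10}]

qualified = lookup_qualified(employees, benefits, "languages", "employees", "benefits")
print(qualified)
```

Passing a short name such as "e" for `left_name` would play the role of a user-provided alias when the real index name is long.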