Open nik9000 opened 4 months ago
Pinging @elastic/es-analytical-engine (Team:Analytics)
Some of https://github.com/elastic/elasticsearch/issues/110923 need to happen before GA of INLINESTATS. Some need to happen after. Some are entirely unrelated.
Maybe we should also consider an optimization, where the output columns of INLINESTATS are actually unused (e.g. DROPped) - then we don't need to perform INLINESTATS/a 2-phase query at all.
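A rough sketch of how that pruning could work, as a toy model rather than the actual ES|QL planner. The `Command` class, its `defines`/`uses` fields, and `prune_unused_inlinestats` are all hypothetical names made up for illustration. The idea is a backward pass that tracks which columns are still needed; an INLINESTATS whose outputs nobody reads can be skipped, because (unlike STATS) it does not change which rows flow through.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """Hypothetical, simplified stand-in for one command in a query pipeline."""
    name: str
    defines: frozenset = frozenset()  # columns this command adds
    uses: frozenset = frozenset()     # columns this command reads


def prune_unused_inlinestats(pipeline, output_columns):
    """Drop INLINESTATS commands whose output columns are never read."""
    needed = set(output_columns)  # columns required after the current command
    kept = []
    for cmd in reversed(pipeline):
        if cmd.name == "INLINESTATS" and not (cmd.defines & needed):
            # Nothing downstream reads its outputs, so skip it entirely.
            continue
        # Standard liveness update: what is needed before this command.
        needed = (needed - cmd.defines) | cmd.uses
        kept.append(cmd)
    kept.reverse()
    return kept


# Assumed example: INLINESTATS a = AVG(foo), optionally followed by WHERE foo > a.
inl = Command("INLINESTATS", defines=frozenset({"a"}), uses=frozenset({"foo"}))
where = Command("WHERE", uses=frozenset({"foo", "a"}))

only_foo = prune_unused_inlinestats([inl], {"foo"})
with_filter = prune_unused_inlinestats([inl, where], {"foo"})
```

In this model `only_foo` ends up empty, while `with_filter` keeps both commands: a later WHERE that reads the aggregate keeps the INLINESTATS alive even if the column is dropped from the final output.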
Seeing a couple of different failures. I see variants of these with LOOKUP as well, so that might be related. I've noticed that I can sometimes work around them by adding | LIMIT 1000000 right after the LOOKUP or INLINESTATS. Explicitly KEEPing fields also sometimes resolves the issue.
FROM .entities*instance*,.alerts*,.slos*
| EVAL _entity_id_type_hosts = CASE(host.name IS NOT NULL, ":hosts", NULL)
| EVAL _entity_id_type_host = host.name
| INLINESTATS _unique_alerts_type_hosts = COUNT_DISTINCT(kibana.alert.uuid) BY _entity_id_type_hosts
| INLINESTATS _unique_alerts_type_host = COUNT_DISTINCT(kibana.alert.uuid) BY _entity_id_type_host
| STATS _alerts_count_hosts = SUM(_unique_alerts_type_hosts) BY entity.id
results in:
```json
{
  "error": {
    "root_cause": [
      {
        "type": "class_cast_exception",
        "reason": "class org.elasticsearch.compute.data.LongArrayBlock cannot be cast to class org.elasticsearch.compute.data.BytesRefBlock (org.elasticsearch.compute.data.LongArrayBlock and org.elasticsearch.compute.data.BytesRefBlock are in unnamed module of loader java.net.FactoryURLClassLoader @43120a77)"
      }
    ],
    "type": "class_cast_exception",
    "reason": "class org.elasticsearch.compute.data.LongArrayBlock cannot be cast to class org.elasticsearch.compute.data.BytesRefBlock (org.elasticsearch.compute.data.LongArrayBlock and org.elasticsearch.compute.data.BytesRefBlock are in unnamed module of loader java.net.FactoryURLClassLoader @43120a77)",
    "suppressed": [
      { "type": "exception", "reason": "1 further exceptions were dropped" },
      { "type": "task_cancelled_exception", "reason": "cancelled on failure" }
    ]
  },
  "status": 500
}
```
Adding a KEEP results in a different error:
FROM .entities*instance*,.alerts*,.slos*
| KEEP host.name, service.name, kibana.alert.uuid, entity.id
| EVAL _entity_id_type_hosts = CASE(host.name IS NOT NULL, ":hosts", NULL)
| EVAL _entity_id_type_host = host.name
| INLINESTATS _unique_alerts_type_hosts = COUNT_DISTINCT(kibana.alert.uuid) BY _entity_id_type_hosts
| INLINESTATS _unique_alerts_type_host = COUNT_DISTINCT(kibana.alert.uuid) BY _entity_id_type_host
| STATS _alerts_count_hosts = SUM(_unique_alerts_type_hosts) BY entity.id
```json
{
  "error": {
    "root_cause": [
      {
        "type": "array_index_out_of_bounds_exception",
        "reason": "Index 10 out of bounds for length 8"
      }
    ],
    "type": "array_index_out_of_bounds_exception",
    "reason": "Index 10 out of bounds for length 8",
    "suppressed": [
      { "type": "exception", "reason": "2 further exceptions were dropped" },
      { "type": "task_cancelled_exception", "reason": "cancelled on failure" },
      {
        "type": "task_cancelled_exception",
        "reason": "parent task was cancelled [cancelled on failure]",
        "suppressed": [
          { "type": "task_cancelled_exception", "reason": "parent task was cancelled [cancelled on failure]" },
          { "type": "exception", "reason": "1 further exceptions were dropped" },
          { "type": "task_cancelled_exception", "reason": "parent task was cancelled [cancelled on failure]" }
        ]
      }
    ]
  },
  "status": 500
}
```
Description
https://github.com/elastic/elasticsearch/pull/109583 will add support for INLINESTATS, a command to run a STATS and then merge the results into the stream of results. This issue tracks follow up work:
Before GA
- `// TODO once inlinestats supports expressions in groups we'll likely need the same sort of extraction here`
- `profile`
- `| INLINESTATS a=AVG(foo) | WHERE foo > a` should be able to push the `foo > a` bit in the second phase. It can't now.
- `brokenWhy-Ignore`, `byConstant-Ignored`
- `INLINESTATS x=MAX(a), x=MIN(a)` - `shadowingInternal-Ignored`
- `shadowingSelfBySelf`
- `INLINESTATS` in CCS. This includes ensuring that the two-phase execution model interacts properly with the new CCS execution info metadata that is gathered: https://github.com/elastic/elasticsearch/pull/112595/files#r1763676152
- `Phased` stuff further into physical planning. It'd be nice to, for example, add a `SubqueryExec` plan that runs like `Phased` does here. Not sure if physical or logical - but physical feels better. We're doing logical now though.
- `BUCKET` function. Sounds like it doesn't work at the moment.
- with message `FROM idx | EVAL ip = to_ip(host), x = to_string(host), y = to_string(host) | INLINESTATS max(id)` and re-enable the corresponding telemetry test case muted here.

Eventually
- INLINESTATS with WHERE filters, c.f. https://github.com/elastic/elasticsearch/pull/113735
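
For readers new to the command, the "run a STATS and then merge the results into the stream" behaviour described above can be modelled roughly as two passes over the data. This is only a toy Python model under simplifying assumptions (in-memory rows, no null handling, AVG only), not how the ES|QL engine implements it. `inlinestats_avg` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def inlinestats_avg(rows, value_key, out_key, by=None):
    """Toy model of `INLINESTATS out_key = AVG(value_key) [BY by]`."""
    # Phase 1: run the aggregation, like a plain STATS, over all rows.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row.get(by) if by is not None else None
        sums[group] += row[value_key]
        counts[group] += 1
    averages = {g: sums[g] / counts[g] for g in sums}

    # Phase 2: stream the original rows again, merging the aggregate in.
    for row in rows:
        group = row.get(by) if by is not None else None
        yield {**row, out_key: averages[group]}


# Assumed sample data; roughly `| INLINESTATS a = AVG(foo) BY host | WHERE foo > a`.
rows = [
    {"host": "a", "foo": 1.0},
    {"host": "a", "foo": 3.0},
    {"host": "b", "foo": 10.0},
]
result = [r for r in inlinestats_avg(rows, "foo", "a", by="host") if r["foo"] > r["a"]]
```

Here the `foo > a` filter can only be evaluated once the aggregate exists, which is why the checklist item above talks about pushing it into the second phase rather than running it on top of the merged output.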