elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
69.98k stars 24.75k forks source link

[ES|QL] Support grouping sets/filters #114278

Open dgieselaar opened 2 weeks ago

dgieselaar commented 2 weeks ago

Description

As part of our entities effort, we need a way to efficiently create multiple complex groups in a single result set. As an example consider the following definitions:

Services: group by service.name All data streams: group by data_stream.dataset, data_stream.namespace, data_stream.type Production services: group by service.name, filter by service.environment

These definitions generate a list of entities, ie, each row is an entity. However, we also need to be able to query them together - for instance when someone wants to see all the entities owned by their team, or all the entities that are related to another entity.

Right now we are doing two things:

| EVAL entity.id = SPLIT(
    CONCAT(
      // these are pivot definitions
      COALESCE(CONCAT("data_stream", ":", data_stream.type, "/",data_stream.dataset, "/",data_stream.namespace),"__EMPTY__"),
      //  we use __EMPTY__ because any null argument to CONCAT results in a null return value
      COALESCE(CONCAT("service", ":", service.name),"__EMPTY__"),
      ";",
      // if there is no value for container.id, we mark it as empty so we can drop it later
      COALESCE(CONCAT("container", ":", container.id),"__EMPTY__"),
      ";",
      COALESCE(CONCAT("host", ":", host.name),"__EMPTY__"),
      ";",
      // these are filter definitions
      COALESCE(CONCAT("hosts", ":", CASE(host.name IS NOT NULL, "_hosts", NULL)), "__EMPTY__"),
      ";",
      COALESCE(CONCAT("hosts_e2_medium", ":", CASE(host.name IS NOT NULL AND cloud.machine.type == "e2-medium", "hosts_e2_medium", NULL)), "__EMPTY__")
    ),
    ";"
  )

After this, we can aggregate over entity.id to get all the entities (and possibly metadata fields via TOP/LAST/MAX whatever).

Ideally, ES|QL supports this natively. @costin mentioned PostgresQL has grouping sets which is a similar mechanism. I'm not immediately sure from the docs whether that also supports other expressions (such as CASE etc) as part of the grouping set. It would be great if we can have that as well. E.g., the ES|QL query could look like this:

| STATS BY COMPOSITE(
  GROUP("all_data_streams", data_stream.type, data_stream.dataset, data_stream.namespace),
  service.name,
  GROUP("services_in_prod", service.name, service.environment == "production")
)
elasticsearchmachine commented 2 weeks ago

Pinging @elastic/es-analytical-engine (Team:Analytics)