elastic / rally

Macrobenchmarking framework for Elasticsearch
Apache License 2.0

Add 'enrich' operator for setting up enrich indices #1796

Open craigtaverner opened 1 year ago

craigtaverner commented 1 year ago

The recent work to benchmark ES|QL has benefitted greatly from the addition of the esql operator in https://github.com/elastic/rally/pull/1791. However, benchmarks that use the enrich command need enrich indices to be set up first, and this currently requires a sequence of several raw-request operations.

The three steps (delete, create, and execute the enrich policy) could be combined into a single enrich policy operation. This operation should replace the following raw steps:

Delete

    {
      "name": "delete-enrich-nyc_vendors",
      "operation-type": "raw-request",
      "ignore": [404],
      "include-in-reporting": false,
      "method": "DELETE",
      "path": "/_enrich/policy/nyc_vendors"
    }

Create

    {
      "name": "create-enrich-nyc_vendors",
      "include-in-reporting": false,
      "operation-type": "raw-request",
      "method": "PUT",
      "path": "/_enrich/policy/nyc_vendors",
      "body": {
        "match": {
          "indices": "nyc_vendors",
          "match_field": "id",
          "enrich_fields": [ "name" ]
        }
      }
    }

Execute

    {
      "name": "execute-enrich-nyc_vendors",
      "operation-type": "raw-request",
      "method": "POST",
      "path": "/_enrich/policy/nyc_vendors/_execute"
    }

Combined

The above operations could be combined into one:

    {
      "name": "setup-enrich-nyc_vendors",
      "operation-type": "enrich_policy",
      "policy_name": "nyc_vendors",
      "delete": true,              // defaults to true
      "enrich_type": "match",      // could default to 'match'
      "indices": "nyc_vendors",    // could default to the value of 'policy_name'
      "match_field": "id",
      "enrich_fields": [ "name" ]
    }
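
As a rough illustration of the intended behaviour, the sketch below expands the combined parameters into the three raw requests listed earlier. The function name `expand_enrich_policy` and the shape of the returned dictionaries are hypothetical and not part of Rally; the defaults simply follow the comments in the proposal above.

```python
from typing import Any


def expand_enrich_policy(params: dict[str, Any]) -> list[dict[str, Any]]:
    """Expand the proposed 'enrich_policy' parameters into raw REST requests.

    Hypothetical helper for illustration only; it is not part of Rally.
    """
    policy_name = params["policy_name"]
    path = f"/_enrich/policy/{policy_name}"
    enrich_type = params.get("enrich_type", "match")  # assumed default
    body = {
        enrich_type: {
            # Assumed default: the source index is named after the policy.
            "indices": params.get("indices", policy_name),
            "match_field": params["match_field"],
            "enrich_fields": params["enrich_fields"],
        }
    }

    requests: list[dict[str, Any]] = []
    if params.get("delete", True):
        # A 404 only means the policy did not exist yet, so it is ignored.
        requests.append({"method": "DELETE", "path": path, "ignore": [404]})
    requests.append({"method": "PUT", "path": path, "body": body})
    requests.append({"method": "POST", "path": f"{path}/_execute"})
    return requests


if __name__ == "__main__":
    operation = {
        "policy_name": "nyc_vendors",
        "match_field": "id",
        "enrich_fields": ["name"],
    }
    for request in expand_enrich_policy(operation):
        print(request["method"], request["path"])
```

With only `policy_name`, `match_field` and `enrich_fields` given, this yields the same DELETE, PUT and POST requests as the raw steps above.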

Multiple policies

An interesting alternative might be to allow the definition of multiple policies with a more flexible syntax:

    {
      "name": "setup-enrich-policies",
      "operation-type": "enrich_policy",
      "delete": true,
      "policies": {
        "nyc_vendors": {
          "match": {
            "indices": "nyc_vendors",
            "match_field": "id",
            "enrich_fields": [ "name" ]
          }
        },
        "nyc_payment_types_fares": {
          "match": {
            "indices": "nyc_payment_types",
            "match_field": "type",
            "enrich_fields": [ "name", "fare" ]
          }
        }
      }
    }

One advantage of this syntax is that the body of each policy exactly matches the body of the create enrich policy REST request, so we could pass that part of the JSON object through unchanged.
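
To make the pass-through idea concrete, here is a minimal sketch of how the multi-policy form might expand into raw requests. The function name `expand_enrich_policies` and the request dictionaries are hypothetical, not an existing Rally API; the only assumption about the input is the `policies` mapping and `delete` flag shown above.

```python
from typing import Any


def expand_enrich_policies(params: dict[str, Any]) -> list[dict[str, Any]]:
    """Expand a 'policies' mapping into delete/create/execute requests.

    Hypothetical helper for illustration only; it is not part of Rally.
    """
    delete_first = params.get("delete", True)  # assumed default
    requests: list[dict[str, Any]] = []
    for policy_name, policy_body in params["policies"].items():
        path = f"/_enrich/policy/{policy_name}"
        if delete_first:
            # A 404 only means the policy did not exist yet, so it is ignored.
            requests.append({"method": "DELETE", "path": path, "ignore": [404]})
        # The policy body is forwarded unchanged as the PUT request body.
        requests.append({"method": "PUT", "path": path, "body": policy_body})
        requests.append({"method": "POST", "path": f"{path}/_execute"})
    return requests


if __name__ == "__main__":
    operation = {
        "delete": True,
        "policies": {
            "nyc_vendors": {
                "match": {
                    "indices": "nyc_vendors",
                    "match_field": "id",
                    "enrich_fields": ["name"],
                }
            },
            "nyc_payment_types_fares": {
                "match": {
                    "indices": "nyc_payment_types",
                    "match_field": "type",
                    "enrich_fields": ["name", "fare"],
                }
            },
        },
    }
    for request in expand_enrich_policies(operation):
        print(request["method"], request["path"])
```

Because the body is forwarded as-is, other policy types accepted by the create request would need no extra handling in the operation itself.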

Joao-antonio-gg commented 2 weeks ago

Can I try this one? I'm a beginner in the open source community.

Bhavya418 commented 1 hour ago

I would like to work on this issue. Could you assign it to me? This will be my first open source contribution.