elastic / rally

Macrobenchmarking framework for Elasticsearch
Apache License 2.0
37 stars 314 forks source link

operation-type:composite fails when track_total_hits is false #1881

Closed ugosan closed 2 weeks ago

ugosan commented 1 month ago

We are using a composite operation to benchmark searchable snapshots using async search like below:

{
  "operation": {
    "operation-type": "composite",
    "name": "date_histogram_by_day-90d-hotfrozen",
    "iterations": 10,
    "requests": [
      {
        "stream": [
          {
            "name": "async-search",
            "operation-type": "submit-async-search",
            "index": "hotfrozenlogs",
            "request-params": {
              "track_total_hits": "false"
            },
            "body": {
              "size": 0,
              "query": {
                "bool": {
                  "filter": [
                    {
                      "range": {
                        "@timestamp": {
                          "gte": "2024-01-01T00:00:00",
                          "lt": "2024-04-01T00:00:00"
                        }
                      }
                    },
                    {
                      "term": {
                        "log.file.path": {
                          "value": "/var/log/messages/salttouch"
                        }
                      }
                    }
                  ]
                }
              },
              "aggs": {
                "time": {
                  "date_histogram": {
                    "field": "@timestamp",
                    "calendar_interval": "1d"
                  }
                }
              }
            }
          }
        ]
      },
      {
        "operation-type": "get-async-search",
        "retrieve-results-for": [
          "async-search"
        ]
      },
      {
        "operation-type": "delete-async-search",
        "delete-results-for": [
          "async-search"
        ]
      }
    ]
  }
}

And its failing with:

esrally.exceptions.RallyError: Cannot run task [date_histogram_by_day-90d-hotfrozen]: Cannot execute [user-defined context-manager enabled runner for [composite]]. Provided parameters are: ['operation-type', 'name', 'iterations', 'requests', 'include-in-reporting']. Error: ['total'].

The log shows GetAsyncSearch is trying to read the total in: https://github.com/elastic/rally/blob/543b4dcf6d602db2e9ef37ddf134a7833b9f530f/esrally/driver/runner.py#L2530

File "/usr/local/lib/python3.8/site-packages/esrally/driver/runner.py", line 2530, in __call__
  "hits": response["response"]["hits"]["total"]["value"],
KeyError: 'total'
ugosan commented 1 month ago

We would need a flag to check wether track_total_hits is present in request-params or in the body (where it can be also defined) and read the total accordingly.

{
    "name": "async-search",
    "operation-type": "submit-async-search",
    "index": "hotfrozenlogs",
    "request-params": {
        "track_total_hits": false //here
    },
    "body": {
        "track_total_hits": false //or here
...
ugosan commented 1 month ago

Here is an agnostic (to track_total_hits) approach that works just by checking if total is in response["response"]["hits"], by splitting the stats[search] in two steps.

class GetAsyncSearch(Runner):
    async def __call__(self, es, params):
        success = True
        searches = mandatory(params, "retrieve-results-for", self)
        request_params = params.get("request-params", {})
        stats = {}
        for search_id, search in async_search_ids(searches):
            response = await es.async_search.get(id=search_id, params=request_params)
            is_running = response["is_running"]
            success = success and not is_running
            if not is_running:

                stats[search] = {
                    "timed_out": response["response"]["timed_out"],
                    "took": response["response"]["took"],
                }

                if "total" in response["response"]["hits"].keys():
                    stats[search]["hits"] = response["response"]["hits"]["total"]["value"]
                    stats[search]["hits_relation"] = response["response"]["hits"]["total"]["relation"]

Another option could be to check in the body and request_params for track_total_hits inside SubmitAsyncSearch and put track_total_hits inside CompositeContext for checking in GetAsyncSearch - more verbose but also much more clear.


class SubmitAsyncSearch(Runner):
    async def __call__(self, es, params):
        ...
        #track_total_hits is true by default
        CompositeContext.put("track_total_hits", True)

        if "track_total_hits" in params.get("body").keys():
            CompositeContext.put("track_total_hits", bool(eval(params.get("body").get("track_total_hits").capitalize())))

        if "track_total_hits" in request_params.keys():
            CompositeContext.put("track_total_hits", bool(eval(request_params.get("track_total_hits").capitalize())))
class GetAsyncSearch(Runner):
...
                if CompositeContext.get("track_total_hits"):
                    stats[search]["hits"] = response["response"]["hits"]["total"]["value"]
                    stats[search]["hits_relation"] = response["response"]["hits"]["total"]["relation"]

@gareth-ellis what do you think?

gareth-ellis commented 1 month ago

I like the agnostic change - it protects us from crashing out if some other parameter were to change the output behaviour too - presuming everything keeps working as expected, that is

ugosan commented 1 month ago

It looks like the tests are also checking for total, I will fix those too.