ResponseError should include `caused_by` in error message

If I run a query that only generates a "mapping mismatch" shard failure there is no error from ES (though the body includes the failures property). If I run a query that only generates a "max bucket" error, the error message surfaced from ResponseError is complete.

However, if I run a query that generates both of these issues, Elasticsearch reports an error that looks like this:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Field [some.field] of type [keyword] is not supported for aggregation [max]"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "",
    "phase": "fetch",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "some-index",
        "node": "someNode",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Field [some.field] of type [keyword] is not supported for aggregation [max]"
        }
      }
    ],
    "caused_by": {
      "type": "too_many_buckets_exception",
      "reason": "Trying to create too many buckets. Must be less than or equal to: [65536] but this number of buckets was exceeded. This limit can be set by changing the [search.max_buckets] cluster level setting.",
      "max_buckets": 65536
    }
  },
  "status": 400
}

Since the JavaScript client only looks at the root_cause property[1], the actual error (found in caused_by) is hidden from the end user (unless they also log out the meta property of the error).

Speaking to engineers for the other language clients, the general idea is that the clients should aim to encode the full error in the error object and not have opinions on what data to surface. Based on that it seems reasonable to expect the caused_by field to be included in the error message raised by ResponseError.

Side note: One could argue that the root_cause is simply reported wrong from ES in this case which is true though this might not be the only case so including caused_by seems the best solution. Further, engineers on the ES team expressed a concern that fixing this reporting would be non-trivial. In most cases there is a correlation between the caused_by and root_causes fields but in this particular case there isn't and it would likely require some kind of exception list to handle this case.

[1] https://github.com/elastic/elastic-transport-js/blob/90f653e56601fe5bb56e7c41eba0a996f5e8e39c/src/errors.ts#L104

elastic / elastic-transport-js

ResponseError should include `caused_by` in error message #51