elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Add case insensitive option to the aggregation include regex #97435

Open massivespace opened 1 year ago

massivespace commented 1 year ago

Description

We recently updated to make use of keyword case insensitivity. In our use case, we search for a specific set of results filtered by a case-insensitive keyword and then aggregate a nested field from those results, but we only want the aggregated values that match the original search term regardless of case. Up to this point we have been using include([searchterm].*), but now terms whose case differs from the search term are left out of the aggregated results. I would like a case-insensitive option that can be applied to the include clause.

Something like this, if the search term is "cw4" but the value is stored as CW4... in the database:

{
  "query": {
    "nested": {
      "path": "container",
      "query": {
        "prefix": {
          "container.address": {
            "value": "cw4",
            "case_insensitive": true
          }
        }
      }
    }
  },
  "aggs": {
    "by_container": {
      "nested": {
        "path": "container"
      },
      "aggs": {
        "by_address": {
          "terms": {
            "field": "container.address",
            "size": 10,
            "include": {
              "case_insensitive": true,   <---- this is new
              "regexp": "cw4.*"              <---- this is moved from include string into an object
            }
          }
        }
      }
    }
  }
}

The problem arises because container.address is nested, and one document may contain multiple addresses, for example: CW48tyydd7a, AB659jdf99da

Without the "include" statement, we also get the unwanted AB6 aggregation entry; with the "include" present, we don't get the CW4 entry, because its case differs from the search term.

I would normalize the input, since we are about to reindex the database, but in that case I would need to remove all of the updates we made to offer case sensitivity options to our customers.

I guess another option would be to remove the "include" statement, increase the aggregation "size" value, and manually reject terms that don't match, but I feel this would still only work part of the time, because our container.address field can contain a lot of values per document.
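For reference, the manual rejection step described above might look roughly like the following sketch. It assumes the usual terms aggregation bucket shape (a list of objects with "key" and "doc_count"); the function name filter_buckets is hypothetical. It shares the limitation already noted: a matching term that falls outside the enlarged "size" never reaches the client, so it cannot be recovered here.

```python
from typing import Any, Dict, List


def filter_buckets(buckets: List[Dict[str, Any]], prefix: str, size: int = 10) -> List[Dict[str, Any]]:
    """Keep only buckets whose key starts with `prefix`, ignoring case.

    `buckets` is assumed to be the "buckets" list of a terms aggregation
    response, fetched with a "size" larger than the number finally wanted.
    """
    wanted = prefix.casefold()
    matching = [b for b in buckets if str(b["key"]).casefold().startswith(wanted)]
    # Re-sort by document count, highest first, then trim to the requested size.
    matching.sort(key=lambda b: b["doc_count"], reverse=True)
    return matching[:size]


if __name__ == "__main__":
    sample = [
        {"key": "AB659jdf99da", "doc_count": 5},
        {"key": "CW48tyydd7a", "doc_count": 3},
    ]
    print(filter_buckets(sample, "cw4"))
```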

I'm happy to work around this another way if possible, but I didn't see a way to enable case insensitivity in the "include" settings or within the regex itself. This feature request just seems like a natural follow-on to the other case-insensitive options provided for keywords, especially since the regexp query itself supports it.
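One possible interim workaround, sketched here under assumptions rather than as a confirmed recommendation: the "include" regex appears to follow Lucene regular expression syntax, which supports bracketed character classes, so the client could expand each letter of the search term into a two-case class and send the result as the existing string form of "include". The helper names and the reserved-character list below are illustrative assumptions and should be checked against the regexp syntax documentation for the version in use.

```python
# Characters treated as reserved in Lucene regular expressions
# (assumed list; verify against the regexp syntax docs for your version).
LUCENE_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def case_insensitive_prefix_regex(term: str) -> str:
    """Build a prefix regex that matches `term` in any letter case.

    Each cased letter becomes a two-character class, e.g. "c" -> "[cC]".
    Reserved characters are backslash-escaped; everything else is copied.
    """
    parts = []
    for ch in term:
        lower, upper = ch.lower(), ch.upper()
        if lower != upper and len(lower) == 1 and len(upper) == 1:
            parts.append(f"[{lower}{upper}]")
        elif ch in LUCENE_RESERVED:
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return "".join(parts) + ".*"


def build_terms_agg(term: str, size: int = 10) -> dict:
    """Hypothetical helper: terms aggregation body using the string form of "include"."""
    return {
        "terms": {
            "field": "container.address",
            "size": size,
            "include": case_insensitive_prefix_regex(term),
        }
    }


if __name__ == "__main__":
    print(case_insensitive_prefix_regex("cw4"))
    print(build_terms_agg("cw4"))
```

This keeps the filtering on the server, so it avoids the "size" problem of rejecting terms manually, at the cost of a longer and less readable pattern.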

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-analytics-geo (Team:Analytics)

yannours commented 2 weeks ago

This would be very cool tho 👌