elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

search_as_you_type, prefix queries and broken highlighting #53744

Open telendt opened 4 years ago

telendt commented 4 years ago

Elasticsearch version (bin/elasticsearch --version): v7.6.1 (running on Elastic Cloud)

Description of the problem including expected versus actual behavior:

I expect the same highlighting in the results of a prefix query, regardless of whether it runs on a text field or a search_as_you_type field.

Steps to reproduce:

Setup

(Taken from modules/mapper-extras/src/test/resources/rest-api-spec/test/search-as-you-type/20_highlighting.yml.)

PUT test
{
  "settings": {
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "a_field": {
        "type": "search_as_you_type",
        "analyzer": "simple",
        "max_shingle_size": 4
      },
      "text_field": {
        "type": "text",
        "analyzer": "simple"
      }
    }
  }
}

POST test/_doc?refresh=true
{
  "a_field": "quick brown fox jump lazy dog",
  "text_field": "quick brown fox jump lazy dog"
}

Highlight on result of prefix search on text field

GET test/_search
{
  "query": {
    "prefix": {
      "text_field": "bro"
    }
  },
  "highlight": {
    "fields": {
      "text_field": {
        "type": "unified"
      }
    }
  }
}

response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "0RJU7nABqHZsIpLuy8i3",
        "_score" : 1.0,
        "_source" : {
          "a_field" : "quick brown fox jump lazy dog",
          "text_field" : "quick brown fox jump lazy dog"
        },
        "highlight" : {
          "text_field" : [
            "quick <em>brown</em> fox jump lazy dog"
          ]
        }
      }
    ]
  }
}

Highlight on result of prefix search on search_as_you_type field

GET test/_search
{
  "query": {
    "prefix": {
      "a_field": "bro"
    }
  },
  "highlight": {
    "fields": {
      "a_field": {
        "type": "unified"
      }
    }
  }
}

Result - no highlight:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "0RJU7nABqHZsIpLuy8i3",
        "_score" : 1.0,
        "_source" : {
          "a_field" : "quick brown fox jump lazy dog",
          "text_field" : "quick brown fox jump lazy dog"
        }
      }
    ]
  }
}

Some additional observations

This affects not only prefix queries but also "higher level" compound prefix queries, such as match_bool_prefix and multi_match with type=bool_prefix.
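
As a rough sketch, the compound variants could be tried against the same test index like this. The query text "quick bro" and the sub-field list in the multi_match request are assumptions picked for illustration, not taken from the original report, and no output is claimed here:

```console
GET test/_search
{
  "query": {
    "match_bool_prefix": {
      "a_field": "quick bro"
    }
  },
  "highlight": {
    "fields": {
      "a_field": {
        "type": "unified"
      }
    }
  }
}

GET test/_search
{
  "query": {
    "multi_match": {
      "query": "quick bro",
      "type": "bool_prefix",
      "fields": ["a_field", "a_field._2gram", "a_field._3gram"]
    }
  },
  "highlight": {
    "fields": {
      "a_field": {
        "type": "unified"
      }
    }
  }
}
```

In both requests the last term ("bro") is treated as a prefix, which is the part the report says goes unhighlighted on the search_as_you_type field.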

cc @andyb-elastic

elasticmachine commented 4 years ago

Pinging @elastic/es-search (:Search/Highlighting)

telendt commented 4 years ago

@jimczi I'm not familiar with your issues labeling system, but this sounds like a bug to me, not a feature request (if that's what >feature label means).

Your docs mention the optimization of prefix queries on the search_as_you_type datatype, but they don't say that it affects the highlighter in any way, so I'm assuming highlighting should work the same way as it does for prefix queries on text fields: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-as-you-type.html#prefix-queries

jimczi commented 4 years ago

> I'm not familiar with your issues labeling system, but this sounds like a bug to me, not a feature request (if that's what >feature label means).

Agreed, I changed the label

oeddyo commented 4 years ago

interesting. I can try and take a look

jimczi commented 4 years ago

This is high-hanging fruit, which is why I hesitated to mark it as a feature. At the moment we don't have the ability to highlight a field that has multiple sub-fields. I don't mind if you take a look at it, @oeddyo, but this is a new feature to implement rather than a small bug fix. The feature is on our roadmap, but to be completely honest, its priority is not high.

oeddyo commented 4 years ago

thought it's a simple bug fix... I see

don't worry about it. I can take a look at other tickets

timakro commented 4 years ago

According to the documentation, a prefix query on a search_as_you_type field is rewritten to a term query on the ._index_prefix sub-field.

When you explicitly run a term query on the ._index_prefix sub-field, you get the same behaviour: the query works and returns the same results as the prefix query on the root field, but highlighting is missing. Does this contradict @jimczi's assumption that it's a problem with sub-fields?

What's magic about the ._index_prefix sub-field that highlighting isn't working?
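
For reference, the explicit term query described above might look roughly like this against the test index from the original report. Highlighting on the root a_field is an assumption, since the comment does not say which field was used in the highlight section:

```console
GET test/_search
{
  "query": {
    "term": {
      "a_field._index_prefix": "bro"
    }
  },
  "highlight": {
    "fields": {
      "a_field": {
        "type": "unified"
      }
    }
  }
}
```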

tony-hizzle commented 3 years ago

I'm using version 7.10.1. If you use the ._index_prefix field, you can get highlighting, but you get up to max_shingle_size terms highlighted, when I really want single terms highlighted:

GET test/_search
{
  "query": {
    "prefix": {
      "a_field": "bro"
    }
  },
  "highlight": {
    "fields": {
      "a_field._index_prefix": {
        "type": "unified"
      }
    }
  }
}

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "eTCJznYBoKM-DZyhqg-S",
        "_score" : 1.0,
        "_source" : {
          "a_field" : "quick brown fox jump lazy dog",
          "text_field" : "quick brown fox jump lazy dog"
        },
        "highlight" : {
          "a_field._index_prefix" : [
            "quick <em>brown fox jump lazy</em> dog"
          ]
        }
      }
    ]
  }
}

seco-mgabor commented 3 years ago

What's the status of this issue? IMHO this is a breaking change, and I couldn't find any reference to it in the Elastic documentation. I'm trying to migrate our code base, but this issue is a blocker. See also this similar issue: https://github.com/elastic/elasticsearch/issues/70922

peterbe commented 2 years ago

I too found this bug. I get N words highlighted when I want just 1 (if the input query is just 1 word).

To unblock, I only use my regular title field (instead of my title_autocomplete field) which uses a standard text analyzer. Then I query on it with bool_prefix and use title as the highlight config.
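
A minimal sketch of that workaround, assuming "bool_prefix" here means a match_bool_prefix query. The index name my-index and the query text are hypothetical placeholders; only the title field comes from the description above:

```console
GET my-index/_search
{
  "query": {
    "match_bool_prefix": {
      "title": "bro"
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}
```

Because title is a plain text field, the query and the highlighter both work on the same field, which sidesteps the search_as_you_type sub-fields entirely.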

I'm curious if others found other workarounds.

telendt commented 2 years ago

Yes, that's the workaround I used in the past.

clemgrim commented 1 year ago

I had a similar issue and the highlight_query seems to work for me:

POST /_search
{
  "query": {
    "prefix": {
      "a_field": "bro"
    }
  },
  "highlight": {
    "fields": {
      "text_field": {
        "highlight_query": {
          "prefix": {"text_field": "bro"}
        }
      }
    }
  }
}

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)