elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Highlighter not working on query_string phrase query when multi term in graph_synonyms list #45486

Open pbvahlst opened 5 years ago

pbvahlst commented 5 years ago

Describe the feature:

Elasticsearch version (bin/elasticsearch --version): 7.3

Plugins installed: [icu]

JVM version (java -version): 9.0.1

OS version (uname -a if on a Unix-like system): Win10

Description of the problem including expected versus actual behavior: A phrase search with query_string, e.g. on "alien resurrection", does not get highlighted if the graph_synonyms filter is enabled and the synonym list contains multi term synonyms that include one of the terms from the phrase search. In some cases it seems to work partially:

  1. single to single synonyms work fine: "day, today"
  2. single to multi term synonyms work when you search for a single term from the multi term synonym. E.g. with the synonym "face hugger, facehugger, alien" it works if we search for "face" but not if we search for "alien"
  3. multi to multi synonyms work if we search for a single term from the multi term synonym but not if we search for both. E.g. with the synonym "easter eg, groundhog day" it works when we search for "easter" or "eg" or "groundhog" or "day" but not if we search for "easter eg"

Steps to reproduce:

Please include a minimal but complete recreation of the problem, including (e.g.) index creation, mappings, settings, query etc. The easier you make for us to reproduce it, the more likely that somebody will take the time to look at it.

  1. Enable the graph_synonyms filter on a search_analyzer and add a multi term synonym for "face hugger, alien"
  2. Do a phrase search, e.g. "alien resurrection", with the fast_vector highlighter enabled
  3. Verify that highlighting is not showing
    1. Remove the synonym from the synonym list and do the search again. Verify that highlighting is now working
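
The query in step 2 might look like the following sketch. The index name `my_index` and the field name `content` are assumptions, not taken from the report; the field is assumed to be mapped with `term_vector: with_positions_offsets` and a `search_analyzer` that includes the synonym filter.

```console
# Hypothetical names: my_index, content.
# The escaped quotes make query_string treat the two words as a phrase.
GET my_index/_search
{
  "query": {
    "query_string": {
      "default_field": "content",
      "query": "\"alien resurrection\""
    }
  },
  "highlight": {
    "fields": {
      "content": { "type": "fvh" }
    }
  }
}
```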

Provide logs (if relevant):

elasticmachine commented 5 years ago

Pinging @elastic/es-search

henningandersen commented 5 years ago

@slumx thanks for your interest in elasticsearch. I doubt that this issue relates to the JVM version, but nevertheless, it could be good to double check that it reproduces on a supported JVM version.

pbvahlst commented 5 years ago

My bad, it is running 1.8.0_152 (the included one)

jimczi commented 5 years ago

Can you test with the unified highlighter? It can also use term vectors, so the performance should be similar. The fast_vector highlighter is not actively maintained in Lucene, and the unified highlighter was added to replace the old ones, so it might be faster for you to switch to this new highlighter rather than wait for a bug resolution.

pbvahlst commented 5 years ago

Ok I will try that. The reason that I use FVH is that it seems to be the only highlighter which can combine fields analyzed with different analyzers into one field, isn't this still the case? (We use this feature a lot).

jimczi commented 5 years ago

The reason that I use FVH is that it seems to be the only highlighter which can combine fields analyzed with different analyzers into one field, isn't this still the case? (We use this feature a lot).

This is not implemented yet which is the reason why we keep the fast_vector highlighter for now. I'll try to reproduce the bug to see if the fix is simple.
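
For context, the feature being discussed is presumably the fvh `matched_fields` option, which combines matches from several differently analyzed fields into one highlighted field. A minimal sketch, assuming a hypothetical `comment` field with a `comment.plain` subfield, both mapped with `term_vector: with_positions_offsets`:

```console
# Hypothetical names: my_index, comment, comment.plain.
# Matches found on either field are highlighted on "comment".
GET my_index/_search
{
  "query": {
    "query_string": {
      "query": "comment.plain:running scissors",
      "fields": ["comment"]
    }
  },
  "highlight": {
    "order": "score",
    "fields": {
      "comment": {
        "matched_fields": ["comment", "comment.plain"],
        "type": "fvh"
      }
    }
  }
}
```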

kasa-taku commented 4 years ago

Is there any possibility of supporting this in the future?

mayya-sharipova commented 8 months ago

The issue is still present in Elasticsearch 8.12 when the fvh highlighter is used, but everything works well when the unified highlighter is used:

PUT index1
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms_filter": {
          "type": "synonym_graph",
          "synonyms": [
            "face hugger, facehugger, alien",
            "easter eg, groundhog day"
          ]
        }
      },
      "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [ "my_synonyms_filter" ]
          }
        }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "search_analyzer": "my_analyzer",
        "term_vector": "with_positions_offsets"
      }
    }
  }
}

POST index1/_bulk?refresh=true
{ "index" : {"_id": 1} }
{ "content" : "face hugger resurrection"}

Using unified highlighter returns the expected results:

GET index1/_search
{
  "highlight": {
    "fields": {
      "content": {
        "type" : "unified"
      }
    }
  },
  "query": {
    "match_phrase" : {
      "content": "alien resurrection"
    }
  }
}

"hits": [
    {
      "_index": "index1",
      "_id": "1",
      "_score": 0.8630463,
      "_source": {
        "content": "face hugger resurrection"
      },
      "highlight": {
        "content": [
          "<em>face</em> <em>hugger</em> <em>resurrection</em>"
        ]
      }
    }
  ]

But when using the fvh highlighter, the hit is returned without any highlight:

GET index1/_search
{
  "highlight": {
    "fields": {
      "content": {
        "type" : "fvh"
      }
    }
  },
  "query": {
    "match_phrase" : {
      "content": "alien resurrection"
    }
  }
}

"hits": [
    {
      "_index": "index1",
      "_id": "1",
      "_score": 0.8630463,
      "_source": {
        "content": "face hugger resurrection"
      }
    }
  ]
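
To inspect what the search analyzer emits for the phrase, the `_analyze` API can be run against the index created above. This is only a diagnostic sketch: it lists the tokens and positions produced by `my_analyzer`, and says nothing by itself about how either highlighter consumes them.

```console
# Uses index1 and my_analyzer as defined in the index settings above.
GET index1/_analyze
{
  "analyzer": "my_analyzer",
  "text": "alien resurrection"
}
```

Comparing this output with the same request for "face hugger resurrection" may help show where the multi term synonym changes the token positions.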

elasticsearchmachine commented 4 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)