elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Missing percolator highlights with nested docs present #73860

Open jtibshirani opened 3 years ago

jtibshirani commented 3 years ago

If a document that contains nested documents is highlighted against a percolator query, some highlights may be missing. The bug only seems to occur with term vector highlighting.

Example reproduction:

PUT index
{
  "mappings": {
    "properties": {
      "query": {
        "type": "percolator"
      },
      "title": {
        "type": "text",
        "term_vector": "with_positions_offsets"
      },
      "paragraphs": {
        "type": "nested",
        "properties": {
          "header": {
            "type": "text",
            "term_vector": "with_positions_offsets",
            "analyzer": "default"
          }
        }
      }
    }
  }
}

PUT index/_doc/1
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": {
              "query": "our national parks"
            }
          }
        },
        {
          "nested": {
            "path": "paragraphs",
            "query": {
              "match": {
                "paragraphs.header": {
                  "query": "our national parks"
                }
              }
            }
          }
        }
      ]
    }
  }
}

GET index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "title": "our national parks",
        "paragraphs": {
          "header": "our national parks"
        }
      }
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}

elasticmachine commented 3 years ago

Pinging @elastic/es-search (Team:Search)

jtibshirani commented 3 years ago

My current understanding of the bug: when nested documents are present, the percolator highlight phase may use the wrong doc ID for the parent (root) document. The relevant code is here:
https://github.com/elastic/elasticsearch/blob/7ad3cdde72dce01e39bd9fcae21ae1f8aeec3839/modules/percolator/src/main/java/org/elasticsearch/percolator/PercolatorHighlightSubFetchPhase.java#L84-L93
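To illustrate why a "wrong doc ID for the parent document" can occur, here is a minimal sketch of the Lucene block-join layout. Lucene indexes each root document's nested children first and the root document last in the block. The class `RootDocIds` and the method `rootDocId` are hypothetical and are not Elasticsearch code. The sketch also assumes that percolated documents are written as consecutive blocks in slot order, which has not been confirmed against the percolator implementation.

```java
/**
 * Hypothetical illustration, not Elasticsearch code.
 * Assumption: each percolated document is indexed as one Lucene block
 * (nested children first, root document last), and blocks are written
 * one after another in slot order.
 */
public final class RootDocIds {

    private RootDocIds() {}

    /**
     * Returns the Lucene doc ID of the root document for {@code slot},
     * where nestedCounts[i] is the number of nested children of the
     * i-th percolated document.
     */
    public static int rootDocId(int[] nestedCounts, int slot) {
        if (slot < 0 || slot >= nestedCounts.length) {
            throw new IllegalArgumentException("slot out of range: " + slot);
        }
        int docId = -1;
        for (int i = 0; i <= slot; i++) {
            // Each block holds its nested children plus one root document,
            // and the root is the last doc in the block.
            docId += nestedCounts[i] + 1;
        }
        return docId;
    }

    public static void main(String[] args) {
        int[] noNested = {0, 0, 0};   // three flat documents
        int[] withNested = {1, 0, 2}; // same slots, some with nested children
        for (int slot = 0; slot < noNested.length; slot++) {
            System.out.printf("slot %d: root doc id without nested = %d, with nested = %d%n",
                    slot, rootDocId(noNested, slot), rootDocId(withNested, slot));
        }
    }
}
```

Without nested documents, the root doc ID equals the slot number. With nested documents it does not. In the reproduction above, the percolated document has a single `paragraphs` object, so under this assumption its block holds two docs and the root is doc 1, not doc 0. If the slot number were used directly as the doc ID, it would point at the nested child, which has no `title` term vector. That is one plausible reading of the comment above, not a confirmed diagnosis.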

mayya-sharipova commented 6 months ago

Related to: https://github.com/elastic/sdh-elasticsearch/issues/4602

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)