elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Rescorer does wrong reorder for tanked hits #75363

Open rudibatt opened 3 years ago

rudibatt commented 3 years ago

Elasticsearch version: 7.13.2
Plugins installed: [elasticsearch-learning-to-rank]
JVM version: AdoptOpenJDK (build 16+36)
OS version: Linux 82865c3b5df8 5.8.0-59-generic #66~20.04.1-Ubuntu SMP Thu Jun 17 11:14:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux (actually the official Docker image "elasticsearch:7.13.2")

Description: When a rescorer tanks the scores of the documents within the window_size, requests that fetch the results beyond the window_size will always get the same documents.

Cause: The QueryRescorer only reorders the top hits (from 0 to "from+size"); see org.elasticsearch.search.rescore.QueryRescorer.combine(TopDocs, TopDocs, QueryRescoreContext). However, if the rescorer lowers the scores of the first N hits, they are only reordered within that top-hits frame, so they always end up at the end of that frame.

Steps to reproduce:

PUT my_index/
{
  "mappings": {
    "dynamic_templates": [
      {
        "full_text": {
          "match": "full_text",
          "mapping": {
            "type": "text",
            "analyzer": "whitespace"
          }
        }
      }
    ]
  }
}

POST /_bulk
{"index": {"_index":"my_index", "_id":"1"} }
{ "full_text" : "quick red fox"}
{"index": {"_index":"my_index", "_id":"2"} }
{ "full_text" : "quick green fox"}
{"index": {"_index":"my_index", "_id":"3"} }
{ "full_text" : "quick blue fox"}
{"index": {"_index":"my_index", "_id":"4"} }
{ "full_text" : "lazzy red dog"}
{"index": {"_index":"my_index", "_id":"5"} }
{ "full_text" : "lazzy green dog"}
{"index": {"_index":"my_index", "_id":"6"} }
{ "full_text" : "lazzy blue dog"}
{"index": {"_index":"my_index", "_id":"7"} }
{ "full_text" : "quick red dog"}
{"index": {"_index":"my_index", "_id":"8"} }
{ "full_text" : "quick green dog"}
{"index": {"_index":"my_index", "_id":"9"} }
{ "full_text" : "quick blue dog"}

GET my_index/_search
{
  "from": 4,
  "size": 2, 
  "query": {
    "match": {
      "full_text": "green fox jumps over the blue dog"
    }
  },
  "rescore": {
    "query": {
      "rescore_query": {
        "constant_score": {
          "filter": {
            "term": { "full_text": "quick" }
          },
          "boost": 0.1
        }
      },
      "score_mode": "multiply"
    },
    "window_size": 4
  }
}

For any from >= 2, the same two documents are always returned!

Expected result: Either the whole result set is reordered, or the reordering only takes place within the window_size. (Which of the two applies must be defined.)

Related issue: https://github.com/o19s/elasticsearch-learning-to-rank/issues/369

elasticmachine commented 3 years ago

Pinging @elastic/es-search (Team:Search)

dnhatn commented 3 years ago

Thanks for reporting the issue.

I can reproduce it. The problem is that the number of requested documents (i.e., from + size) exceeds the window size. Here, rescoring reduces the scores of the two docs (containing quick) in the top 4, and these docs are moved to the bottom after being rescored. That explains why these two docs are always returned when from >= 2.
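
The effect can be illustrated with a small Python simulation. This is only a sketch of the behavior described in this thread, not the actual QueryRescorer code: the first-pass scores are made-up values (real BM25 scores will differ), the names FIRST_PASS, QUICK_DOCS, by_score and search_page are hypothetical, and it assumes that at least window_size hits are collected, that ties are broken by _id, and that the whole collected frame is re-sorted after rescoring.

```python
# Hypothetical first-pass scores keyed by _id; real BM25 values would differ.
FIRST_PASS = {1: 1.0, 2: 2.0, 3: 2.0, 4: 0.5, 5: 1.5, 6: 1.5, 7: 0.5, 8: 1.5, 9: 1.5}
# Documents from the bulk request whose full_text contains "quick".
QUICK_DOCS = {1, 2, 3, 7, 8, 9}


def by_score(hits):
    """Sort (doc_id, score) pairs by score descending, then _id ascending."""
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


def search_page(from_, size, window_size=4, boost=0.1):
    """Simulate one paged search with a 'multiply' rescorer on the top window."""
    # Query phase: collect the top hits (assumed: max(from + size, window_size)).
    top = by_score(FIRST_PASS.items())[: max(from_ + size, window_size)]
    # Rescore only the first window_size hits; non-matching hits keep their score.
    rescored = [
        (doc_id, score * boost if rank < window_size and doc_id in QUICK_DOCS else score)
        for rank, (doc_id, score) in enumerate(top)
    ]
    # Re-sort the whole collected frame, then cut out the requested page.
    return [doc_id for doc_id, _ in by_score(rescored)[from_ : from_ + size]]


for from_ in (2, 4, 6):
    print(from_, search_page(from_, size=2))
```

Under these assumptions every page starting at from >= 2 contains documents 2 and 3, because the two down-scored hits sink to the end of whatever frame was collected, and the end of the frame is exactly the requested page.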

I think we should reject such a request. I opened https://github.com/elastic/elasticsearch/pull/75556.

rudibatt commented 3 years ago

I suggest re-sorting only the documents within the window size. Then the scores would not be continuous, but the result order would be, even for pages beyond window_size.
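
A minimal Python sketch of this suggestion follows. It is an illustration only, not Elasticsearch code: the scores are made-up values, the names FIRST_PASS, QUICK_DOCS, by_score and search_page_window_only are hypothetical, and ties are assumed to be broken by _id. Hits inside the window are rescored and re-sorted among themselves, while hits beyond the window keep their first-pass positions.

```python
# Hypothetical first-pass scores keyed by _id; real BM25 values would differ.
FIRST_PASS = {1: 1.0, 2: 2.0, 3: 2.0, 4: 0.5, 5: 1.5, 6: 1.5, 7: 0.5, 8: 1.5, 9: 1.5}
# Documents from the bulk request whose full_text contains "quick".
QUICK_DOCS = {1, 2, 3, 7, 8, 9}


def by_score(hits):
    """Sort (doc_id, score) pairs by score descending, then _id ascending."""
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


def search_page_window_only(from_, size, window_size=4, boost=0.1):
    """Rescore and re-sort only the top window; leave later hits where they are."""
    ranked = by_score(FIRST_PASS.items())[: max(from_ + size, window_size)]
    head, tail = ranked[:window_size], ranked[window_size:]
    # Apply the 'multiply' rescore to matching hits, then re-sort inside the window only.
    head = by_score(
        (doc_id, score * boost if doc_id in QUICK_DOCS else score)
        for doc_id, score in head
    )
    # Hits beyond the window are appended in their original first-pass order.
    return [doc_id for doc_id, _ in (head + tail)[from_ : from_ + size]]


for from_ in (0, 2, 4, 6):
    print(from_, search_page_window_only(from_, size=2))
```

With these sample scores the pages come out as [5, 6], [2, 3], [8, 9] and [1, 4]: no document repeats across pages, although the scores are no longer monotonically decreasing at the window boundary.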

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-docs (Team:Docs)

djstrong commented 1 year ago

I agree with @rudibatt

elasticsearchmachine commented 1 month ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)