elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Add ability to re-score Top-K query results with a secondary query, #2640

Closed s1monw closed 11 years ago

s1monw commented 11 years ago

Rescore Feature

The rescore feature allows rescoring the documents returned by a query with a secondary algorithm. Rescoring is commonly used when a scoring algorithm is too costly to execute across the entire document set but efficient enough to execute on the Top-K documents scored by a faster retrieval method. Rescoring can help improve precision by reordering a larger Top-K window than is actually returned to the user. It is typically executed on a window of between 100 and 500 documents, while the result window requested by the user remains the same.
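As a rough illustration of the idea (not the actual implementation), the two-phase flow can be sketched in Python. The function name `rescore_top_k` and its parameters are hypothetical, and the sketch assumes the secondary score fully replaces the first-pass score inside the window:

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (document id, score)


def rescore_top_k(
    first_pass: List[Hit],
    expensive_score: Callable[[str], float],
    window_size: int,
    size: int,
) -> List[Hit]:
    """Re-rank only the top `window_size` hits, then return the first `size`."""
    # Assumption: `first_pass` is already sorted by descending first-pass score.
    window = first_pass[:window_size]
    tail = first_pass[window_size:]
    # The costly function runs on the window only, never on the full hit list.
    rescored = sorted(
        ((doc_id, expensive_score(doc_id)) for doc_id, _ in window),
        key=lambda hit: hit[1],
        reverse=True,
    )
    # Hits outside the window keep their first-pass order behind the window.
    return (rescored + tail)[:size]


first_pass = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", 0.5)]
expensive = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 5.0}
# "d" is outside the window of 3, so it is never rescored.
top = rescore_top_k(first_pass, expensive.__getitem__, window_size=3, size=2)
print(top)  # [('b', 0.9), ('c', 0.5)]
```

The window is larger than the requested page so that documents ranked just below the page boundary by the cheap pass still have a chance to move up.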

Query Rescorer

The query rescorer executes a secondary query only on the Top-K results of the actual user query and rescores the documents based on a linear combination of the user query's score and the score of the rescore_query. Any exposed query can be used as a rescore_query, and query_weight and rescore_query_weight set the weights of the two terms of the linear combination.
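The linear combination can be written out as a small Python sketch. The function name `combined_score` is hypothetical, and treating a document that does not match the rescore_query as contributing a rescore score of 0.0 is an assumption of this sketch, not something stated above:

```python
def combined_score(
    query_score: float,
    rescore_score: float,
    query_weight: float,
    rescore_query_weight: float,
) -> float:
    """Weighted sum of the original query score and the rescore_query score."""
    return query_weight * query_score + rescore_query_weight * rescore_score


# Two hypothetical hits: (original query score, rescore_query score).
# Assumption: a hit that does not match the rescore_query gets 0.0 for it.
hits = {"doc1": (2.0, 0.0), "doc2": (1.5, 1.0)}

final = {
    doc_id: combined_score(q, r, query_weight=0.7, rescore_query_weight=1.2)
    for doc_id, (q, r) in hits.items()
}
# Order the hits by their combined score, highest first.
ranking = sorted(final, key=final.get, reverse=True)
print(ranking)  # ['doc2', 'doc1']
```

With these weights, doc2 overtakes doc1 even though its original score is lower, because the phrase-style secondary match adds more than the original score difference.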

Rescore API

The rescore request is defined alongside the query part in the JSON request:

curl -s -XPOST 'localhost:9200/_search' -d '{
  "query" : {
    "match" : {
      "field1" : {
        "query" : "the quick brown",
        "type" : "boolean",
        "operator" : "OR"
      }
    }
  },
  "rescore" : {
    "window_size" : 50,
    "query" : {
      "rescore_query" : {
        "match" : {
          "field1" : {
            "query" : "the quick brown",
            "type" : "phrase",
            "slop" : 2
          }
        }
      },
      "query_weight" : 0.7,
      "rescore_query_weight" : 1.2
    }
  }
}'

Each rescore request is executed on a per-shard basis within the same roundtrip. Currently the rescore API has only one implementation (the query rescorer), which modifies the result set in place. Future developments could include dedicated rescore results if needed by the implementation, e.g. a pair-wise reranker. Note: only regular queries are rescored; if the search type is set to scan or count, rescorers are not executed.
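One consequence of per-shard execution is that window_size applies to each shard's own Top-K, not to the merged result. A minimal Python sketch of that behaviour follows. The names `rescore_shard`, `search`, and `boost_phrase_matches` are hypothetical, and merging shard results by final score is an assumption of this sketch rather than a description of the actual merge logic:

```python
import heapq
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (document id, score)


def rescore_shard(
    hits: List[Hit], rescore: Callable[[str, float], float], window_size: int
) -> List[Hit]:
    """Rescore the top `window_size` hits of a single shard."""
    # Assumption: `hits` is sorted by descending original score.
    window = [(doc, rescore(doc, score)) for doc, score in hits[:window_size]]
    window.sort(key=lambda hit: hit[1], reverse=True)
    # Hits below the window keep their original score and order.
    return window + hits[window_size:]


def search(
    shards: List[List[Hit]],
    rescore: Callable[[str, float], float],
    window_size: int,
    size: int,
) -> List[Hit]:
    """Rescore on every shard, then merge the shard results by score."""
    per_shard = [rescore_shard(hits, rescore, window_size) for hits in shards]
    merged = [hit for hits in per_shard for hit in hits]
    return heapq.nlargest(size, merged, key=lambda hit: hit[1])


def boost_phrase_matches(doc: str, score: float) -> float:
    # Hypothetical secondary scorer: "b" and "c" match the rescore query.
    return score + (1.5 if doc in {"b", "c"} else 0.0)


shards = [[("a", 2.0), ("b", 1.0)], [("c", 1.5), ("d", 0.2)]]
narrow = search(shards, boost_phrase_matches, window_size=1, size=2)
wide = search(shards, boost_phrase_matches, window_size=2, size=2)
print(narrow)  # [('c', 3.0), ('a', 2.0)]
print(wide)    # [('c', 3.0), ('b', 2.5)]
```

In the narrow case "b" sits below its shard's window and is never rescored, so it cannot overtake "a"; widening the window to 2 lets it do so.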

mattweber commented 11 years ago

Any special reason for the rescore_query object? Seems like something along the lines of the following would be more consistent with the api.

curl -s -XPOST 'localhost:9200/_search' -d '{
  "query" : {
    "match" : {
      "field1" : {
        "query" : "the quick brown",
        "type" : "boolean",
        "operator" : "OR"
      }
    }
  },
  "rescore" : {
    "window_size" : 50,
    "query_weight" : 0.7,
    "rescore_query_weight" : 1.2,
    "query" : {
      "match" : {
        "field1" : {
          "query" : "the quick brown",
          "type" : "phrase",
          "slop" : 2
        }
      }
    }
  }
}'

s1monw commented 11 years ago

Hey Matt,

I agree this would be more consistent. The reason behind this is that there might be additional rescoring implementations in the future that do not necessarily use a query. So the "query" attribute is a context marker here, so that we can add additional rescorers in the future or open up this API for extension more easily. Does this make sense?
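To make the "context marker" idea concrete, here is a hedged Python sketch of a parser that dispatches on the key under "rescore". The registry `RESCORER_PARSERS`, the `register` decorator, and both parse functions are hypothetical names for illustration and do not mirror the actual parsing code:

```python
from typing import Any, Callable, Dict

Parser = Callable[[Dict[str, Any]], Dict[str, Any]]

# Hypothetical registry: rescorer type name -> parser for that type's body.
RESCORER_PARSERS: Dict[str, Parser] = {}


def register(name: str) -> Callable[[Parser], Parser]:
    """Register a parser under the key that marks its rescorer type."""
    def decorator(parser: Parser) -> Parser:
        RESCORER_PARSERS[name] = parser
        return parser
    return decorator


@register("query")
def parse_query_rescorer(body: Dict[str, Any]) -> Dict[str, Any]:
    return {
        "type": "query",
        "rescore_query": body["rescore_query"],
        "query_weight": body["query_weight"],
        "rescore_query_weight": body["rescore_query_weight"],
    }


def parse_rescore(rescore: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the parser from the single non-window_size key in `rescore`."""
    type_keys = [key for key in rescore if key != "window_size"]
    if len(type_keys) != 1:
        raise ValueError("expected exactly one rescorer type key")
    name = type_keys[0]
    if name not in RESCORER_PARSERS:
        raise ValueError(f"unknown rescorer type: {name}")
    parsed = RESCORER_PARSERS[name](rescore[name])
    parsed["window_size"] = rescore.get("window_size")
    return parsed


parsed = parse_rescore({
    "window_size": 50,
    "query": {
        "rescore_query": {"match_all": {}},
        "query_weight": 0.7,
        "rescore_query_weight": 1.2,
    },
})
print(parsed["type"], parsed["window_size"])  # query 50
```

Under this layout, adding a new rescorer type would only require registering another parser under a new key, without changing the shape of existing requests.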

mattweber commented 11 years ago

It does when I look at how the parsing is implemented. Maybe just switch query with rescore_query? It doesn't really matter; it just seemed out of place to me when I was looking at it.

curl -s -XPOST 'localhost:9200/_search' -d '{
  "query" : {
    "match" : {
      "field1" : {
        "query" : "the quick brown",
        "type" : "boolean",
        "operator" : "OR"
      }
    }
  },
  "rescore" : {
    "window_size" : 50,
    "rescore_query" : {
      "query" : {
        "match" : {
          "field1" : {
            "query" : "the quick brown",
            "type" : "phrase",
            "slop" : 2
          }
        }
      },
      "query_weight" : 0.7,
      "rescore_query_weight" : 1.2
    }
  }
}'

Kaidanov commented 10 years ago

How can you rescore and sort together?

xritchie commented 10 years ago

Hi Guys,

I'm not entirely sure whether I found a bug or whether this is the intended behavior.

Basically, I have a rescore query together with multiple sort fields. If I remove the sort, the query works as intended and results are ordered by the rescore query. If I use any kind of sort, the rescore query is completely ignored and results are ordered by the original query score.

To put this into perspective, the following query is sorted on the original query score, and the rescore query is completely ignored.

{
    "from": 0,
    "size": 10,
    "explain": false,
    "sort": ["_score", {
        "networks": {
            "order": "desc",
            "mode": "sum"
        }
    }, {
        "rich": {
            "order": "desc",
            "mode": "sum"
        }
    }, {
        "picture": {
            "order": "desc",
            "mode": "sum"
        }
    }],
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "should": [{
                        "constant_score": {
                            "query": {
                                "match": {
                                    "_all": {
                                        "query": "Daryl"
                                    }
                                }
                            },
                            "boost": 1.0
                        }
                    }, {
                        "constant_score": {
                            "query": {
                                "match": {
                                    "_all": {
                                        "query": "Davies"
                                    }
                                }
                            },
                            "boost": 1.0
                        }
                    }, {
                        "constant_score": {
                            "query": {
                                "match": {
                                    "_all": {
                                        "query": "php"
                                    }
                                }
                            },
                            "boost": 1.0
                        }
                    }, {
                        "constant_score": {
                            "query": {
                                "match": {
                                    "_all": {
                                        "query": "developer"
                                    }
                                }
                            },
                            "boost": 1.0
                        }
                    }],
                    "disable_coord": 1
                }
            },
            "filter": [{
                "or": [{
                    "query": {
                        "match": {
                            "_all": {
                                "query": "Daryl"
                            }
                        }
                    }
                }, {
                    "query": {
                        "match": {
                            "_all": {
                                "query": "Davies"
                            }
                        }
                    }
                }, {
                    "query": {
                        "match": {
                            "_all": {
                                "query": "php"
                            }
                        }
                    }
                }, {
                    "query": {
                        "match": {
                            "_all": {
                                "query": "developer"
                            }
                        }
                    }
                }]
            }]
        }
    },
    "rescore": [{
        "query": {
            "query_weight": 0.0,
            "rescore_query_weight": 1.0,
            "score_mode": "total",
            "rescore_query": {
                "constant_score": {
                    "query": {
                        "match_all": {}
                    },
                    "boost": 20.0
                }
            }
        },
        "window_size": 50
    }]
}

clintongormley commented 10 years ago

@xritchie see https://github.com/elasticsearch/elasticsearch/issues/6788

deepkg commented 7 years ago

Can we provide complex queries for re-scoring? For example, a function that computes a linear sum of attributes with a weight for each attribute. Also, how can we have the response sorted by the final score (x * initial-pass score + y * re-ranked score)?

lanpay-lulu commented 7 years ago

Will a rescore query drop some results that should be returned? Going by the definition above, I don't think it should, but in practice it can, so I am totally confused. For example, without the rescore query the result count is 100, but with the rescore query it drops to 93.