elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Use max similarity of multiple kNN #112426

Closed sevgiborazan closed 2 months ago

sevgiborazan commented 2 months ago

Description

Hi, I'm using Elasticsearch 8.7.1 and have the vector index shown below:

PUT example
{
  "mappings": {
    "properties": {
      "myVector": {
        "type": "dense_vector",
        "dims": 5,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}
POST example/_bulk?refresh=true
{ "index": { "_id": "1" } }
{ "myVector": [12, 50, -10, 0, 1] }
{ "index": { "_id": "2" } }
{ "myVector": [25, 1, 4, -12, 2] }
{ "index": { "_id": "3" } }
{ "myVector": [1, 5, 25, 50, 20] }

With this index and data, I want to run a multi-kNN query with several query vectors. I expect the results to be sorted by, for each document, its maximum similarity across the query vectors. I can implement this with script_score like this:

POST example/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "script_score": {
            "script": {
              "source": "double a = cosineSimilarity(params.vector1, 'myVector'); double b = cosineSimilarity(params.vector2, 'myVector'); double c = cosineSimilarity(params.vector3, 'myVector'); return Math.max(a, Math.max(b, c));",
              "params": {
                "vector1": [5, 20, -2, 2, 1],
                "vector2": [3, 20, -4, 3, 10],
                "vector3": [1, 4, 20, 5, 3]
              }
            }
          }
        }
      ]
    }
  }
}
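As a sanity check on what the script above computes, here is a minimal Python sketch that reproduces the max-of-cosines scoring offline for the three sample documents. The `cosine` helper and variable names are illustrative and not part of any Elasticsearch API; it assumes the default match-all query, so the script value is effectively the document score.

```python
import math

def cosine(a, b):
    # Plain cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Sample documents and query vectors copied from the requests above.
docs = {
    "1": [12, 50, -10, 0, 1],
    "2": [25, 1, 4, -12, 2],
    "3": [1, 5, 25, 50, 20],
}
queries = [[5, 20, -2, 2, 1], [3, 20, -4, 3, 10], [1, 4, 20, 5, 3]]

# For each document, keep only its best similarity across the query vectors.
max_scores = {
    doc_id: max(cosine(q, vec) for q in queries)
    for doc_id, vec in docs.items()
}
ranking = sorted(max_scores, key=max_scores.get, reverse=True)
print(ranking)  # ['1', '3', '2']
```

Document 1 wins through the first query vector and document 3 through the third, which is the per-document "best match" ordering the question asks for.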

When I ran an approximate kNN search with multiple knn clauses instead, I found that it uses the sum of the per-vector similarity scores as the document score.

https://www.elastic.co/guide/en/elasticsearch/reference/8.7/knn-search.html#_search_multiple_knn_fields

Is there a way to get the equivalent of a knn_score_mode: MAX for multi-kNN search requests?

POST example/_search
{
  "knn": [{
    "field": "myVector",
    "query_vector": [5, 20, -2, 2, 1],
    "k": 5,
    "num_candidates": 10
  },
  {
    "field": "myVector",
    "query_vector": [3, 20, -4, 3, 10],
    "k": 5,
    "num_candidates": 10
  },
  {
    "field": "myVector",
    "query_vector": [1, 4, 20, 5, 3],
    "k": 10,
    "num_candidates": 10
  }],
  "size": 10
}
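To illustrate why the combination mode matters, the following Python sketch merges per-kNN hit lists by sum and by max. The `combine` function and the scores are hypothetical, chosen only so that the two modes disagree; they are not output from the index above.

```python
from collections import defaultdict

def combine(hit_lists, mode="sum"):
    """Merge per-kNN hit lists ({doc_id: score}) into one score per document."""
    merged = defaultdict(list)
    for hits in hit_lists:
        for doc_id, score in hits.items():
            merged[doc_id].append(score)
    reduce_fn = sum if mode == "sum" else max
    return {doc_id: reduce_fn(scores) for doc_id, scores in merged.items()}

# Hypothetical scores: doc "1" is mediocre for both query vectors,
# doc "2" is an excellent match for one and a poor match for the other.
hits_a = {"1": 0.6, "2": 0.9}
hits_b = {"1": 0.6, "2": 0.1}

by_sum = combine([hits_a, hits_b], "sum")  # doc "1" ranks first
by_max = combine([hits_a, hits_b], "max")  # doc "2" ranks first
```

Summing rewards documents that are moderately close to every query vector, while max rewards a single strong match, which is the behavior requested here.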

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)

benwtrent commented 2 months ago

To get exactly that behavior, you would have to use a dis_max query, which scores each document by its best-matching sub-query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html

GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "knn": { ... } },
        { "knn": { ... } },
        ...
      ]
    }
  }
}

Support for knn as a regular query was added in 8.12, so this requires a newer version than 8.7.1.
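
For reference, a minimal Python sketch of how the dis_max request body could be assembled from the three query vectors in the original question. It only builds and prints JSON; the `knn_clause` helper is hypothetical, the field name and `num_candidates` value are taken from the question, and the exact set of parameters accepted by the knn query depends on the Elasticsearch version, so check the reference for yours.

```python
import json

def knn_clause(vector, field="myVector", num_candidates=10):
    # One knn query clause (query DSL form, available from 8.12 onward).
    return {
        "knn": {
            "field": field,
            "query_vector": vector,
            "num_candidates": num_candidates,
        }
    }

query_vectors = [[5, 20, -2, 2, 1], [3, 20, -4, 3, 10], [1, 4, 20, 5, 3]]

body = {
    "size": 10,
    "query": {
        "dis_max": {
            # With tie_breaker at 0.0 (the default), each document keeps
            # only the score of its best-matching clause.
            "queries": [knn_clause(v) for v in query_vectors],
            "tie_breaker": 0.0,
        }
    },
}

print(json.dumps(body, indent=2))
```

The printed body can be sent as POST example/_search. For cosine similarity the kNN `_score` is a rescaled value, (1 + cosine) / 2, rather than the raw cosine, so absolute scores will differ from the script_score version while the max-based ordering is preserved.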