alexklibisz / elastiknn

Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms.
https://alexklibisz.github.io/elastiknn
Apache License 2.0

a problem about hybrid search #688

Closed. mengrennwpu closed this issue 2 months ago.

mengrennwpu commented 2 months ago


Background

I installed Elasticsearch using Docker.

Elasticsearch version: 8.12.2
Elasticsearch plugin versions: analysis-ik-8.12.2, elastiknn-8.12.2.1

The details of the created index are as follows:

{
    "settings": {
        "index": {
            "number_of_shards": 2
        }
    },
    "mappings": {
        "dynamic": false,
        "properties": {
            "content": {
                "type": "keyword"
            },
            "type": {
                "type": "keyword"
            },
            "value": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart"
            },
            "embeddings": {
                "type": "elastiknn_dense_float_vector",
                "elastiknn": {
                    "dims": 768,
                    "model": "lsh",
                    "similarity": "cosine",
                    "L": 99,
                    "k": 3
                }
            }
        }
    }
}
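For context on the "L": 99 and "k": 3 parameters above, here is a minimal sketch of random-hyperplane LSH for cosine similarity, which is how the Elastiknn docs describe the cosine lsh model: L hash tables, where each table hashes a vector by the signs of its dot products with k random hyperplanes. This is a plain-Python illustration with hypothetical helper names (make_hyperplanes, lsh_hashes), not Elastiknn's actual implementation.

```python
import random


def make_hyperplanes(dims, L, k, seed=0):
    """Draw L tables of k random Gaussian hyperplanes (hypothetical helper)."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(k)]
            for _ in range(L)]


def lsh_hashes(vec, planes):
    """Return one hash per table: the k sign bits of the dot products packed into an int."""
    hashes = []
    for table in planes:
        h = 0
        for plane in table:
            dot = sum(p * x for p, x in zip(plane, vec))
            h = (h << 1) | (1 if dot >= 0 else 0)
        hashes.append(h)
    return hashes


if __name__ == "__main__":
    dims, L, k = 768, 99, 3  # same values as the mapping above
    planes = make_hyperplanes(dims, L, k)
    rng = random.Random(1)
    u = [rng.gauss(0.0, 1.0) for _ in range(dims)]
    v = [x + 0.1 * rng.gauss(0.0, 1.0) for x in u]  # a near-duplicate of u
    hu, hv = lsh_hashes(u, planes), lsh_hashes(v, planes)
    matches = sum(a == b for a, b in zip(hu, hv))
    print(f"{matches}/{L} tables agree")
```

With k = 3, each table has only 2^3 = 8 buckets, so even unrelated vectors collide fairly often; a larger k makes each table more selective, and a larger L compensates by giving similar vectors more chances to collide.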

The index contains 1,230 documents in total. Here is an example document (the embedding is truncated; the mapping specifies 768 dimensions):

{
    "content": "牛角龙",
    "value": "牛角龙",
    "type": "entity",
    "embeddings": [
        0.049,
        -0.02,
        0.049,
        -0.02,
        0.049,
        -0.02,
        0.049,
        -0.02
    ]
}

Bug

1. When I search using only text, it works correctly.

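2. When I search using only a vector, it also works correctly.

3. When I combine text and vector search in a single hybrid (function-score) query with the lsh model, the results are not correct.

So, where did I go wrong? I look forward to your reply and greatly appreciate your help.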
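For comparison with the hybrid case, a vector-only search against this index can be sketched as an elastiknn_nearest_neighbors query. The actual queries from this report are not shown, so the helper name knn_query and the candidates value of 50 are assumptions for illustration; the field name and model settings come from the mapping above.

```python
import json


def knn_query(vector, candidates=50, size=10):
    """Vector-only search body (hypothetical helper): LSH retrieves candidates,
    which Elastiknn then re-scores with the exact similarity."""
    return {
        "size": size,
        "query": {
            "elastiknn_nearest_neighbors": {
                "field": "embeddings",      # field name from the mapping above
                "vec": {"values": vector},  # must contain 768 floats for this index
                "model": "lsh",
                "similarity": "cosine",
                "candidates": candidates,   # assumed value; tune for recall vs. speed
            }
        },
    }


if __name__ == "__main__":
    body = knn_query([0.049, -0.02] * 384)  # placeholder 768-dim vector
    print(json.dumps(body)[:80], "...")
```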

Elastiknn Version

8.12.2.1

Platform

Docker

Steps to reproduce

No response

Additional info

No response

mengrennwpu commented 2 months ago

When I change the model from 'lsh' to 'exact' in the hybrid search, it works correctly. But, as you know, exact search is slower.
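A hybrid query of this kind can be sketched as a function_score query that runs a text match on the value field and applies an elastiknn_nearest_neighbors score function. This is an assumed reconstruction, since the original query is not shown: the helper name hybrid_query is hypothetical, and the structure follows the function-score pattern in the Elastiknn API docs as best understood here. Switching the model argument between "exact" and "lsh" reproduces the change described above.

```python
import json


def hybrid_query(text, vector, model="exact", size=10):
    """Hypothetical helper: text match on `value`, combined with an Elastiknn
    vector-similarity score function (Elasticsearch multiplies by default)."""
    return {
        "size": size,
        "query": {
            "function_score": {
                # The match query uses the field's search_analyzer (ik_smart).
                "query": {"match": {"value": text}},
                "functions": [
                    {
                        "elastiknn_nearest_neighbors": {
                            "field": "embeddings",
                            "similarity": "cosine",
                            # "exact" computes the true similarity per document;
                            # "lsh" only estimates it from matching hashes.
                            "model": model,
                            "vec": {"values": vector},
                        }
                    }
                ],
            }
        },
    }


if __name__ == "__main__":
    body = hybrid_query("牛角龙", [0.049, -0.02] * 384)
    print(json.dumps(body, ensure_ascii=False)[:120], "...")
```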

alexklibisz commented 2 months ago

Hi @mengrennwpu, there are some known caveats with the function-score query: https://alexklibisz.github.io/elastiknn/api/#using-a-function-score-query

When using "model": "lsh", the "candidates" parameter is ignored, and vectors are not re-scored with the exact similarity as they are in an elastiknn_nearest_neighbors query. Instead, the score is: max similarity score * proportion of matching hashes. This is a necessary consequence of the fact that score functions take a doc ID and must immediately return a score.
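The difference between the two scoring paths can be sketched in a few lines. This assumes Elastiknn reports cosine scores as 1 + cosine similarity (so the maximum is 2), and that "proportion of matching hashes" means the fraction of the L per-table hashes that equal the query's; both are assumptions to check against the Elastiknn API docs. The helper names are hypothetical.

```python
import math

MAX_COSINE_SCORE = 2.0  # assumption: cosine score = 1 + cos(u, v), so the max is 2


def exact_cosine_score(u, v):
    """What the exact model computes: 1 + cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 + dot / norm


def lsh_function_score(query_hashes, doc_hashes):
    """What the lsh model returns inside a function score, per the caveat above:
    max similarity score * fraction of matching hashes (no exact re-scoring)."""
    matches = sum(q == d for q, d in zip(query_hashes, doc_hashes))
    return MAX_COSINE_SCORE * matches / len(query_hashes)


if __name__ == "__main__":
    L = 99
    q = list(range(L))                # hypothetical per-table hashes of the query
    doc_a = q[:60] + [-1] * (L - 60)  # agrees with the query in 60 of 99 tables
    doc_b = q[:60] + [-2] * (L - 60)  # also 60 tables, so it gets the same score
    # Both documents tie, whatever their exact cosine similarities are.
    print(lsh_function_score(q, doc_a), lsh_function_score(q, doc_b))
```

With L = 99, the lsh function score can only take multiples of 2/99, and documents that collide in the same number of tables tie regardless of their true similarity. Multiplied into the text score, this coarse estimate can reorder results compared with the exact model.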

Perhaps that is happening here?

Also, I'm confused about why your example uses size: 10 but returns only 2 results. Are you omitting some results, or is this the actual complete response?

mengrennwpu commented 2 months ago

@alexklibisz Thank you very much for your response.

Your explanation of the function-score caveat is correct. And yes, I omitted some results; I'm sorry for the confusion.