elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

ANN search improvements #84324

Open jtibshirani opened 2 years ago

jtibshirani commented 2 years ago

This meta issue tracks our work on ANN search. It is not an exhaustive plan, and items might be changed or added over time.

Features

Enhancements

Performance improvements

Search performance

Indexing performance

Tech Debt

elasticmachine commented 2 years ago

Pinging @elastic/es-search (Team:Search)

S-Dragon0302 commented 1 year ago

@jtibshirani Regarding https://github.com/elastic/elasticsearch/issues/72068, is there a planned release date or target version?

benwtrent commented 1 year ago

Support sparse vectors of terms with fractional weights (for approaches like SPLADE)

This is possible with the use of rank_features mapping and a term query with boosting over the features. If performance issues arise here, then we can revisit making a specific mapping.

JakeWinter commented 1 year ago

Is there any documentation on applying SPLADE retrieval using rank_features mapping? @benwtrent

benwtrent commented 1 year ago

@JakeWinter Not yet. We will have something documented soon. But rank_features has been around for a while now, and it has always worked just fine for this use case. It's just not well advertised.

POST sparse_index/_doc
{
  "rank_features_field": {
    "feature_0": 1.2,
    "feature_14": 0.2,
    ...
  }
}

Then search is:

POST sparse_index/_search
{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {"rank_feature": {"field": "rank_features_field.feature_0", "linear": {}, "boost": 0.4}},
        {"rank_feature": {"field": "rank_features_field.feature_12", "linear": {}, "boost": 12.1}},
        ...
      ]
    }
  }
}

boost is the feature score output from our sparse model. linear does similar scoring to the SPLADE++ paper (dot-product of sparse dimensions).
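
To make the scoring claim concrete, here is a minimal Python sketch of how such a request body could be built from a sparse model's output. It is not an official client or helper: the function names (`sparse_mapping`, `sparse_query`, `expected_score`) are hypothetical, and the field name is simply the one used in the example above. `expected_score` only illustrates the dot product that the `linear` function is meant to approximate; actual scores from Elasticsearch may differ slightly because rank_features values are stored with reduced precision.

```python
from typing import Any, Dict

FIELD = "rank_features_field"  # field name from the example above


def sparse_mapping(field: str = FIELD) -> Dict[str, Any]:
    """Index-creation body declaring `field` as a rank_features field."""
    return {"mappings": {"properties": {field: {"type": "rank_features"}}}}


def sparse_query(query_weights: Dict[str, float], field: str = FIELD) -> Dict[str, Any]:
    """Build a bool query with one rank_feature clause per sparse dimension.

    The model's weight for each dimension is passed as the clause boost.
    Non-positive weights are skipped, since boosts cannot be negative.
    """
    should = [
        {"rank_feature": {"field": f"{field}.{name}", "linear": {}, "boost": weight}}
        for name, weight in query_weights.items()
        if weight > 0
    ]
    return {"query": {"bool": {"minimum_should_match": 1, "should": should}}}


def expected_score(query_weights: Dict[str, float], doc_features: Dict[str, float]) -> float:
    """Dot product over the dimensions shared by the query and the document."""
    return sum(
        weight * doc_features[name]
        for name, weight in query_weights.items()
        if name in doc_features
    )


# Values taken from the two snippets above (elided entries left out).
doc = {"feature_0": 1.2, "feature_14": 0.2}
query = {"feature_0": 0.4, "feature_12": 12.1}
body = sparse_query(query)
score = expected_score(query, doc)  # only feature_0 is shared: 0.4 * 1.2
```

Only `feature_0` appears in both the document and the query here, so it is the only clause that contributes to the score.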

S-Dragon0302 commented 1 year ago

Does ann search support scroll? @jtibshirani

benwtrent commented 1 year ago

@S-Dragon0302, could you clarify what you mean by "scroll"? Do you mean find k nearest neighbors, and then on a second call find [k, 2*k] neighbors?

Or that you can use scroll over BM25 combined with the results of the k nearest neighbors?

S-Dragon0302 commented 1 year ago

https://www.elastic.co/guide/en/elasticsearch/reference/8.5/scroll-api.html

benwtrent commented 1 year ago

@S-Dragon0302 that doesn't really answer my question. What is the behavior you expect from KNN results?

Right now, both the original global top k results and the query results can be paginated. You can use search_after: https://www.elastic.co/guide/en/elasticsearch/reference/8.5/paginate-search-results.html#search-after
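
As a rough illustration of the search_after approach for the query part, the Python sketch below builds request bodies only; it does not call Elasticsearch. The helper names (`first_page`, `next_page`) and the `doc_id` tiebreaker field are hypothetical, and how a knn section interacts with this sort is not covered here. The one behavior it relies on is that, when a request specifies a sort, each returned hit carries a `sort` array that can be passed back as `search_after`.

```python
import copy
from typing import Any, Dict, Optional


def first_page(query: Dict[str, Any], size: int = 10) -> Dict[str, Any]:
    """Request body for the first page, sorted by score with a tiebreaker."""
    return {
        "size": size,
        "query": query,
        # `doc_id` is a hypothetical unique keyword field used to break ties.
        "sort": [{"_score": "desc"}, {"doc_id": "asc"}],
    }


def next_page(body: Dict[str, Any], last_hit: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Request body for the following page, or None if there was no last hit.

    `last_hit` is the final entry of the previous response's hits array.
    """
    if last_hit is None:
        return None
    following = copy.deepcopy(body)  # leave the caller's body untouched
    following["search_after"] = last_hit["sort"]
    return following


page_1 = first_page({"match_all": {}}, size=2)
# A made-up last hit, shaped like an entry in a search response.
page_2 = next_page(page_1, {"_id": "a", "sort": [1.5, "doc-7"]})
```

Repeating `next_page` with the last hit of each response walks through the results until a response comes back empty.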

mu4farooqi commented 1 year ago

For anyone planning to use Elasticsearch for Hybrid Search (Sparse and Dense Embeddings), I have written a small tutorial. It covers SPLADE++ as well.

https://ufarooqi.com/blog/hybrid-search-with-elasticsearch/

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)