Open jtibshirani opened 2 years ago
Pinging @elastic/es-search (Team:Search)
@jtibshirani Regarding https://github.com/elastic/elasticsearch/issues/72068: is there a release timeline or target version?
Support sparse vectors of terms with fractional weights (for approaches like SPLADE)
This is possible with the use of a rank_features mapping and a term query with boosting over the features. If performance issues arise here, then we can revisit making a specific mapping.
Is there any documentation on applying SPLADE retrieval using rank_features mapping? @benwtrent
@JakeWinter Not yet. We will have something documented soon. But rank_features has been around for a while now and it has always worked just fine for this use case. It's just not well advertised.
POST sparse_index/_doc
{
  "rank_features_field": {
    "feature_0": 1.2,
    "feature_14": 0.2,
    ...
  }
}
Then search is:
POST sparse_index/_search
{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {"rank_feature": {"field": "rank_features_field.feature_0", "linear": {}, "boost": 0.4}},
        {"rank_feature": {"field": "rank_features_field.feature_12", "linear": {}, "boost": 12.1}},
        ...
      ]
    }
  }
}
boost is the feature score output from our sparse model. linear does similar scoring to the SPLADE++ paper (dot-product of sparse dimensions).
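To make the link between the model output and the request body concrete, here is a minimal Python sketch. It is not an official client helper: `rank_features_mapping`, `build_sparse_query`, and `dot_product` are hypothetical names, and the field name `rank_features_field` is just the one used in the example above. It assumes that `linear` scores each clause as roughly the boost times the stored feature value, so the sum over the `should` clauses approximates the dot product of query and document weights (stored feature values lose some precision, so scores will not match exactly).

```python
import json
from typing import Any, Dict


def rank_features_mapping(field: str = "rank_features_field") -> Dict[str, Any]:
    """Index-creation body declaring `field` as a rank_features field."""
    return {"mappings": {"properties": {field: {"type": "rank_features"}}}}


def build_sparse_query(query_weights: Dict[str, float],
                       field: str = "rank_features_field") -> Dict[str, Any]:
    """Turn sparse query weights into one rank_feature clause per dimension."""
    should = [
        {"rank_feature": {"field": f"{field}.{name}", "linear": {}, "boost": weight}}
        for name, weight in query_weights.items()
        if weight > 0  # zero or negative weights contribute nothing useful; skip them
    ]
    return {"query": {"bool": {"minimum_should_match": 1, "should": should}}}


def dot_product(query_weights: Dict[str, float],
                doc_weights: Dict[str, float]) -> float:
    """Reference score: sum of query weight times document weight on shared dimensions."""
    return sum(w * doc_weights[name]
               for name, w in query_weights.items()
               if name in doc_weights)


if __name__ == "__main__":
    query = {"feature_0": 0.4, "feature_12": 12.1}
    doc = {"feature_0": 1.2, "feature_14": 0.2}
    print(json.dumps(build_sparse_query(query), indent=2))
    print(dot_product(query, doc))  # only feature_0 is shared by query and document
```

The dictionaries returned here can be sent as request bodies with any HTTP client. `dot_product` is only there as a local reference to compare against the scores Elasticsearch returns.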
Does ANN search support scroll? @jtibshirani
@S-Dragon0302, could you clarify what you mean by "scroll"? Do you mean find the k nearest neighbors, and then on a second call find the [k, 2*k] neighbors? Or that you can use scroll over BM25 combined with the results of the k nearest neighbors?
https://www.elastic.co/guide/en/elasticsearch/reference/8.5/scroll-api.html
@S-Dragon0302 that doesn't really answer my question. What is the behavior you expect from KNN results?
Right now, the original global top k found can be paginated, and so can the query. You can use: https://www.elastic.co/guide/en/elasticsearch/reference/8.5/paginate-search-results.html#search-after
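As a rough illustration of the search_after approach from the linked page, the Python sketch below derives the request body for the next page from the previous one. `next_page_body` is a hypothetical helper and `doc_id` is an assumed tiebreaker field, not something from this thread. It relies only on two documented behaviors: a search_after request must carry a `sort`, and each hit returns its `sort` values. How this combines with a kNN section in a given release is not covered here.

```python
import copy
from typing import Any, Dict, List, Optional


def next_page_body(body: Dict[str, Any],
                   last_hits: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Build the body for the next page, or return None when there are no more hits."""
    if not last_hits:
        return None
    nxt = copy.deepcopy(body)  # leave the caller's body untouched
    # Resume right after the sort values of the last hit on the previous page.
    nxt["search_after"] = last_hits[-1]["sort"]
    # search_after is not meant to be combined with a non-zero "from".
    nxt.pop("from", None)
    return nxt


if __name__ == "__main__":
    first_page = {
        "size": 2,
        "query": {"match_all": {}},
        # "doc_id" is an assumed unique field used to break score ties.
        "sort": [{"_score": "desc"}, {"doc_id": "asc"}],
    }
    # Shape of the hits as returned under response["hits"]["hits"].
    hits = [{"_id": "a", "sort": [1.5, "a"]}, {"_id": "b", "sort": [1.2, "b"]}]
    print(next_page_body(first_page, hits))
```

A caller would loop, sending each body and feeding the returned hits back in, until the helper returns None.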
For anyone planning to use Elasticsearch for Hybrid Search (Sparse and Dense Embeddings), I have written a small tutorial. It covers SPLADE++ as well.
Pinging @elastic/es-search-relevance (Team:Search Relevance)
This meta issue tracks our work on ANN search. It is not an exhaustive plan and items might be changed or added over time.
Features
Can we remove the need for tuning by automatically choosing good index + search parameters (?)
Enhancements
Performance improvements
Search performance
Indexing performance
Tech Debt