Pinging @elastic/es-search (Team:Search)
Hi @Tradunsky! I was curious if you'd be able to share some of your setup here for how you're doing your measurements? In particular I'm curious if you're using indexed or non-indexed vectors for your exact vector search scripts.
We are indeed interested in the performance gains afforded by the jdk.incubator.vector API; however, given the lack of stability of a project in incubation, there are significant difficulties in building this into Elasticsearch.
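For illustration, here is a minimal sketch of what a Vector API (JEP 417) dot product could look like. This is not Elasticsearch's code; it assumes a JDK 17+ run with --add-modules jdk.incubator.vector, and the incubating API surface may change between releases, which is exactly the stability concern above.

import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class PanamaDotProduct {
    // SPECIES_PREFERRED picks the widest SIMD width the CPU supports.
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static float dotProduct(float[] a, float[] b) {
        FloatVector acc = FloatVector.zero(SPECIES);
        int i = 0;
        int upperBound = SPECIES.loopBound(a.length);
        // Vectorized main loop: SPECIES.length() floats per iteration.
        for (; i < upperBound; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            acc = va.fma(vb, acc); // acc += va * vb, lane-wise
        }
        float result = acc.reduceLanes(VectorOperators.ADD);
        // Scalar tail for dimensions not divisible by the vector width.
        for (; i < a.length; i++) {
            result += a[i] * b[i];
        }
        return result;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f, 5f};
        float[] b = {5f, 4f, 3f, 2f, 1f};
        System.out.println(dotProduct(a, b)); // 35.0
    }
}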
Hi @jdconrad,
Thank you for the reply!
"In particular I'm curious if you're using indexed or non-indexed vectors for your exact vector search scripts."
Non-indexed.
Index definition (Python client request body, hence False):
{
    "settings": {
        "index": {
            "number_of_replicas": 0,
            "number_of_shards": 7
        }
    },
    "mappings": {
        "properties": {
            "entity_id": {"type": "keyword"},
            "embedding": {
                "type": "dense_vector",
                "dims": 768,
                "index": False
            }
        }
    }
}
Search query (Python client request body; entity_id and vector are Python variables):
{
    "_source": False,
    "query": {
        "script_score": {
            "query": {
                "bool": {
                    "filter": {
                        "term": {
                            "entity_id": entity_id
                        }
                    }
                }
            },
            "script": {
                "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                "params": {
                    "query_vector": vector
                }
            }
        }
    }
}
Data volume: 1M records for a single entity (1M docs total).
Cluster spec:
@Tradunsky Thanks for the info! Non-indexed makes sense to me as we used a different strategy for storage in that case (as I'm guessing you've seen since you've pointed out lines of code! :) ). I will take a look into what we can do here for unrolling to improve performance for this use case.
Thank you @jdconrad much appreciate your time 🙏🏻
I wanted to update this issue as it currently stands:
@grcevski helped me profile the non-indexed path and we came to the conclusion that loop unrolling is already occurring to some extent and that the real bottleneck appears to be unwrapping floats from the binary data used for non-indexed vectors. I will continue to look to see if we can improve performance in some other more creative way.
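For context, the shape of that bottleneck looks roughly like the following. This is an illustrative sketch, not Elasticsearch's actual implementation (the byte order and layout are assumptions): with non-indexed vectors, every script invocation has to decode the stored binary doc value back into floats before any similarity math runs.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BinaryVectorDecode {
    // Illustrative only: decode a stored binary doc value back into a float[].
    // Each of the dims floats is unwrapped one at a time, per document, which
    // is the per-document cost that dominates the non-indexed script_score path.
    static float[] decodeVector(byte[] stored, int dims) {
        ByteBuffer buffer = ByteBuffer.wrap(stored).order(ByteOrder.BIG_ENDIAN); // byte order assumed
        float[] vector = new float[dims];
        for (int i = 0; i < dims; i++) {
            vector[i] = buffer.getFloat();
        }
        return vector;
    }

    public static void main(String[] args) {
        // Round-trip a small example to show the decode cost shape.
        float[] original = {0.1f, 0.2f, 0.3f};
        ByteBuffer encoded = ByteBuffer.allocate(original.length * Float.BYTES);
        for (float f : original) {
            encoded.putFloat(f);
        }
        float[] decoded = decodeVector(encoded.array(), original.length);
        System.out.println(decoded[1]); // 0.2
    }
}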
@jdconrad Thank you very much for looking at this 🙏🏻
I was about to provide an update as well, but did not mean to pull the ticket in a different direction.
While exploring pre-filtering with ANN, and after profiling the query more carefully, I realized that at least in ANN, distance computation is not the bottleneck (378.8 µs); here is the shard profile that takes the longest (for ANN):
I could not get recall anywhere close to exact KNN, but I should probably first look at the entity filtering strategies I can take, as the overall goal is to achieve E2E p50 latency under 100ms for an entity doc count of around 500k-1M. A local implementation showed it is feasible, but handling data changes at scale is the problem I was hoping to solve with Elasticsearch.
Just want to verify with you whether my judgment that the data-filtration phase is taking a long time is fair here since, as you mentioned before, it may be affected by the different structure of the index when the HNSW graph is applied (index: true)?
Very much appreciate your help and time on this folks 🙏🏻
@Tradunsky I apologize for the delayed response to your question here as I was on holiday for a bit.
I find it interesting that your recall wasn't very good with ANN. I wonder if a higher num_candidates would help here, or is it possible that your prefilter is filtering out too many documents? I would be especially curious what a profile for ANN looks like without a filter. Nothing in the profile specifically stands out to me.
For your use cases, I would guess that using an HNSW graph (index: true) would likely better meet your latency requirements, with the caveat that you have enough RAM to dedicate to the HNSW data structure.
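To make that concrete, here is a minimal sketch of that setup using the low-level Java REST client and the 8.x top-level knn search option. The host, index name, k/num_candidates values, and the (truncated) query vector are all illustrative placeholders, not recommendations.

import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class AnnSearchSketch {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // index: true builds the HNSW graph at index time; the graph needs
            // enough RAM (page cache) for search latency to stay low.
            Request createIndex = new Request("PUT", "/vectors");
            createIndex.setJsonEntity("""
                {
                  "mappings": {
                    "properties": {
                      "entity_id": {"type": "keyword"},
                      "embedding": {
                        "type": "dense_vector",
                        "dims": 768,
                        "index": true,
                        "similarity": "cosine"
                      }
                    }
                  }
                }""");
            client.performRequest(createIndex);

            // ANN search with a pre-filter; raising num_candidates trades
            // latency for recall. The query vector is truncated for brevity
            // and would need the full 768 dimensions in practice.
            Request search = new Request("POST", "/vectors/_search");
            search.setJsonEntity("""
                {
                  "knn": {
                    "field": "embedding",
                    "query_vector": [0.1, 0.2, 0.3],
                    "k": 10,
                    "num_candidates": 100,
                    "filter": {"term": {"entity_id": "some-entity"}}
                  }
                }""");
            client.performRequest(search);
        }
    }
}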
Hi @jdconrad, Sorry, somehow missed your reply.
Ok, so by switching from hybrid prefilter to during prefilter, accuracy jumped to almost the same level as exact KNN. I have not had a chance to dig into the root cause, but the document filter result should be right around 1M.
Thank you very much for the replies and support! Please feel free to close the ticket, as I think ANN and dimensionality reduction are the ways to go in such cases.
@Tradunsky Thanks for the update, and I'm glad you were able to find a solution that works for you! Closing this now.
Linking this here for better visibility: https://github.com/apache/lucene/issues/12091
Great job on incorporating SIMD instructions into 8.9.0! It dropped latencies by at least 50% 💪🏻
Much appreciate the work on that, folks 🙏🏻
Description
Thank you very much for the diversity of choices when it comes to dense vector search operations 🙏🏻
In my private measurements on float32 768-dimensional vectors, somehow, Elasticsearch's exact KNN cosine similarity function search is slower than in the OpenSearch project (apologies if that comparison is inappropriate here; mainly I want Elasticsearch to improve to capture the wave of NLP use cases in the AWS cloud).
The KNN dense vector implementation uses Lucene's VectorUtil.dotProduct calculation for the cosine distance function, which already seems to take into account one of the possible optimizations (manual loop unrolling) 💪🏻
I wonder if cosine and other distance computation functions in Elasticsearch could benefit from Java's support for optimized vector operations (the Vector API, JEP 417)? https://openjdk.org/jeps/417
Thank you!