Open llermaly opened 1 year ago
Pinging @elastic/es-search (Team:Search)
@llermaly to prevent serializing huge lists of numbers over http, there is this query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-wrapper-query.html
This accepts a query as a base64-encoded string, but I am not sure it will be any faster or better than simply zipping the HTTP request via compression headers.
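To make the comparison concrete, here is a minimal Python sketch (standard library only) that builds a wrapper query body and compares its size with the plain and gzip-compressed terms query. The field name `user_id` and the helper names are hypothetical, and the id list is synthetic, so the numbers only illustrate the trend and say nothing about query speed.

```python
import base64
import gzip
import json


def build_wrapper_query(field, ids):
    """Return a search body whose terms query is base64-encoded inside a wrapper query."""
    inner = json.dumps({"terms": {field: ids}}, separators=(",", ":"))
    encoded = base64.b64encode(inner.encode("utf-8")).decode("ascii")
    return {"query": {"wrapper": {"query": encoded}}}


def body_sizes(field, ids):
    """Compare request body sizes in bytes: plain terms query, wrapper query, gzipped plain."""
    plain = json.dumps({"query": {"terms": {field: ids}}}, separators=(",", ":")).encode("utf-8")
    wrapped = json.dumps(build_wrapper_query(field, ids), separators=(",", ":")).encode("utf-8")
    return {
        "plain": len(plain),
        "wrapped": len(wrapped),
        "plain_gzip": len(gzip.compress(plain)),
    }


# Synthetic ids (assumption): 50,000 even numbers. Real ids will compress differently.
ids = list(range(0, 100_000, 2))
sizes = body_sizes("user_id", ids)
print(sizes)
```

Because base64 encodes every 3 bytes as 4 characters, the wrapper body comes out larger than the plain JSON it wraps, so any saving on the wire would have to come from compression or from a more compact encoding of the ids themselves.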
As for a specialized mapping for massive term embeddings, this looks similar: https://github.com/elastic/elasticsearch/pull/94048
It might be good to have a new mapping type; usually these are "numerical id" types. Some thought needs to go into how best to expose this.
numeric_keyword? numeric_id? 🤔
Thanks @benwtrent, we need some sort of combination, because raw arrays are still bigger to send than the base64 shape.
Using the fastfilter plugin, I got 700 ms for an array of 1,000,000 ids on a universe of 5,000,000 before caching, and 30 ms after caching.
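As a rough illustration of why a base64 shape can be smaller than a raw array, the Python sketch below packs ids into a plain dense bitset and compares its base64 length with the JSON array. This is an assumption-laden stand-in: it is not the roaring bitmap format and not the wire format of the fastfilter plugin, and the helper names are hypothetical. It uses a scaled-down universe with the same 1:5 ratio of ids to universe.

```python
import base64
import json


def pack_ids_as_bitset(ids, universe_size):
    """Set bit i for every id i in [0, universe_size) and return the bitset as base64."""
    bits = bytearray((universe_size + 7) // 8)
    for i in ids:
        if not 0 <= i < universe_size:
            raise ValueError(f"id {i} is outside the universe")
        bits[i >> 3] |= 1 << (i & 7)
    return base64.b64encode(bytes(bits)).decode("ascii")


def unpack_bitset(encoded):
    """Recover the sorted list of ids from a base64-encoded bitset."""
    raw = base64.b64decode(encoded)
    return [
        byte_index * 8 + bit
        for byte_index, byte in enumerate(raw)
        for bit in range(8)
        if (byte >> bit) & 1
    ]


# Scaled-down synthetic data (assumption): 10,000 ids in a universe of 50,000.
universe_size = 50_000
ids = list(range(0, universe_size, 5))

json_array = json.dumps(ids, separators=(",", ":"))
bitset_b64 = pack_ids_as_bitset(ids, universe_size)
print(len(json_array), len(bitset_b64))
```

A dense bitset costs one bit per id in the universe regardless of how many ids are set, so it only wins when the id list is a sizeable fraction of the universe. Roaring bitmaps are designed to stay compact for sparse sets as well.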
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Description
Hello,
I have seen many issues about terms query performance, and I find myself in a similar situation now:
I need to filter by a huge array of numeric ids (millions+), which makes both the payloads big and the queries slow. The ids come from an external service, so I cannot change the logic.
I found some posts from people implementing custom plugins that leverage roaring bitmaps:
https://luis-sena.medium.com/improve-elasticsearch-filtering-performance-10x-using-this-plugin-8c6485516c1a
https://medium.com/tinder/how-we-improved-our-performance-using-elasticsearch-plugins-part-2-b051da2ee85b
Is this a feature that could be implemented in Elasticsearch to get these performance boosts, or is a custom plugin the only way?
Thanks