elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Optimize massive lookups #97947

Open llermaly opened 1 year ago

llermaly commented 1 year ago

Description

Hello,

I have seen many issues about terms query performance, and I see myself in a similar situation now:

I need to filter by a huge array of numeric ids (millions+), which makes both the request payloads big and the queries slow. The ids come from an external service, so I cannot change the logic.

I found some posts from people implementing custom plugins that leverage Roaring bitmaps:

https://luis-sena.medium.com/improve-elasticsearch-filtering-performance-10x-using-this-plugin-8c6485516c1a
https://medium.com/tinder/how-we-improved-our-performance-using-elasticsearch-plugins-part-2-b051da2ee85b


Is this something that could be built into Elasticsearch to get these performance boosts, or is a custom plugin the only way?

Thanks

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-search (Team:Search)

benwtrent commented 1 year ago

@llermaly, to avoid serializing huge lists of numbers over HTTP, there is the wrapper query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-wrapper-query.html

It accepts a query as a base64-encoded string, but I am not sure it will be any faster or better than simply compressing the HTTP request via compression headers.
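To make that trade-off concrete, here is a minimal sketch that builds the same terms filter three ways and compares payload sizes: plain JSON, the wrapper query with a base64-encoded inner query, and gzip-compressed plain JSON. The field name `user_id` and the helper names are hypothetical, and the ids are synthetic; real ratios depend on the actual id distribution.

```python
import base64
import gzip
import json


def build_terms_query(field, ids):
    """Plain terms query as a Python dict."""
    return {"terms": {field: list(ids)}}


def build_wrapper_body(inner_query):
    """Search body whose query is a `wrapper` query holding the
    base64-encoded JSON of `inner_query`."""
    raw = json.dumps(inner_query, separators=(",", ":")).encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"query": {"wrapper": {"query": encoded}}}


def payload_sizes(field, ids):
    """Byte sizes of the three request-body variants."""
    inner = build_terms_query(field, ids)
    plain = json.dumps({"query": inner}, separators=(",", ":")).encode("utf-8")
    wrapped = json.dumps(
        build_wrapper_body(inner), separators=(",", ":")
    ).encode("utf-8")
    return {
        "plain_json": len(plain),
        "wrapper_base64": len(wrapped),
        # Sending this variant assumes the server accepts gzip request bodies.
        "plain_json_gzip": len(gzip.compress(plain)),
    }


if __name__ == "__main__":
    # Synthetic ids, for illustration only.
    for name, size in payload_sizes("user_id", range(0, 300_000, 3)).items():
        print(f"{name}: {size} bytes")
```

Base64 alone expands its input by roughly a third, so wrapping a JSON terms list does not shrink it; any saving has to come from what is encoded (for example a compact binary structure) or from compression of the request body.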

As for a specialized mapping for massive term embeddings, this looks similar: https://github.com/elastic/elasticsearch/pull/94048

It might be good to have a new mapping type; these fields are usually "numerical id" types. How best to expose this still needs some thought.

numeric_keyword? numeric_id ? 🤔

llermaly commented 1 year ago

Thanks @benwtrent. We need some sort of combination, because raw arrays are still bigger to send than the base64 shape.

Using the fastfilter plugin, I got 700 ms for a 1,000,000-id array on a 5,000,000-id universe before caching, and 30 ms after caching.
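For a rough sense of why a bitmap-shaped payload is smaller than a raw array at these sizes, here is a minimal sketch that packs ids into a plain, uncompressed bitset and base64-encodes it. This is a stand-in for illustration only: it is not the Roaring bitmap format and not the fastfilter plugin's wire format, and the helper names are hypothetical. It also assumes ids are non-negative integers below a known universe size.

```python
import base64
import json


def ids_to_bitset(ids, universe_size):
    """Set bit i for every id i; ids must lie in [0, universe_size)."""
    bits = bytearray((universe_size + 7) // 8)
    for i in ids:
        if not 0 <= i < universe_size:
            raise ValueError(f"id {i} is outside the universe")
        bits[i >> 3] |= 1 << (i & 7)
    return bytes(bits)


def bitset_contains(bits, i):
    """Membership test against a bitset built by ids_to_bitset."""
    return bool(bits[i >> 3] & (1 << (i & 7)))


def compare_sizes(ids, universe_size):
    """Return (JSON array size, base64 bitset size) in bytes."""
    ids = list(ids)
    json_bytes = len(json.dumps(ids, separators=(",", ":")))
    b64_bytes = len(base64.b64encode(ids_to_bitset(ids, universe_size)))
    return json_bytes, b64_bytes


if __name__ == "__main__":
    # Synthetic data shaped like the numbers above: 1,000,000 ids
    # drawn from a 5,000,000-id universe (every fifth id).
    json_bytes, b64_bytes = compare_sizes(range(0, 5_000_000, 5), 5_000_000)
    print(f"JSON array:    {json_bytes} bytes")
    print(f"base64 bitset: {b64_bytes} bytes")
```

An uncompressed bitset over a 5,000,000-id universe is 5,000,000 / 8 = 625,000 bytes before base64 regardless of how many ids are set, whereas a JSON array spends several bytes of digits plus a separator on each id. Roaring bitmaps add per-chunk compression on top of this idea, so their sizes will differ from this sketch.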

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)