elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Empty scroll contexts don't count #86407

Open nik9000 opened 2 years ago

nik9000 commented 2 years ago

Description

Right now, if you have one thousand shards and hit them all with _scroll, it'll always bump into the scroll limit, even if most of those shards don't have any matching documents. It'd be lovely if we could "not count" shards without any data. I don't think we need to keep any state on those shards.

elasticmachine commented 2 years ago

Pinging @elastic/es-search (Team:Search)

ywelsch commented 1 year ago

With PIT being favored over scroll these days, I'm wondering whether this is still worth addressing.

nik9000 commented 1 year ago

If we're actively working to move off of scroll that's probably fine. Do we have similar PIT limits? I do think we'd have to, say, migrate reindex off of scroll and onto PIT before I'd feel good about ignoring this.

hchargois commented 6 months ago

I've noticed a behavior that I think is linked to what is described in this issue. If I'm mistaken, sorry, feel free to disregard or move to a new issue.

When scrolling with slices, it seems that the number of created contexts is num_slices x num_shards. For example if an index has 10 shards and we scroll with 10 slices, then we get 100 open contexts.

This is surprising to me, since the documentation says that slices are first distributed among shards. As long as num_slices <= num_shards, I would expect each slice to keep a context only for the shard(s) that it targets and not for any others, so that overall only num_shards contexts are actually useful.
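For readers unfamiliar with sliced scroll, here is a minimal sketch of the request bodies involved. It only builds the JSON bodies and sends nothing; the helper name `sliced_scroll_bodies` and the `match_all` query are illustrative assumptions, not something taken from this issue. Each body would be posted to the index's `_search` endpoint with a `scroll` parameter, one request per slice.

```python
import json


def sliced_scroll_bodies(num_slices, query=None):
    """Build one search body per slice for a sliced scroll.

    Hypothetical helper for illustration only: it constructs the request
    bodies and does not talk to a cluster.
    """
    if query is None:
        query = {"match_all": {}}  # assumed query, not from the issue
    return [
        {"slice": {"id": slice_id, "max": num_slices}, "query": query}
        for slice_id in range(num_slices)
    ]


if __name__ == "__main__":
    # Ten slices, as in the 10-shard example above.
    for body in sliced_scroll_bodies(10)[:2]:
        print(json.dumps(body))
```

Every one of these requests is a separate scroll search, which is why the number of open contexts can grow with the number of slices.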

I don't have actual knowledge of the internals of slices and scroll contexts, so this is mostly intuition; maybe I'm completely wrong and all the contexts are actually required.

But anyway, even if my understanding is wrong, the effects are very real. If I have an index with 100 shards and I want to scroll it with 100 slices (as seems logical to do), then 10k contexts are created on the cluster, and even with 20 nodes that averages 500 per node, which leaves no headroom under the default limit of 500 open contexts/node.
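The arithmetic can be checked with a small sketch. It assumes the behavior observed above (one context per slice per shard) and an even spread of contexts across nodes. The function names are hypothetical, and the 500 figure is the default quoted in this comment, which I believe corresponds to the `search.max_open_scroll_context` setting.

```python
import math

# Default limit quoted in the comment above (assumed to be the
# search.max_open_scroll_context setting).
DEFAULT_MAX_OPEN_SCROLL_CONTEXT = 500


def open_contexts(num_shards, num_slices):
    """Total contexts if every slice opens a context on every shard."""
    return num_shards * num_slices


def contexts_per_node(num_shards, num_slices, num_nodes):
    """Contexts on the busiest node, assuming a perfectly even spread."""
    return math.ceil(open_contexts(num_shards, num_slices) / num_nodes)


if __name__ == "__main__":
    print(open_contexts(10, 10))             # 100
    print(open_contexts(100, 100))           # 10000
    print(contexts_per_node(100, 100, 20))   # 500
```

With a perfectly even spread, 20 nodes end up at exactly 500 contexts each, so any imbalance in shard allocation or any other concurrent scroll pushes a node past the limit. If only the targeted shards held contexts, the same scroll would need roughly `num_shards` contexts in total, that is 100 instead of 10,000.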

elasticsearchmachine commented 1 month ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)