elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.09k stars 24.83k forks source link

Be more enthusiastic about batching cluster state updates #81715

Open DaveCTurner opened 2 years ago

DaveCTurner commented 2 years ago

Today we only publish batches of cluster state updates that share an executor instance, but we don't formally guarantee that every batch is processed individually by each node: cluster appliers and listeners should be able to deal with pretty much arbitrary batching. For instance, each internal:cluster/coordination/publish_state message is handled after forking to the generic threadpool which means it could be delayed for long enough that the master sends out another update which gets processed first. In theory therefore we should be able to build much bigger batches of cluster state updates.

In practice we almost never process updates out of order today so if we just blindly introduced arbitrary batching we risk finding some bugs in dark corners that are just not exercised today.

@henningandersen and I discussed this earlier and couldn't think of any specific problems but nor were we confident that there are no such problems. We think we should go through all pairs of updates and convince ourselves that it's safe to batch them together. This is likely something that can't be done all at once so we'd like to introduce a mechanism to add cross-executor batching more progressively.

Likely relates https://github.com/elastic/elasticsearch/issues/81626 because it looks hard to add this kind of mechanism on top of the MasterService as implemented today.

We're opening this issue so we have a place to record the specific problems with batching that we might encounter while thinking about other things.

elasticmachine commented 2 years ago

Pinging @elastic/es-distributed (Team:Distributed)

DaveCTurner commented 2 years ago

Since this was opened we've made more updates batchable and the only significant problem we've found has been executors which use the published state to determine how to respond to the caller (#84415). We're working on a better API for batchable updates that will prevent this mistake (#89192).

Keeping this open for now to record the fact that we want to build batches from tasks that span multiple executors, recognising that something like #85751 will make that a lot easier than it is today.