The compute controller is currently implemented mostly as a library whose usage follows a "ready/process" pattern. The coordinator's message loop selects on the `ready` method, along with the other things it waits on. When `ready` resolves, the compute controller has work to do, so the coordinator invokes its `process` method, which performs any work that has queued up in the compute controller.
`process` is an async method, which means it can in theory block the coordinator for an unbounded amount of time. In practice we hope that it never blocks for long in async calls, but there is no guarantee. There are also no guarantees about how long the synchronous processing takes. Time spent on the coordinator's message handling thread delays the handling of other messages, including user queries, which can degrade the responsiveness of the system.
Previously, the compute controller's processing had to be invoked in the coordinator's message loop because it required access to a `&mut StorageController`. With the introduction of `StorageCollections`, most of the compute controller no longer depends on external mutable state, so it is possible to decouple it from the message loop.
The proposal is to spawn a separate task for each `Instance` managing a compute cluster. The top-level `ComputeController` dispatches commands to the different instances through command queues, and each instance continually reads from its queue and executes the provided commands. This will decouple most of the compute controller's processing from the coordinator.
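The dispatch structure described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: it uses OS threads and `std::sync::mpsc` channels as stand-ins for async tasks and their queues so that it has no external dependencies, and the `Command` variants, `InstanceHandle`, and all method names are hypothetical. Only the names `ComputeController` and `Instance` come from the text; their shape here is assumed.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical commands; the real command set is not specified here.
enum Command {
    CreateDataflow(String),
    DropDataflow(String),
    /// Ask the instance to report its dataflows on the given reply channel.
    ListDataflows(Sender<Vec<String>>),
}

/// State owned exclusively by one instance's worker (assumed shape).
struct Instance {
    dataflows: Vec<String>,
}

impl Instance {
    /// Continually read commands from the queue and execute them.
    /// `recv` fails once every sender is dropped, which ends the loop.
    fn run(mut self, rx: Receiver<Command>) {
        while let Ok(cmd) = rx.recv() {
            match cmd {
                Command::CreateDataflow(name) => self.dataflows.push(name),
                Command::DropDataflow(name) => self.dataflows.retain(|d| *d != name),
                Command::ListDataflows(reply) => {
                    // The requester may have gone away; ignore that case.
                    let _ = reply.send(self.dataflows.clone());
                }
            }
        }
    }
}

/// What the top-level controller keeps per instance (hypothetical).
struct InstanceHandle {
    tx: Sender<Command>,
    worker: JoinHandle<()>,
}

struct ComputeController {
    instances: BTreeMap<u64, InstanceHandle>,
}

impl ComputeController {
    fn new() -> Self {
        ComputeController { instances: BTreeMap::new() }
    }

    /// Spawn a worker for a new instance and remember its command queue.
    fn create_instance(&mut self, id: u64) {
        let (tx, rx) = channel();
        let worker = thread::spawn(move || Instance { dataflows: Vec::new() }.run(rx));
        self.instances.insert(id, InstanceHandle { tx, worker });
    }

    /// Enqueue a command without waiting for it to be executed.
    /// Returns false if the instance is unknown or its worker has exited.
    fn send(&self, id: u64, cmd: Command) -> bool {
        match self.instances.get(&id) {
            Some(handle) => handle.tx.send(cmd).is_ok(),
            None => false,
        }
    }

    /// Round trip through the queue: enqueue a request, then wait for the reply.
    fn list_dataflows(&self, id: u64) -> Option<Vec<String>> {
        let (reply_tx, reply_rx) = channel();
        if !self.send(id, Command::ListDataflows(reply_tx)) {
            return None;
        }
        reply_rx.recv().ok()
    }

    /// Close the instance's queue and wait for its worker to finish.
    fn drop_instance(&mut self, id: u64) {
        if let Some(handle) = self.instances.remove(&id) {
            drop(handle.tx);
            let _ = handle.worker.join();
        }
    }
}
```

In this sketch, `send` returns as soon as the command is enqueued, so the caller (the coordinator, in the terms of this document) pays only for the enqueue, not for the work itself. Commands sent to one instance are executed in the order they were sent, while separate instances make progress independently of each other.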
Blockers
This is blocked by https://github.com/MaterializeInc/materialize/issues/24266, the migration of the compute controller to `ReadHold`s. Once the compute controller runs concurrently in the background, the coordinator can no longer rely on read frontiers not advancing during its own processing, so we will need to find and resolve all places where it does so (if there still are any). Having the compute controller communicate its requirements in the form of `ReadHold` capabilities will make this much easier.