Open benesch opened 2 years ago
Trying to clarify: would this capability enable a “self destruct” like capability where someone could create a temporary cluster or temporary source? This kind of functionality could have a big impact on go-to-market strategy.
would this capability enable a “self destruct” like capability where someone could create a temporary cluster or temporary source?
Yep, it totally could!
Posting some very loose syntax proposals from a recent Slack conversation on this topic:
-- Create a cluster with automatically managed replicas.
CREATE CLUSTER foo REPLICATION FACTOR 2, SIZE 'medium';
-- Size up the cluster. This automatically spins up a new replica
-- at the new size, waits for it to catch up,
-- and then spins down the old replica.
ALTER CLUSTER foo SIZE 'large';
-- Add a new replica automatically.
ALTER CLUSTER foo REPLICATION FACTOR 3;
-- Turn off the cluster for the night.
ALTER CLUSTER foo REPLICATION FACTOR 0;
-- One day...
CREATE CLUSTER blah;
-- ...will create a cluster that automatically scales up and down in
-- response to workload.
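The ordering described in the comments above (bring up the new replica, wait for it to catch up, then retire the old one) can be sketched as a small planner. This is only an illustration in Python, not Materialize's implementation: `Replica`, `plan_alter`, the step names, and the `r1`, `r2`, ... naming scheme are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str
    size: str


def plan_alter(replicas, size, replication_factor):
    """Return ordered steps that bring a cluster to the declared state.

    Hypothetical sketch: replacements are created and caught up
    before any existing replica is dropped.
    """
    # Replicas already at the desired size can stay, up to the desired count.
    keep = [r for r in replicas if r.size == size][:replication_factor]
    stale = [r for r in replicas if r not in keep]
    missing = replication_factor - len(keep)

    # Pick fresh names that do not collide with existing replicas.
    taken = {r.name for r in replicas}
    new_names = []
    i = 1
    while len(new_names) < missing:
        name = f"r{i}"
        if name not in taken:
            new_names.append(name)
        i += 1

    steps = [("create", name, size) for name in new_names]
    steps += [("wait_caught_up", name) for name in new_names]
    # Old replicas go away only after every replacement is serving.
    steps += [("drop", r.name) for r in stale]
    return steps
```

For the `ALTER CLUSTER foo SIZE 'large'` case with two medium replicas, this plan creates two large replicas, waits for both, and only then drops the medium ones. With a replication factor of 0, the plan contains only drops.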
We have two more specific issues that would require dynamic cluster scheduling:
Background
Today, each cluster in Materialize corresponds to a StatefulSet in Kubernetes with largely static constraints, like "place this service in this AZ" or "use this CPU and memory limit." This works well for customers who want fine-grained control over their infrastructure. It works less well for customers who don't want that control, and want Materialize to just do the right thing by default.
Proposal
We should create a dynamic cluster scheduler that applies flexible policies. E.g.:
environmentd to minimize intra-AZ bandwidth costs. This may require moving replicas when environmentd fails over to a new AZ. (MaterializeInc/cloud#3593)

The scheduler needs to effect all these changes without causing downtime. E.g., when moving a replica between AZs, it should spin up a new replica in the new AZ before terminating the old one.
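A rough sketch of that make-before-break move, assuming an orchestrator that can create a replica, report when it has caught up, and drop one. Everything here is hypothetical and for illustration only: `FakeCluster` is an in-memory stand-in, and `follow_environmentd`, the replica naming, and the AZ strings are not part of any real Materialize API.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    az: str
    caught_up: bool = True


class FakeCluster:
    """In-memory stand-in for a real orchestrator (hypothetical)."""

    def __init__(self, replicas):
        self.replicas = {r.name: r for r in replicas}
        self.log = []

    def create(self, name, az):
        self.replicas[name] = Replica(name, az, caught_up=False)
        self.log.append(("create", name, az))

    def wait_caught_up(self, name):
        # A real scheduler would poll the replica; the fake just flips a flag.
        self.replicas[name].caught_up = True
        self.log.append(("caught_up", name))

    def drop(self, name):
        del self.replicas[name]
        self.log.append(("drop", name))


def follow_environmentd(cluster, target_az):
    """Move every replica outside target_az into it, one at a time."""
    for old in list(cluster.replicas.values()):
        if old.az == target_az:
            continue  # already in the right AZ
        new_name = f"{old.name}-{target_az}"
        cluster.create(new_name, target_az)
        cluster.wait_caught_up(new_name)
        # The old replica is dropped only once its replacement is serving.
        cluster.drop(old.name)
```

Moving one replica at a time keeps the number of serving replicas from ever dipping below its starting value, at the cost of briefly running one extra replica per move.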
Outstanding work
cc @jseldess