MaterializeInc / materialize

The data warehouse for operational workloads.
https://materialize.com

[Epic] Autoscaling clusters #22568

Open benesch opened 8 months ago

benesch commented 8 months ago

Product outcome

Clusters scale automatically in response to changes in workload.

Background

Today, users must manually choose the initial scale for their cluster (via CREATE CLUSTER ... SIZE) and size it up and down in response to load (via ALTER CLUSTER ... SET (SIZE ...)).
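For concreteness, the manual flow today looks something like this (the cluster name is illustrative; the syntax is the documented managed-cluster form):

CREATE CLUSTER prod (SIZE = 'medium', REPLICATION FACTOR = 1);
-- Later, in response to growing load, resize by hand:
ALTER CLUSTER prod SET (SIZE = 'large');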

Discovery

The bulk of the discovery work is still TODO. There is some discussion on the syntax and semantics in https://github.com/MaterializeInc/materialize/issues/13870.

The rough design is something like this. When creating a cluster, rather than specifying a fixed SIZE, you can specify instead a minimum size and a maximum size:

CREATE CLUSTER foo REPLICATION FACTOR 3, MIN SIZE 'xsmall', MAX SIZE 'xlarge'

Materialize will then automatically scale the cluster's size within those bounds based on load. The exact definition of "load" is TBD.

Work items

TODO.

See also

Action log

pH14 commented 7 months ago

Misc thoughts to add: it seems like there are two flavors of autoscaling upward, preemptive and reactive.

For preemptive, we might see that memory usage is trending up due to natural workload growth, and at some threshold, say 85%, we autoscale by mixing in a sized-up replica and spinning down the old one once the larger one is ready.
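As a rough illustration of the preemptive signal, here is what a threshold check might look like, assuming the introspection relations mz_internal.mz_cluster_replica_utilization (per-process memory_percent) and mz_catalog.mz_cluster_replicas:

-- Illustrative only: find replica processes above an 85% memory threshold.
SELECT r.cluster_id, u.replica_id, u.process_id, u.memory_percent
FROM mz_internal.mz_cluster_replica_utilization u
JOIN mz_cluster_replicas r ON r.id = u.replica_id
WHERE u.memory_percent > 85;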

For reactive, we might hit an OOM in a replica (potentially a very fast spike within one metrics-reporting interval, so it's not easily visible to us or to customers). In this case, the most reliable indication would be the last exit code for the process (137 for an OOM kill), and we could use that signal to size the replica up before restarting it.
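As an aside, a SQL-visible approximation of that signal may already exist, assuming mz_internal.mz_cluster_replica_statuses reports a reason such as 'oom-killed' for not-ready processes:

-- Illustrative: detect replica processes that were recently OOM-killed.
SELECT replica_id, process_id, updated_at
FROM mz_internal.mz_cluster_replica_statuses
WHERE status = 'not-ready' AND reason = 'oom-killed';

If the exit code turns out to be the right signal, a relation like this could be where it surfaces.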

IMO both of these are interesting and worthwhile, and probably share a lot of the same logic, but have different input signals that require different handling. The reactive side may need some new plumbing, potentially through the Orchestrator trait, to expose exit code status if that turns out to be the right signal to watch.

edude03 commented 5 months ago

FWIW, the Kubernetes Vertical Pod Autoscaler would implement the reactive flavor out of the box if we let it. I'm not sure it would integrate well into our product, however.

jubrad commented 2 days ago

I'm thinking about autoscaling in a post-graceful-reconfiguration world. It seems that once the WAIT UNTIL CAUGHT UP feature lands, two things become possible.

  1. We can use the CREATE CLUSTER foo REPLICATION FACTOR 3, MIN SIZE 'xsmall', MAX SIZE 'xlarge' syntax to create a schedule that periodically inspects the cluster and gracefully resizes it based on introspection data.
  2. A customer could write their own app that subscribes to custom views over mz_introspection data, computes a recommended size, and issues DDL to gracefully update the cluster whenever the recommendation changes. It would probably want some anti-flapping logic; see the sketch below.

The latter is probably possible today with unmanaged clusters.
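A minimal sketch of option (2), with hypothetical object names, assuming the utilization relation above and taking the proposed WAIT UNTIL CAUGHT UP option at face value (its final syntax is TBD):

-- Hypothetical view the app would watch.
CREATE VIEW recommended_size AS
SELECT CASE WHEN max(u.memory_percent) > 85 THEN 'large' ELSE 'medium' END AS size
FROM mz_internal.mz_cluster_replica_utilization u
JOIN mz_cluster_replicas r ON r.id = u.replica_id
WHERE r.cluster_id = 'u1';  -- hypothetical cluster ID

-- The app reacts to changes in the recommendation...
SUBSCRIBE (SELECT size FROM recommended_size);

-- ...and issues graceful DDL when it changes (proposed syntax):
ALTER CLUSTER foo SET (SIZE = 'large') WITH (WAIT UNTIL CAUGHT UP);

Anti-flapping (e.g. a cooldown, or hysteresis between the scale-up and scale-down thresholds) would live in the app itself.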

benesch commented 2 days ago

Agreed on all counts! (1) is exactly how I envision building autoscaling.