Expose dataflow hydration time

teskje commented 2 months ago

Feature request

This is a request for Materialize to expose the time of the last dataflow hydration to the user. Doing so is desirable for several reasons:

- Allowing users to predict the expected time of unavailability during restarts.
- Allowing users to find dataflows with slow hydration and optimize them.
- Allowing users to determine an appropriate value for the REHYDRATION TIME ESTIMATE cluster scheduling option.

We already expose the hydration status in mz_internal.mz_compute_hydration_statuses, so this would be a good place to also add the hydration time.

This likely requires changes to how we detect dataflow hydration. Currently the controller detects hydration based on frontiers, but the controller is blind to compute reconciliation. So when reconciliation succeeds and the replica reuses its existing dataflows, the controller would report near-instant hydration, which would be misleading. One way to solve this is to move hydration detection to the replica and have it communicate both the status boolean and the hydration time to the controller.
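To make the replica-side idea concrete, here is a minimal sketch of what such tracking could look like. Everything in it is hypothetical: `HydrationTracker`, `HydrationStatus`, and the method names are not existing Materialize types, timestamps are simplified to `u64`, and it assumes a dataflow counts as hydrated once its output frontier advances beyond its as-of.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical message a replica could send to the controller.
#[derive(Debug, Clone, PartialEq)]
pub struct HydrationStatus {
    pub dataflow_id: String,
    pub hydrated: bool,
    /// Wall-clock time from dataflow creation to hydration, once known.
    pub hydration_time: Option<Duration>,
}

/// Per-dataflow bookkeeping kept on the replica (hypothetical).
struct DataflowState {
    as_of: u64,
    created_at: Instant,
    hydration_time: Option<Duration>,
}

/// Replica-side tracker. Because it lives on the replica, its measurements
/// survive a reconciliation in which existing dataflows are reused.
#[derive(Default)]
pub struct HydrationTracker {
    dataflows: HashMap<String, DataflowState>,
}

impl HydrationTracker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record that the replica built a dataflow. Repeated calls for a known
    /// ID are no-ops, so a reconciled dataflow keeps its original measurement.
    pub fn dataflow_created(&mut self, id: &str, as_of: u64, now: Instant) {
        self.dataflows.entry(id.to_string()).or_insert(DataflowState {
            as_of,
            created_at: now,
            hydration_time: None,
        });
    }

    /// Record an output frontier advancement. Returns a status to report
    /// only at the moment the dataflow first becomes hydrated.
    pub fn frontier_advanced(
        &mut self,
        id: &str,
        frontier: u64,
        now: Instant,
    ) -> Option<HydrationStatus> {
        let state = self.dataflows.get_mut(id)?;
        // Assumption: hydrated means the frontier is beyond the as-of.
        if state.hydration_time.is_some() || frontier <= state.as_of {
            return None;
        }
        state.hydration_time = Some(now.duration_since(state.created_at));
        Some(HydrationStatus {
            dataflow_id: id.to_string(),
            hydrated: true,
            hydration_time: state.hydration_time,
        })
    }

    /// Current status of a dataflow, e.g. to resend after a controller reconnects.
    pub fn status(&self, id: &str) -> Option<HydrationStatus> {
        self.dataflows.get(id).map(|state| HydrationStatus {
            dataflow_id: id.to_string(),
            hydrated: state.hydration_time.is_some(),
            hydration_time: state.hydration_time,
        })
    }
}
```

With this shape, a controller that reconnects after reconciliation could ask for `status` and receive the originally measured hydration time instead of inferring a near-zero one from frontiers.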

teskje commented 2 months ago

Potentially blocked by https://github.com/MaterializeInc/materialize/issues/26730

josharenberg commented 1 week ago

Noting that this is being requested for the Pluralsight monitoring dashboard.

Expose dataflow hydration time #26776
