databrickslabs / overwatch

Capture deep metrics on one or all assets within a Databricks workspace

Intelligent Scaling - Cluster Resize Call - Wrap inside Future #250

Open GeekSheikh opened 3 years ago

GeekSheikh commented 3 years ago

When Overwatch modules run faster than the cluster can scale up, Overwatch's scale-up requests can stack up and push the cluster into an unstable state (usually when init scripts are long-running).

Potential solution: wrap the cluster resize API call in a Future and validate the cluster's status before requesting a new size.
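A rough sketch of what that could look like, not the actual Overwatch implementation. `ClusterClient`, `GuardedResizer`, `pollInterval`, and `maxPolls` are hypothetical names made up for illustration; the state strings follow the Databricks cluster states (`RUNNING`, `RESIZING`, etc.). The idea is that a new resize is only issued when the previous one has finished and the cluster reports `RUNNING`; otherwise the request is skipped rather than stacked.

```scala
import scala.annotation.tailrec
import scala.concurrent.{ExecutionContext, Future, blocking}
import scala.concurrent.duration._

// Hypothetical stand-in for the cluster API client (assumption, not Overwatch's interface).
trait ClusterClient {
  def state(): String               // e.g. "RUNNING", "RESIZING", "PENDING"
  def resize(numWorkers: Int): Unit // issues the resize request
}

final class GuardedResizer(client: ClusterClient,
                           pollInterval: FiniteDuration,
                           maxPolls: Int)
                          (implicit ec: ExecutionContext) {

  // Most recent resize; starts out as an already-completed future.
  private var inFlight: Future[Boolean] = Future.successful(true)

  // Future(true): resize issued and the cluster settled back to RUNNING.
  // Future(false): request skipped (resize in flight or cluster not RUNNING),
  // or the cluster did not settle within maxPolls polls.
  def requestResize(target: Int): Future[Boolean] = synchronized {
    if (!inFlight.isCompleted || client.state() != "RUNNING") {
      Future.successful(false)
    } else {
      val f = Future {
        client.resize(target)
        blocking(awaitRunning(maxPolls))
      }
      inFlight = f
      f
    }
  }

  // Poll the cluster state until it is RUNNING or the poll budget is used up.
  @tailrec
  private def awaitRunning(remaining: Int): Boolean =
    if (client.state() == "RUNNING") true
    else if (remaining <= 0) false
    else {
      Thread.sleep(pollInterval.toMillis)
      awaitRunning(remaining - 1)
    }
}
```

With something like this, a module that asks for a bigger cluster while an earlier resize is still settling gets `false` back and can retry later, instead of sending a second resize on top of the first. A real client may report `RUNNING` for a moment right after the resize call before switching to `RESIZING`, so the polling would probably need a short initial delay or a check on the worker count as well.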

Complete with #372

GeekSheikh commented 2 years ago

Linking to #372.

As part of re-validating intelligent scaling, we will look at putting the resize call inside a controlled future for stability.