What does this PR do?
Introduce the ability to pair a WPA object with a DatadogMonitor. With this first version, if a user enables the option on a per-WPA basis using an annotation (see below), they need to provide a DatadogMonitor object with the same name as their WPA, in the same namespace. If the DatadogMonitor is not in an OK state, we will not reconcile the given WPA.
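For illustration, a paired DatadogMonitor could look like the sketch below. The query and message are placeholders, not a recommendation; only the name/namespace pairing is mandated by this PR:

```yaml
# Sketch of a DatadogMonitor paired with a WPA named "example-wpa" in "default".
# The monitor must share the WPA's name and namespace.
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: example-wpa        # same name as the WPA
  namespace: default       # same namespace as the WPA
spec:
  name: "Gate scaling for example-wpa"
  type: "metric alert"
  # Placeholder query: block scaling while the target deployment has unavailable pods.
  query: "max(last_5m):avg:kubernetes_state.deployment.replicas_unavailable{deployment:example} > 0"
  message: "Scaling paused for example-wpa"
```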
Use cases:
- The workload is undergoing a rolling update
- The metrics provider is in a degraded mode
- Workload dependencies are undergoing maintenance or are degraded
- Cloud provider issues (the underlying infrastructure is degraded and scaling should be paused)
For now it's fairly basic, but depending on adoption, we could introduce:
- Patching the WPA to be in dry-run mode instead of not reconciling at all
- A list of monitors (or a composite monitor, not yet supported in the DatadogMonitor controller)
- Priority (maybe some of the activity can be reconciled, such as querying the external Metrics Server, but not applying the scaling decision)
- A common DatadogMonitor (in some use cases, all WPAs should be paused)
Motivation
Improving the feature set of the WPA controller.
Additional Notes
This depends on the DatadogMonitor CRD being installed. I updated the associated Helm chart proactively, but we will release 0.7.0 and merge accordingly.
If you do not create a DatadogMonitor with the same name as the WPA (here: example-watermarkpodautoscaler-monitor) in the same namespace, the WPA will wait 2 minutes before reconciling again (time to create a monitor plus time to get a relevant status).
If it is created but its state is not OK, we wait 1 minute before reconciling (monitors have a 1-minute granularity, so it is useless to fetch a status before that).
The conditions are persisted in the WPA, e.g.:

```
Last Transition Time:  2024-01-24T20:46:46Z
Message:               monitor default/example-watermarkpodautoscaler-monitor is in a OK state, allowing the WPA from proceeding
Reason:                DatadogMonitorOK
Status:                False
Type:                  ScalingBlocked
```
We also log the state of the DatadogMonitor:

```
{"level":"info","ts":1706129326438.4067,"logger":"controllers.WatermarkPodAutoscaler","msg":"Lifecycle Control enabled, checking the state of the Datadog Monitor","watermarkpodautoscaler":"default/example-watermarkpodautoscaler-monitor","wpa_name":"example-watermarkpodautoscaler-monitor","wpa_namespace":"default","datadogMonitor":"default/example-watermarkpodautoscaler-monitor"}
{"level":"info","ts":1706129326444.196,"logger":"controllers.WatermarkPodAutoscaler","msg":"Target deploy","watermarkpodautoscaler":"default/example-watermarkpodautoscaler-monitor","wpa_name":"example-watermarkpodautoscaler-monitor","wpa_namespace":"default","replicas":1}
```
Describe your test plan (when 0.7 RC is released)
With the helm chart:
^ this will deploy the DatadogMonitor CRD. If the CRD is already deployed, use `datadogCRDs.crds.datadogMonitors=false`.
Make sure you have the Operator running, since it reconciles the DatadogMonitor object. If you are deploying the Agent and the Cluster Agent with the Operator, use these options:
If you are deploying with the helm chart:
Create your WPA with the following annotation:
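The annotation gate might look like the fragment below. The exact annotation key is an assumption for illustration; check the release notes for the final name:

```yaml
apiVersion: datadoghq.com/v1alpha1
kind: WatermarkPodAutoscaler
metadata:
  name: example-watermarkpodautoscaler-monitor
  namespace: default
  annotations:
    # Assumed annotation key enabling the DatadogMonitor pairing; verify against the release.
    wpa.datadoghq.com/lifecycle-control.enabled: "true"
```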