I was wondering whether this would be handled implicitly by having shoot-specific configs for DWD, since a missing config for a shoot would then exclude that shoot from DWD probing. This is an enhancement to the usage story for https://github.com/gardener/dependency-watchdog/issues/79
There are 2 ways to do this:
1. A default configuration in the `garden` namespace, which is the case today. In the individual shoot control namespaces, consumers/operators can overwrite one or more configuration parameters by explicitly creating a `configuration-override`. The resultant configuration used for that shoot is a merge of `default` with `configuration-override`, which avoids repeating the same centrally created configuration again and again. The benefit is that if a few defaults change, the change is applied uniformly to all shoot control namespaces (while the overridden values are kept), without individually changing the ConfigMap in every shoot control namespace.

A few points to consider/ponder:
1. Shoots may use a different `node-monitor-grace-period` duration, which would also require different probe timeouts.
2. Should we use a `len(workers) == 0` predicate to decide whether a probe should be created for a shoot, or base it on the absence of a prober configuration in the shoot control namespace?

Therefore it is not final/clear whether #79 and this issue will have the same solution.
Considering the "points to ponder", my two cents are:
1. With `node-monitor-grace-period` back to 40s, probably nobody will lower it further. Even if someone does, DWD is probably no longer useful then, because undercutting an even shorter `node-monitor-grace-period` is too dangerous: it might scale down the components too aggressively/often. So, I think, we do not have to care about clusters with an even lower `node-monitor-grace-period`, because we cannot make DWD react even more aggressively without risking detrimental effects. Ergo, there is no need to do anything here anymore, and we can stick with the central configuration.
2. `len(workers) == 0` is the mechanism that also suppresses the deployment of all these other components like MCM, CA, KSCH, etc., so why shouldn't it be the same trigger/condition that stops DWD from acting? I would think it should, which would make this task much simpler.

But the above aside, there is no real need to suppress DWD at all, as long as it doesn't try to scale up MCM/CA in a nodeless cluster or fail because their deployments are missing. Why shouldn't it be watching also these control planes or what's the harm for the rest of the functionality (KCM, ETCD<-KAPI)? It just shouldn't fail, but whatever it can do, it can continue to do also for these clusters, no?
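The "it just shouldn't fail" alternative amounts to treating a missing deployment as "nothing to scale" rather than as an error. The following is a minimal sketch of that behaviour, assuming a hypothetical `Scaler` interface and a sentinel `errNotFound` standing in for the Kubernetes API's NotFound error; the namespace name is made up, and DWD's real scaler is not structured this way necessarily.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for the Kubernetes API "not found"
// error a real client would return for a missing deployment.
var errNotFound = errors.New("not found")

// Scaler is a hypothetical abstraction over scaling one deployment.
type Scaler interface {
	Scale(namespace, name string, replicas int32) error
}

// scaleAll scales every named deployment in the namespace. Deployments that do
// not exist (e.g. MCM and CA in a nodeless cluster) are skipped and reported
// instead of failing; any other error aborts and is returned.
func scaleAll(s Scaler, namespace string, names []string, replicas int32) (skipped []string, err error) {
	for _, n := range names {
		if e := s.Scale(namespace, n, replicas); e != nil {
			if errors.Is(e, errNotFound) {
				skipped = append(skipped, n)
				continue
			}
			return skipped, fmt.Errorf("scaling %s/%s: %w", namespace, n, e)
		}
	}
	return skipped, nil
}

// fakeScaler keeps replica counts in memory, keyed by "namespace/name".
type fakeScaler struct{ replicas map[string]int32 }

func (f *fakeScaler) Scale(namespace, name string, replicas int32) error {
	key := namespace + "/" + name
	if _, ok := f.replicas[key]; !ok {
		return errNotFound
	}
	f.replicas[key] = replicas
	return nil
}

func main() {
	ns := "shoot--dev--cpaas" // hypothetical control-plane-only namespace
	f := &fakeScaler{replicas: map[string]int32{ns + "/kube-controller-manager": 1}}
	skipped, err := scaleAll(f, ns,
		[]string{"kube-controller-manager", "machine-controller-manager", "cluster-autoscaler"}, 0)
	fmt.Println(skipped, err, f.replicas[ns+"/kube-controller-manager"])
	// [machine-controller-manager cluster-autoscaler] <nil> 0
}
```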
> Why shouldn't it be watching also these control planes or what's the harm for the rest of the functionality (KCM, ETCD<-KAPI)? It just shouldn't fail, but whatever it can do, it can continue to do also for these clusters, no?
Bringing down KCM when KAPI is unavailable is not really required, as there are no nodes and therefore no meltdown to prevent. So if one day we see a LOT of such control planes in a seed, we will unnecessarily create long-running goroutines (one per shoot namespace) that do nothing meaningful or helpful for the end user.
How to categorize this issue?
/area control-plane /kind enhancement /priority 3
What would you like to be added:
Gardener issue #7635 introduces a control-plane-as-a-service concept, where the number of workers will be 0 and the number of control plane components will also be reduced. MCM and CA will not be deployed, and there will also be no need to scale KCM up/down, as there is no workload to schedule (there are no nodes).
This enhancement optimises DWD by preventing the creation of probes for a Cluster whose number of workers is 0.
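The predicate itself is small. The following is a minimal sketch, assuming simplified, hypothetical stand-ins for the shoot spec carried in a Gardener Cluster resource (the real types live in Gardener's API packages and have many more fields); the function name `shouldCreateProber` and the namespaces are made up for illustration.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the shoot spec embedded in a Gardener
// Cluster resource; only the worker list matters for this predicate.
type Worker struct{ Name string }

type Provider struct{ Workers []Worker }

type ShootSpec struct{ Provider Provider }

type Cluster struct {
	Namespace string
	Shoot     ShootSpec
}

// shouldCreateProber reports whether a prober should be started for the
// cluster: a shoot without worker pools (control plane as a service) has no
// nodes, so there is nothing for the prober to protect.
func shouldCreateProber(c Cluster) bool {
	return len(c.Shoot.Provider.Workers) > 0
}

func main() {
	clusters := []Cluster{
		{Namespace: "shoot--dev--regular", Shoot: ShootSpec{Provider: Provider{Workers: []Worker{{Name: "pool-a"}}}}},
		{Namespace: "shoot--dev--cpaas"}, // no workers
	}
	for _, c := range clusters {
		fmt.Println(c.Namespace, shouldCreateProber(c))
	}
	// shoot--dev--regular true
	// shoot--dev--cpaas false
}
```

Keying on the worker count mirrors the condition that already suppresses MCM and CA, so no per-shoot prober configuration is needed to opt a control-plane-only shoot out.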
Why is this needed:
Currently DWD has a single configuration of dependent services for the prober, which applies to all shoot control namespaces in the seed. This configuration is contained in a ConfigMap deployed in the `garden` namespace of the seed. For CPAAS (control plane as a service), no deployments will be created for MCM and CA, and there is no need to scale down KCM. Therefore, the prober has nothing to do in these CPAAS namespaces.