caas-team / py-kube-downscaler

Scale down / "pause" Kubernetes workload (Deployments, StatefulSets, and/or HorizontalPodAutoscalers and CronJobs too !) during non-work hours.
GNU General Public License v3.0

Proper failsafe if EXCLUDED_NAMESPACES could **NOT** be properly parsed. #33

Open larssb opened 6 months ago

larssb commented 6 months ago

Issue

I wish kube-downscaler had a safe fallback for the case where it cannot properly parse a declared EXCLUDED_NAMESPACES env. var. #25 fixes some issues by making sure the excluded-namespaces array passed via Helm values is parsed properly. However, there could still be cases where parsing fails, values in EXCLUDED_NAMESPACES are off, and so forth.

In these cases it would certainly be beneficial if kube-downscaler didn't just scale down everything on the cluster it runs on. That is what happens right now if EXCLUDED_NAMESPACES can't be parsed and one has, e.g., configured/declared the DEFAULT_UPTIME env. var.

Problem to solve / Proposal

Add logic to kube-downscaler so that it rejects an EXCLUDED_NAMESPACES env. var. it can't parse and then falls back to dry-running or to writing errors to stderr, so that a log collector can pick such log lines up and one can alert on them further down the line.
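To make the proposal concrete, here is a minimal sketch of what such a fail-safe could look like. It is not taken from the kube-downscaler code base: the function name `parse_excluded_namespaces`, the logger name, and the assumption that the env. var. holds comma-separated regular expressions are all hypothetical and only serve as illustration.

```python
import logging
import re
import sys

# Hypothetical logger name; messages are sent to stderr so a log collector can pick them up.
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
logger = logging.getLogger("downscaler-sketch")


def parse_excluded_namespaces(raw):
    """Return (compiled_patterns, dry_run).

    Assumes `raw` is a comma-separated list of regular expressions.
    If any entry fails to compile, nothing is trusted: an error is logged
    and dry_run is set to True so that no workload gets scaled down.
    """
    patterns = []
    for item in raw.split(","):
        if not item.strip():
            # Skip empty entries such as the one produced by a trailing comma.
            continue
        try:
            patterns.append(re.compile(item))
        except re.error as exc:
            logger.error(
                "Cannot parse EXCLUDED_NAMESPACES entry %r (%s); "
                "falling back to dry-run mode",
                item,
                exc,
            )
            return [], True
    return patterns, False
```

A caller would read the value once at startup, e.g. `patterns, dry_run = parse_excluded_namespaces(os.environ.get("EXCLUDED_NAMESPACES", ""))`, and force dry-run mode whenever `dry_run` comes back `True`. Rejecting the whole list rather than only the broken entry is a deliberate choice in this sketch: a partially applied exclusion list could still scale down namespaces the user meant to protect.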

samuel-esp commented 3 months ago

I will work on this issue. My proposal is:

  1. We will check the list of regexes provided by the user in EXCLUDED_NAMESPACES at startup.
  2. If we detect a problematic entry, we will print a warning log to help the user debug it.

We could block execution entirely if we detect something like issue #26, but I would rather just inform the user about this potential misconfiguration.

Since EXCLUDED_NAMESPACES is a list of regexes, it would be difficult to detect misconfigurations other than the one described in issue #26. That one comes down to the fact that namespaces in k8s can't start with a space, so it could be a fairly common misconfiguration among other users as well. We will certainly add more cases in the future as other users report further potential misconfigurations.
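As a rough illustration of the startup check described above, the sketch below only warns about entries with leading or trailing whitespace and otherwise leaves the configuration untouched. The function name `find_suspicious_patterns`, the logger name, and the comma-separated format are assumptions made for this example, not the project's actual implementation.

```python
import logging

# Hypothetical logger name used only for this sketch.
logger = logging.getLogger("downscaler-sketch")


def find_suspicious_patterns(raw):
    """Return the EXCLUDED_NAMESPACES entries that look misconfigured.

    Assumes `raw` is a comma-separated list of regular expressions.
    An entry is flagged when it carries leading or trailing whitespace,
    since a Kubernetes namespace name cannot start with a space.
    """
    suspicious = []
    for item in raw.split(","):
        if item != item.strip():
            suspicious.append(item)
            logger.warning(
                "EXCLUDED_NAMESPACES entry %r has leading or trailing "
                "whitespace and may never match a namespace",
                item,
            )
    return suspicious
```

For example, `find_suspicious_patterns("kube-system, default")` returns `[" default"]` and logs one warning, while `find_suspicious_patterns("kube-system,default")` returns an empty list. Further checks could be added to the loop as other misconfigurations are reported.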