FairwindsOps / goldilocks

Get your resource requests "Just Right"
https://fairwinds.com
Apache License 2.0

problem excluding cronjob from automatic VPA creation #708

Closed k11h-de closed 4 months ago

k11h-de commented 4 months ago

What happened?

First off: Thank you a lot for your effort and time in making this very useful tool!

I am having issues excluding some workloads from automatic VPA creation. I have added the proper label to my namespace:

$# kubectl describe ns blueprint-service1
Name:         blueprint-service1
Labels:       goldilocks.fairwinds.com/enabled=true
              toolkit.fluxcd.io/tenant=blueprint
              ...
Annotations:  <none>
Status:       Active

The VPA is created properly for the Deployment in the namespace (last in the list), but VPAs are also created for the maintenance CronJobs.

$# kubectl get vpa -n blueprint-service1
NAME                                   MODE   CPU   MEM       PROVIDED   AGE
goldilocks-scale-to-one-service1       Off                    False      80m
goldilocks-scale-to-zero-service1      Off                    False      80m
goldilocks-service1-corn-otc-generic   Off    25m   262144k   True       29h

According to the docs, I tried adding the annotation goldilocks.fairwinds.com/exclude-containers to the CronJob (and the embedded Job). This is what my CronJob manifest looks like:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-to-one-service1
  namespace: blueprint-service1
  labels:
    app.kubernetes.io/managed-by: Helm
    goldilocks.fairwinds.com/exclude-containers: scale-cron-one-service1  # <---
    helm.toolkit.fluxcd.io/name: service1
    helm.toolkit.fluxcd.io/namespace: blueprint-service1
  annotations:
    meta.helm.sh/release-name: service1
    meta.helm.sh/release-namespace: blueprint-service1
spec:
  schedule: 0 6 * * 1-5
  timeZone: Europe/Berlin
  startingDeadlineSeconds: 200
  concurrencyPolicy: Replace
  suspend: false
  jobTemplate:
    metadata:
      labels:
        goldilocks.fairwinds.com/exclude-containers: scale-cron-one-service1  # <---
    spec:
      template:
        metadata:
          labels:
            app.kubernetes.io/instance: cronjob
            goldilocks.fairwinds.com/exclude-containers: scale-cron-one-service1  # <---
        spec:
          containers:
            - name: scale-cron-one-service1
              image: bitnami/kubectl
              command:
                - /bin/sh
                - '-c'
                - >-
                  sleep $(shuf -i 1-10 -n 1) && kubectl scale deployment --replicas 1 service1-corn-otc-generic
              resources:
                requests:
                  cpu: 50m
                  memory: 50Mi
              imagePullPolicy: IfNotPresent
          restartPolicy: OnFailure
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0

Any ideas how I can exclude them?

Thanks a lot


K8s v1.27.3, Helm v3.14.3. Installed with Helm chart version 8.0.1 (app version 4.10.0).

What did you expect to happen?

The CronJob is excluded with the annotation goldilocks.fairwinds.com/exclude-containers.

How can we reproduce this?

Apply the manifest above.

Version

4.10.0

Additional context

No response

sudermanjr commented 4 months ago

Try annotating the cronjobs with goldilocks.fairwinds.com/enabled=false. That should disable VPA creation for the entire cronjob. exclude-containers is intended for excluding things like sidecars.
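
The manifest earlier in this thread sets the Goldilocks key under `labels`. For comparison, here is a minimal sketch of what "annotating" the CronJob would look like if the key is read from `metadata.annotations`. Whether Goldilocks honors this key as an annotation, a label, or both on a workload is an assumption; this thread does not confirm it. The resource name and namespace are taken from the manifest above.

```yaml
# Sketch only: CronJob metadata with the key set as an annotation, not a label.
# Assumption: Goldilocks reads this key from annotations on the workload.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-to-one-service1
  namespace: blueprint-service1
  annotations:
    # Annotation values must be strings, so the value is quoted.
    goldilocks.fairwinds.com/enabled: "false"
```

Only the `metadata` portion is shown; the `spec` would stay as in the original manifest.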

k11h-de commented 4 months ago

Hi @sudermanjr

thanks for your prompt reply!

I tried your suggested annotation with the following manifest:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-to-one-service1
  namespace: blueprint-service1
  labels:
    app.kubernetes.io/managed-by: Helm
    goldilocks.fairwinds.com/enabled: 'false'  # <---
    helm.toolkit.fluxcd.io/name: service1
    helm.toolkit.fluxcd.io/namespace: blueprint-service1
  annotations:
    meta.helm.sh/release-name: service1
    meta.helm.sh/release-namespace: blueprint-service1
spec:
  schedule: 0 6 * * 1-5
  timeZone: Europe/Berlin
  startingDeadlineSeconds: 200
  concurrencyPolicy: Replace
  suspend: false
  jobTemplate:
    metadata:
      labels:
        goldilocks.fairwinds.com/enabled: 'false'  # <---
    spec:
      template:
        metadata:
          labels:
            app.kubernetes.io/instance: cronjob
            goldilocks.fairwinds.com/enabled: 'false'  # <---
        spec:
          containers:
            - name: scale-cron-one-service1
              image: bitnami/kubectl
              command:
                - /bin/sh
                - '-c'
                - >-
                  sleep $(shuf -i 1-10 -n 1) && kubectl scale deployment --replicas 1 service1-corn-otc-generic
              resources:
                requests:
                  cpu: 50m
                  memory: 50Mi
              imagePullPolicy: IfNotPresent
          restartPolicy: OnFailure
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0

Just to be sure, I added the annotation to the CronJob, the resulting Job, and even the Pod itself. Unfortunately, the VPAs are still created (even when I delete the existing ones first).

This is what the newly created VPA looks like:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  labels:
    creator: Fairwinds
    source: goldilocks
  name: goldilocks-scale-to-one-service1
  namespace: blueprint-service1
status:
  conditions:
    - lastTransitionTime: '2024-06-12T17:50:25Z'
      message: No pods match this VPA object
      reason: NoPodsMatched
      status: 'True'
      type: NoPodsMatched
    - lastTransitionTime: '2024-06-12T17:49:13Z'
      status: 'True'
      type: RecommendationProvided
  recommendation:
    containerRecommendations:
      - containerName: scale-cron-one-service1
        lowerBound:
          cpu: 25m
          memory: 262144k
        target:
          cpu: 25m
          memory: 262144k
        uncappedTarget:
          cpu: 25m
          memory: 262144k
        upperBound:
          cpu: 25m
          memory: 262144k
spec:
  targetRef:
    apiVersion: batch/v1
    kind: CronJob
    name: scale-to-one-service1
  updatePolicy:
    updateMode: 'Off'

If you have any other ideas, we would be very grateful.

k11h-de commented 4 months ago

A possible solution may be in https://github.com/FairwindsOps/goldilocks/pull/710

k11h-de commented 4 months ago

https://github.com/FairwindsOps/goldilocks/pull/710 was merged. Closing this one. Thanks a lot!