carvel-dev / kapp-controller

Continuous delivery and package management for Kubernetes.
https://carvel.dev/kapp-controller
Apache License 2.0

Kapp-controller did not find change in cronjob initcontainer #1637

Open qeqar opened 4 weeks ago

qeqar commented 4 weeks ago

What steps did you take: We install some CronJobs in remote clusters via kapp-controller.

Our Setup: oci-bundle -> appCR -> kapp-controller -> remote cluster

What happened: We updated spec.jobTemplate.spec.template.spec.initContainers[0] by adding .securityContext.runAsUser, but the controller did not find the change and reported no diffs.

For testing purposes, we made the same change to spec.jobTemplate.spec.template.spec.containers[0], which worked fine.

What did you expect: The CronJob gets updated.


Environment: Kubernetes 1.31.1, kapp-controller v0.53.1

kbld.k14s.io/images: |
  - origins:
    - local:
        path: /home/runner/work/kapp-controller/kapp-controller
    - git:
        dirty: true
        remoteURL: https://github.com/carvel-dev/kapp-controller
        sha: 00aa728d6823620c03e3f4917cd565119b17c7d2
        tags:
        - v0.53.1
    url: ghcr.io/carvel-dev/kapp-controller@sha256:da1ac76b07c0961ec0a1573615cb8c121fd0a4c443a0bb7f73780242d05161f0

This is the template used for the bundle:

#@ load("@ytt:data", "data")
#@ if data.values.k8s_version.startswith("v1.31."):
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup-restic
  namespace: kube-system
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  schedule: '0,30 * * * *'
  successfulJobsHistoryLimit: 0
  suspend: false
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true
          dnsPolicy: ClusterFirstWithHostNet
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
          - key: node-role.kubernetes.io/control-plane
            effect: NoSchedule
            operator: Exists
          - key: node-role.kubernetes.io/master
            effect: NoSchedule
            operator: Exists
          restartPolicy: OnFailure
          volumes:
          - name: etcd-backup
            emptyDir: {}
          - name: host-pki
            hostPath:
              path: /etc/kubernetes/pki
          initContainers:
          - name: snapshoter
            image: #@ data.values.oci_registry_1 + "/bitnami/etcd:3.5.16"
            securityContext:
              runAsUser: 0
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - |-
              set -euf
              mkdir -p /backup/pki/kubernetes
              mkdir -p /backup/pki/etcd
              cp -a /etc/kubernetes/pki/etcd/ca.crt /backup/pki/etcd/
              cp -a /etc/kubernetes/pki/etcd/ca.key /backup/pki/etcd/
              cp -a /etc/kubernetes/pki/ca.crt /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/ca.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/front-proxy-ca.crt /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/front-proxy-ca.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/sa.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/sa.pub /backup/pki/kubernetes
              etcdctl snapshot save /backup/etcd-snapshot.db
            env:
            - name: ETCDCTL_API
              value: "3"
            - name: ETCDCTL_DIAL_TIMEOUT
              value: 3s
            - name: ETCDCTL_CACERT
              value: /etc/kubernetes/pki/etcd/ca.crt
            - name: ETCDCTL_CERT
              value: /etc/kubernetes/pki/etcd/healthcheck-client.crt
            - name: ETCDCTL_KEY
              value: /etc/kubernetes/pki/etcd/healthcheck-client.key
            - name: ETCD_HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            volumeMounts:
            - mountPath: /backup
              name: etcd-backup
            - mountPath: /etc/kubernetes/pki
              name: host-pki
              readOnly: true
          containers:
          - name: uploader
            image: #@ data.values.oci_registry_2 + "/restic/restic:0.17.1"
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - |-
              set -euf
              restic snapshots -q || restic init -q
              restic backup --tag=etcd --host=${ETCD_HOSTNAME} /backup
              restic forget --prune --group-by tag --keep-daily 3 --keep-last 48
            env:
            - name: ETCD_HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: RESTIC_REPOSITORY
              value: #@ "s3:" + str(data.values.s3_endpoint) + "/" + str(data.values.bucket_name)
            - name: RESTIC_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: s3-restic-credentials
                  key: restic_password
            - name: AWS_DEFAULT_REGION
              value: #@ str(data.values.default_region)
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  key: AWS_ACCESS_KEY_ID
                  name: s3-restic-credentials
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  key: AWS_SECRET_ACCESS_KEY
                  name: s3-restic-credentials
            volumeMounts:
            - mountPath: /backup
              name: etcd-backup
#@ end
100mik commented 4 weeks ago

To narrow down the issue, what happens when you run something along the lines of ytt -f config | kbld -f - | kapp deploy -f - -a <app name>? Can you reproduce this using the CLIs as well?
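
A diff-only run along these lines would confirm whether the kapp CLI itself sees the initContainer change without applying anything (a sketch; the paths are placeholders):

# render, resolve images, and only show the diff against the live cluster (no apply)
ytt -f config | kbld -f - \
  | kapp deploy -a <app name> -f - --diff-run --diff-changes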

qeqar commented 4 weeks ago

Tried it with kapp too, same thing.

ytt -f bundle/templates -f bundle/schemas -f bundle/values --data-value k8s_version=v1.31.1 | kbld -f - | kapp deploy --kubeconfig ~/xxx/kubeconfig -c -f - -a gks-cluster-backup.app --yes
Target cluster 'https://xxx:6443' (nodes: provision-test-me-provision-test-me-ix1-md-jwhhm-m78b9, 11+)
resolve | final: xxx/bitnami/etcd:3.5.16 -> xxx/bitnami/etcd@sha256:27d447e33d5788dac3367ee170667ef6a2113f8bf8cfdf8b98308bce6d5894cc
resolve | final: xxx/restic/restic:0.17.1 -> xxx/restic/restic@sha256:424a4e1fcc6fe2557b5614239dc71a2c793acb33a83ea217171bd7edc1862dcb

Changes

Namespace  Name  Kind  Age  Op  Op st.  Wait to  Rs  Ri  

Op:      0 create, 0 delete, 0 update, 0 noop, 0 exists
Wait to: 0 reconcile, 0 delete, 0 noop

Succeeded

But there is still no runAsUser in the CronJob.

100mik commented 4 weeks ago

Definitely suspicious. I am going to try to mock this, but it looks like a rebase rule is at play. I do not think we have default rebase rules that would do this. However, could you confirm that you do not have any additional rebase rules that are causing this?

mamachanko commented 4 weeks ago

Definitely suspicious. I am going to try to mock this, but it looks like a rebase rule is at play. I do not think we have default rebase rules that would do this. However, could you confirm that you do not have any additional rebase rules that are causing this?

@100mik no custom rebase rules are involved according to @qeqar
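
For context, a custom rebase rule that could hide exactly this kind of field would look roughly like the kapp Config below. It is purely illustrative of what to look for; according to the reports above, no such rule exists in this setup.

# hypothetical example only: a rule like this would keep the existing cluster value of the
# initContainers' securityContext and drop the incoming change during diffing
apiVersion: kapp.k14s.io/v1alpha1
kind: Config
rebaseRules:
- path: [spec, jobTemplate, spec, template, spec, initContainers, {allIndexes: true}, securityContext]
  type: copy
  sources: [existing]
  resourceMatchers:
  - apiVersionKindMatcher: {apiVersion: batch/v1, kind: CronJob}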

cppforlife commented 4 weeks ago

@mamachanko Is there any other section of custom kapp config that might be at play? Is it possible to share it?

qeqar commented 4 weeks ago

I can add some more files, but I don't see a place where I changed the rules.

The App CR:

#@ load("@ytt:data", "data")
#@ if not data.values.backup_bucket_name == "myBucket":
---
apiVersion: kappctrl.k14s.io/v1alpha1
kind: App
metadata:
  name: gks-cluster-backup
  namespace: #@ str(data.values.cluster_namespace)
spec:
  paused: false
  cluster:
    kubeconfigSecretRef:
      name: #@ str(data.values.cluster_name) + "-kubeconfig"
      key: value
  fetch:
  - imgpkgBundle:
      image: #@ str(data.values.oci_bundle_registry) + "/gks/bundles/gks-cluster-backup-bundle:" + str(data.values.gks_cluster_backup_version)
      secretRef:
          name: artifactory
  template:
    - ytt:
        paths:
          - templates
          - schemas
          - values
        inline:
          paths:
            #@yaml/text-templated-strings
            config/inline.yaml: |
              #@data/values
              ---
              access_key_id: "(@= data.values.backup_access_key_id @)"
              secret_access_key: "(@= data.values.backup_secret_access_key @)"
              restic_password: "(@= data.values.backup_restic_password @)"
              default_region: "(@= data.values.backup_region @)"
              s3_endpoint: "(@= data.values.backup_s3_endpoint @)"
              bucket_name: "(@= data.values.backup_bucket_name @)"
        valuesFrom:
          - secretRef:
              name: #@ str(data.values.cluster_name) + "-kapp-val-k8s-version"
    - kbld:
        paths:
          - '-'
          - .imgpkg/images.yml
  deploy:
    - kapp:
        rawOptions: ["--diff-changes=true"]
#@ end

We don't have any special config for kapp-controller; these are its container args:

      containers:
      - args:
        - -packaging-global-namespace=kapp-controller-packaging-global
        - -enable-api-priority-and-fairness=True
        - -tls-cipher-suites=

And that's all.

Tell me if I should look into any other places.

mamachanko commented 4 weeks ago

Tried it with kapp too, same thing.

ytt -f bundle/templates -f bundle/schemas -f bundle/values --data-value k8s_version=v1.31.1 | kbld -f - | kapp deploy --kubeconfig ~/xxx/kubeconfig -c -f - -a gks-cluster-backup.app --yes
Target cluster 'https://xxx:6443' (nodes: provision-test-me-provision-test-me-ix1-md-jwhhm-m78b9, 11+)
resolve | final: xxx/bitnami/etcd:3.5.16 -> xxx/bitnami/etcd@sha256:27d447e33d5788dac3367ee170667ef6a2113f8bf8cfdf8b98308bce6d5894cc
resolve | final: xxx/restic/restic:0.17.1 -> xxx/restic/restic@sha256:424a4e1fcc6fe2557b5614239dc71a2c793acb33a83ea217171bd7edc1862dcb

Changes

Namespace  Name  Kind  Age  Op  Op st.  Wait to  Rs  Ri  

Op:      0 create, 0 delete, 0 update, 0 noop, 0 exists
Wait to: 0 reconcile, 0 delete, 0 noop

Succeeded

But there is still no runAsUser in the CronJob.

@qeqar this either suggests that, as you mentioned, kapp(-controller) is dropping the change, or that the change is already applied. The latter might be possible because kapp-controller may already have applied it.

Can you assert whether the respective CronJob on your live cluster is really missing spec.jobTemplate.spec.template.spec.initContainers[0].securityContext.runAsUser?

Use any tool of your choice, but I think this may lead you right to it:

kapp inspect -a gks-cluster-backup.app --filter-kind CronJob --filter-name etcd-backup-restic --raw
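
A direct kubectl query against the live object would show the same thing; the field path below is taken from the template above and the kubeconfig flag from the earlier commands (a sketch, not output from the thread):

# print the initContainer's securityContext straight from the live CronJob
kubectl --kubeconfig ~/xxx/kubeconfig -n kube-system get cronjob etcd-backup-restic \
  -o jsonpath='{.spec.jobTemplate.spec.template.spec.initContainers[0].securityContext}'
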
qeqar commented 4 weeks ago

I looked more than once in the target cluster, and it only gets added when the CronJob resource is newly created.
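
One related thing that may be worth checking: as far as I understand, kapp records the config it last applied in the kapp.k14s.io/original annotation and diffs new input against it, so checking that annotation for the field shows what kapp believes it applied (a sketch, assuming the same kubeconfig as above):

# does kapp's last-applied record of the CronJob already contain runAsUser?
kubectl --kubeconfig ~/xxx/kubeconfig -n kube-system get cronjob etcd-backup-restic -o yaml \
  | grep 'kapp.k14s.io/original' | grep -q 'runAsUser' \
  && echo "runAsUser IS in kapp's last-applied record" \
  || echo "runAsUser is NOT in kapp's last-applied record"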

praveenrewar commented 1 week ago

@qeqar Would you be able to share the output of the following?

ytt -f bundle/templates -f bundle/schemas -f bundle/values --data-value k8s_version=v1.31.1 | kbld -f -

(Just trying to ensure that there's no kapp config that might have seeped into the bundle config.)
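
A quick way to check for that is to grep the rendered output for kapp Config documents (a sketch; adjust paths as needed):

# any kapp.k14s.io/v1alpha1 Config document in the rendered output would carry
# rebase/diff rules into the deploy
ytt -f bundle/templates -f bundle/schemas -f bundle/values --data-value k8s_version=v1.31.1 \
  | grep -n -B1 -A5 'kind: Config'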

qeqar commented 1 week ago

@praveenrewar


ytt -f bundle/templates -f bundle/schemas -f bundle/values --data-value k8s_version=v1.31.1 | kbld -f -

resolve | final: xxx/bitnami/etcd:3.5.16 -> xxx/bitnami/etcd@sha256:c1419aec942eae324576cc4ff6c7af20527c8b2e1d25d32144636d8b61dfd986
resolve | final: xxx/restic/restic:0.17.1 -> xxx/restic/restic@sha256:424a4e1fcc6fe2557b5614239dc71a2c793acb33a83ea217171bd7edc1862dcb
---
apiVersion: batch/v1
kind: CronJob
metadata:
  annotations:
    kbld.k14s.io/images: |
      - origins:
        - resolved:
            tag: 3.5.16
            url: xxx/bitnami/etcd:3.5.16
        url: xxx/bitnami/etcd@sha256:c1419aec942eae324576cc4ff6c7af20527c8b2e1d25d32144636d8b61dfd986
      - origins:
        - resolved:
            tag: 0.17.1
            url: xxx/restic/restic:0.17.1
        url: xxx/restic/restic@sha256:424a4e1fcc6fe2557b5614239dc71a2c793acb33a83ea217171bd7edc1862dcb
  name: etcd-backup-restic
  namespace: kube-system
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - command:
            - /bin/sh
            - -c
            - |-
              set -euf
              restic snapshots -q || restic init -q
              restic backup --tag=etcd --host=${ETCD_HOSTNAME} /backup
              restic forget --prune --group-by tag --keep-daily 3 --keep-last 48
            env:
            - name: ETCD_HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: RESTIC_REPOSITORY
              value: s3:/
            - name: RESTIC_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: restic_password
                  name: s3-restic-credentials
            - name: AWS_DEFAULT_REGION
              value: ""
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  key: AWS_ACCESS_KEY_ID
                  name: s3-restic-credentials
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  key: AWS_SECRET_ACCESS_KEY
                  name: s3-restic-credentials
            image: xxx/restic/restic@sha256:424a4e1fcc6fe2557b5614239dc71a2c793acb33a83ea217171bd7edc1862dcb
            imagePullPolicy: IfNotPresent
            name: uploader
            volumeMounts:
            - mountPath: /backup
              name: etcd-backup
          dnsPolicy: ClusterFirstWithHostNet
          hostNetwork: true
          initContainers:
          - command:
            - /bin/sh
            - -c
            - |-
              set -euf
              mkdir -p /backup/pki/kubernetes
              mkdir -p /backup/pki/etcd
              cp -a /etc/kubernetes/pki/etcd/ca.crt /backup/pki/etcd/
              cp -a /etc/kubernetes/pki/etcd/ca.key /backup/pki/etcd/
              cp -a /etc/kubernetes/pki/ca.crt /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/ca.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/front-proxy-ca.crt /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/front-proxy-ca.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/sa.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/sa.pub /backup/pki/kubernetes
              etcdctl snapshot save /backup/etcd-snapshot.db
            env:
            - name: ETCDCTL_API
              value: "3"
            - name: ETCDCTL_DIAL_TIMEOUT
              value: 3s
            - name: ETCDCTL_CACERT
              value: /etc/kubernetes/pki/etcd/ca.crt
            - name: ETCDCTL_CERT
              value: /etc/kubernetes/pki/etcd/healthcheck-client.crt
            - name: ETCDCTL_KEY
              value: /etc/kubernetes/pki/etcd/healthcheck-client.key
            - name: ETCD_HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            image: xxx/bitnami/etcd@sha256:c1419aec942eae324576cc4ff6c7af20527c8b2e1d25d32144636d8b61dfd986
            imagePullPolicy: IfNotPresent
            name: snapshoter
            securityContext:
              runAsUser: 0
            volumeMounts:
            - mountPath: /backup
              name: etcd-backup
            - mountPath: /etc/kubernetes/pki
              name: host-pki
              readOnly: true
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          restartPolicy: OnFailure
          tolerations:
          - effect: NoSchedule
            key: node-role.kubernetes.io/control-plane
            operator: Exists
          - effect: NoSchedule
            key: node-role.kubernetes.io/master
            operator: Exists
          volumes:
          - emptyDir: {}
            name: etcd-backup
          - hostPath:
              path: /etc/kubernetes/pki
            name: host-pki
  schedule: 0,30 * * * *
  successfulJobsHistoryLimit: 0
  suspend: false
---
apiVersion: v1
kind: Secret
metadata:
  name: s3-restic-credentials
  namespace: kube-system
stringData:
  AWS_ACCESS_KEY_ID: ""
  AWS_SECRET_ACCESS_KEY: ""
  restic_password: ""
type: Opaque

Succeeded
praveenrewar commented 1 week ago

@qeqar Sorry, missed the notification, could you also share all the files present in the bundle? Are these two resources that you shared above the only ones in the bundle?

qeqar commented 4 days ago

@praveenrewar Yes, it renders only the CronJob and the Secret.

I have the schemas, default.yaml, and this bundle.yaml:

apiVersion: imgpkg.carvel.dev/v1alpha1
kind: Bundle
metadata:
  name: gks-cluster-backup-bundle
authors:
- name: GKS
  email: mail
websites:
- url: url

That's all.

And I use these commands to create and upload the bundle:

ytt -f bundle/templates -f bundle/schemas -f bundle/values $(LOCAL_TEST_VALUES) --data-value k8s_version=v$$k8sver ; done | kbld --imgpkg-lock-output bundle/.imgpkg/images.yml -f -

imgpkg push -b ${REPO_URL}/gks/bundles/${BUNDLE_NAME}:v${RP_VERSION_SHORT} -f ./bundle --registry-password="${DEPLOY_PASSWORD}" --registry-username="${DEPLOY_USER}"

That is all!
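
As a sanity check on what actually lands in the registry, the pushed bundle can be pulled back and inspected locally (a sketch; the output directory and templates path are assumptions based on the layout described above):

# pull the pushed bundle back out of the registry and check the template that was shipped
imgpkg pull -b ${REPO_URL}/gks/bundles/${BUNDLE_NAME}:v${RP_VERSION_SHORT} -o /tmp/bundle-check
grep -rn 'runAsUser' /tmp/bundle-check/templates/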

praveenrewar commented 4 days ago

That is indeed a bit weird, because I am not able to reproduce the issue with a CronJob, and I can't think of any other way kapp rebase rules could have been passed in. Could you try deleting the App and then deploying the resources directly using kapp? (i.e. run the following command twice, first without the securityContext and then with it)

ytt -f bundle/templates -f bundle/schemas -f bundle/values --data-value k8s_version=v1.31.1 | kbld -f - | kapp deploy --kubeconfig ~/xxx/kubeconfig -c -f - -a gks-cluster-backup.app --yes
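
A minimal sketch of that check, with the namespace placeholder and kubeconfig path taken from earlier in the thread:

# 1. stop kapp-controller from reconciling these resources by removing the App CR
kubectl delete apps.kappctrl.k14s.io gks-cluster-backup -n <cluster-namespace>

# 2. first pass: deploy the rendered config without the runAsUser change
ytt -f bundle/templates -f bundle/schemas -f bundle/values --data-value k8s_version=v1.31.1 \
  | kbld -f - | kapp deploy --kubeconfig ~/xxx/kubeconfig -c -f - -a gks-cluster-backup.app --yes

# 3. second pass: re-add securityContext.runAsUser to the template, re-run the exact same
#    pipeline, and check whether kapp now reports an update for the CronJob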