Open qeqar opened 4 weeks ago
To narrow down the issue, what happens when you run something along the lines of ytt -f config | kbld -f - | kapp deploy -f - -a <app name>
Can you reproduce this using the CLIs as well?
Tried it with kapp too, same thing.
ytt -f bundle/templates -f bundle/schemas -f bundle/values --data-value k8s_version=v1.31.1 | kbld -f - | kapp deploy --kubeconfig ~/xxx/kubeconfig -c -f - -a gks-cluster-backup.app --yes
Target cluster 'https://xxx:6443' (nodes: provision-test-me-provision-test-me-ix1-md-jwhhm-m78b9, 11+)
resolve | final: xxx/bitnami/etcd:3.5.16 -> xxx/bitnami/etcd@sha256:27d447e33d5788dac3367ee170667ef6a2113f8bf8cfdf8b98308bce6d5894cc
resolve | final: xxx/restic/restic:0.17.1 -> xxx/restic/restic@sha256:424a4e1fcc6fe2557b5614239dc71a2c793acb33a83ea217171bd7edc1862dcb
Changes
Namespace Name Kind Age Op Op st. Wait to Rs Ri
Op: 0 create, 0 delete, 0 update, 0 noop, 0 exists
Wait to: 0 reconcile, 0 delete, 0 noop
Succeeded
But still no runAsUser shows up in the CronJob.
Definitely suspicious, I am going to try and mock this, but it looks like a rebase rule is at play. I do not think we have default rebase rules that would do this. However, could you confirm that you do not have any additional rebase rules that are causing this?
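For reference, a custom rebase rule that would mask exactly this kind of change would look roughly like the following kapp Config (purely an illustrative sketch, not something taken from this setup):

apiVersion: kapp.k14s.io/v1alpha1
kind: Config
rebaseRules:
# dropping securityContext from the expected resource before diffing
# would hide a newly added runAsUser from the diff
- path: [spec, jobTemplate, spec, template, spec, initContainers, {allIndexes: true}, securityContext]
  type: remove
  resourceMatchers:
  - apiVersionKindMatcher: {apiVersion: batch/v1, kind: CronJob}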
@100mik no custom rebase rules are involved according to @qeqar
@mamachanko is there any other section of custom kapp config that might be at play? is it possible to share?
I can add some more files, but I don't see a place where I changed the rules.
The App CR:
#@ load("@ytt:data", "data")
#@ if not data.values.backup_bucket_name == "myBucket":
---
apiVersion: kappctrl.k14s.io/v1alpha1
kind: App
metadata:
  name: gks-cluster-backup
  namespace: #@ str(data.values.cluster_namespace)
spec:
  paused: false
  cluster:
    kubeconfigSecretRef:
      name: #@ str(data.values.cluster_name) + "-kubeconfig"
      key: value
  fetch:
  - imgpkgBundle:
      image: #@ str(data.values.oci_bundle_registry) + "/gks/bundles/gks-cluster-backup-bundle:" + str(data.values.gks_cluster_backup_version)
      secretRef:
        name: artifactory
  template:
  - ytt:
      paths:
      - templates
      - schemas
      - values
      inline:
        paths:
          #@yaml/text-templated-strings
          config/inline.yaml: |
            #@data/values
            ---
            access_key_id: "(@= data.values.backup_access_key_id @)"
            secret_access_key: "(@= data.values.backup_secret_access_key @)"
            restic_password: "(@= data.values.backup_restic_password @)"
            default_region: "(@= data.values.backup_region @)"
            s3_endpoint: "(@= data.values.backup_s3_endpoint @)"
            bucket_name: "(@= data.values.backup_bucket_name @)"
      valuesFrom:
      - secretRef:
          name: #@ str(data.values.cluster_name) + "-kapp-val-k8s-version"
  - kbld:
      paths:
      - '-'
      - .imgpkg/images.yml
  deploy:
  - kapp:
      rawOptions: ["--diff-changes=true"]
#@ end
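Since the deploy step already passes --diff-changes=true to kapp, whatever diff kapp-controller computed should be visible in the App's status. A quick way to look at it might be (a sketch; <cluster-namespace> stands for the namespace the App CR is deployed into):

kubectl -n <cluster-namespace> get app gks-cluster-backup -o jsonpath='{.status.deploy.stdout}'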
We don't have any special config for the kapp-controller:
containers:
- args:
  - -packaging-global-namespace=kapp-controller-packaging-global
  - -enable-api-priority-and-fairness=True
  - -tls-cipher-suites=
And that's all.
Tell me if I should look into any other places.
@qeqar this either suggests that - as you mentioned - kapp(-controller) is dropping the change or the change is already applied. The latter might be possible, b/c kapp-controller may already have applied it.
Can you assert whether the respective CronJob on your live cluster is really missing spec.jobTemplate.spec.template.spec.initContainers[0].securityContext.runAsUser?
Use any tool of choice, but this may lead you right to it I think:
kapp inspect -a gks-cluster-backup.app --filter-kind CronJob --filter-name etcd-backup-restic --raw
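Or with plain kubectl, purely as an illustration (assuming the CronJob sits in kube-system, as in the rendered output below):

kubectl --kubeconfig ~/xxx/kubeconfig -n kube-system get cronjob etcd-backup-restic \
  -o jsonpath='{.spec.jobTemplate.spec.template.spec.initContainers[0].securityContext}'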
I looked more than once in the target cluster, and it only gets added when the CronJob resource is newly created.
@qeqar Would you be able to share the output of the following?
ytt -f bundle/templates -f bundle/schemas -f bundle/values --data-value k8s_version=v1.31.1 | kbld -f -
(Just trying to ensure that there's no kapp config that might have seeped into the bundle config.)
@praveenrewar
ytt -f bundle/templates -f bundle/schemas -f bundle/values --data-value k8s_version=v1.31.1 | kbld -f -
resolve | final: xxx/bitnami/etcd:3.5.16 -> xxx/bitnami/etcd@sha256:c1419aec942eae324576cc4ff6c7af20527c8b2e1d25d32144636d8b61dfd986
resolve | final: xxx/restic/restic:0.17.1 -> xxx/restic/restic@sha256:424a4e1fcc6fe2557b5614239dc71a2c793acb33a83ea217171bd7edc1862dcb
---
apiVersion: batch/v1
kind: CronJob
metadata:
  annotations:
    kbld.k14s.io/images: |
      - origins:
        - resolved:
            tag: 3.5.16
            url: xxx/bitnami/etcd:3.5.16
        url: xxx/bitnami/etcd@sha256:c1419aec942eae324576cc4ff6c7af20527c8b2e1d25d32144636d8b61dfd986
      - origins:
        - resolved:
            tag: 0.17.1
            url: xxx/restic/restic:0.17.1
        url: xxx/restic/restic@sha256:424a4e1fcc6fe2557b5614239dc71a2c793acb33a83ea217171bd7edc1862dcb
  name: etcd-backup-restic
  namespace: kube-system
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - command:
            - /bin/sh
            - -c
            - |-
              set -euf
              restic snapshots -q || restic init -q
              restic backup --tag=etcd --host=${ETCD_HOSTNAME} /backup
              restic forget --prune --group-by tag --keep-daily 3 --keep-last 48
            env:
            - name: ETCD_HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: RESTIC_REPOSITORY
              value: s3:/
            - name: RESTIC_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: restic_password
                  name: s3-restic-credentials
            - name: AWS_DEFAULT_REGION
              value: ""
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  key: AWS_ACCESS_KEY_ID
                  name: s3-restic-credentials
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  key: AWS_SECRET_ACCESS_KEY
                  name: s3-restic-credentials
            image: xxx/restic/restic@sha256:424a4e1fcc6fe2557b5614239dc71a2c793acb33a83ea217171bd7edc1862dcb
            imagePullPolicy: IfNotPresent
            name: uploader
            volumeMounts:
            - mountPath: /backup
              name: etcd-backup
          dnsPolicy: ClusterFirstWithHostNet
          hostNetwork: true
          initContainers:
          - command:
            - /bin/sh
            - -c
            - |-
              set -euf
              mkdir -p /backup/pki/kubernetes
              mkdir -p /backup/pki/etcd
              cp -a /etc/kubernetes/pki/etcd/ca.crt /backup/pki/etcd/
              cp -a /etc/kubernetes/pki/etcd/ca.key /backup/pki/etcd/
              cp -a /etc/kubernetes/pki/ca.crt /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/ca.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/front-proxy-ca.crt /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/front-proxy-ca.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/sa.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/sa.pub /backup/pki/kubernetes
              etcdctl snapshot save /backup/etcd-snapshot.db
            env:
            - name: ETCDCTL_API
              value: "3"
            - name: ETCDCTL_DIAL_TIMEOUT
              value: 3s
            - name: ETCDCTL_CACERT
              value: /etc/kubernetes/pki/etcd/ca.crt
            - name: ETCDCTL_CERT
              value: /etc/kubernetes/pki/etcd/healthcheck-client.crt
            - name: ETCDCTL_KEY
              value: /etc/kubernetes/pki/etcd/healthcheck-client.key
            - name: ETCD_HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            image: xxx/bitnami/etcd@sha256:c1419aec942eae324576cc4ff6c7af20527c8b2e1d25d32144636d8b61dfd986
            imagePullPolicy: IfNotPresent
            name: snapshoter
            securityContext:
              runAsUser: 0
            volumeMounts:
            - mountPath: /backup
              name: etcd-backup
            - mountPath: /etc/kubernetes/pki
              name: host-pki
              readOnly: true
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          restartPolicy: OnFailure
          tolerations:
          - effect: NoSchedule
            key: node-role.kubernetes.io/control-plane
            operator: Exists
          - effect: NoSchedule
            key: node-role.kubernetes.io/master
            operator: Exists
          volumes:
          - emptyDir: {}
            name: etcd-backup
          - hostPath:
              path: /etc/kubernetes/pki
            name: host-pki
  schedule: 0,30 * * * *
  successfulJobsHistoryLimit: 0
  suspend: false
---
apiVersion: v1
kind: Secret
metadata:
  name: s3-restic-credentials
  namespace: kube-system
stringData:
  AWS_ACCESS_KEY_ID: ""
  AWS_SECRET_ACCESS_KEY: ""
  restic_password: ""
type: Opaque
Succeeded
@qeqar Sorry, missed the notification, could you also share all the files present in the bundle? Are these two resources that you shared above the only ones in the bundle?
@praveenrewar Yes, it will render only the CronJob and the Secret.
I have the schema and default.yaml and bundle.yaml:
apiVersion: imgpkg.carvel.dev/v1alpha1
kind: Bundle
metadata:
  name: gks-cluster-backup-bundle
authors:
- name: GKS
  email: mail
websites:
- url: url
That's all.
And I use these two commands to create/upload the bundle:
ytt -f bundle/templates -f bundle/schemas -f bundle/values $(LOCAL_TEST_VALUES) --data-value k8s_version=v$$k8sver ; done | kbld --imgpkg-lock-output bundle/.imgpkg/images.yml -f -
imgpkg push -b ${REPO_URL}/gks/bundles/${BUNDLE_NAME}:v${RP_VERSION_SHORT} -f ./bundle --registry-password="${DEPLOY_PASSWORD}" --registry-username="${DEPLOY_USER}"
that is all!
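One quick way to rule out a kapp Config hiding inside the pushed bundle could be to pull it back and grep through it (a sketch reusing the variables from the push command above; registry auth flags omitted):

imgpkg pull -b ${REPO_URL}/gks/bundles/${BUNDLE_NAME}:v${RP_VERSION_SHORT} -o /tmp/gks-bundle
grep -rn "kind: Config" /tmp/gks-bundle || echo "no kapp Config found in bundle"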
That is indeed a bit weird, because I am not able to reproduce the issue with a CronJob and I can't think of any other way kapp rebase rules could have been passed. Could you try deleting the App, and then deploying the resource directly using kapp? (i.e. run the following command twice, first without the securityContext and then with it)
ytt -f bundle/templates -f bundle/schemas -f bundle/values --data-value k8s_version=v1.31.1 | kbld -f - | kapp deploy --kubeconfig ~/xxx/kubeconfig -c -f - -a gks-cluster-backup.app --yes
What steps did you take: We install some Cronjobs in remote clusters via the kapp-controller.
Our Setup: oci-bundle -> appCR -> kapp-controller -> remote cluster
What happened: We updated spec.jobTemplate.spec.template.spec.initContainers[0]; I added .securityContext.runAsUser. But the controller did not find the change and said no diffs. For testing purposes we did the same with spec.jobTemplate.spec.template.spec.containers[0], which just worked fine.
What did you expect: The CronJob gets updated.
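For context, the addition in question looks like this in the rendered CronJob (excerpt from the kbld output shared earlier in the thread):

initContainers:
- name: snapshoter
  # ...other fields unchanged...
  securityContext:
    runAsUser: 0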
Environment: K8S 1.31.1 Kapp-controller: v0.53.1
This is the template used for the bundle: