argoproj-labs / argocd-image-updater

Automatic container image update for Argo CD
https://argocd-image-updater.readthedocs.io/en/stable/
Apache License 2.0

App of apps being overwritten by image-updater #896

Open mehdicopter opened 1 week ago

mehdicopter commented 1 week ago

[Two screenshots attached: "Screenshot 2024-10-24 at 19 28 26" and "Screenshot 2024-10-24 at 19 28 48"]

Describe the bug

I am using ArgoCD with the “App of Apps” pattern. After updating argocd-image-updater to version 0.15.0, I encountered an unexpected side effect.

When the image of a child application is updated, ArgoCD also updates the parent application (“App of Apps”). This causes a resource conflict, because both the child application (“myapp”) and the parent application (“root”) end up managing the same resources.

To Reproduce

  1. Use ArgoCD with the “App of Apps” pattern.
  2. Update argocd-image-updater to version 0.15.0.
  3. Perform an image update for a child application.
  4. Observe that the parent application also attempts to manage the same resources.

Expected behavior

Only the child application (“myapp”) should be updated when its image is changed, without the parent application (“root”) taking control over the same resources.

Additional context

This issue was not present with the previous version of argocd-image-updater (0.14.x).

Version

{
    "Version": "v2.12.6+4dab5bd",
    "BuildDate": "2024-10-18T17:39:26Z",
    "GitCommit": "4dab5bd6a60adea12e084ad23519e35b710060a2",
    "GitTreeState": "clean",
    "GoVersion": "go1.22.4",
    "Compiler": "gc",
    "Platform": "linux/amd64",
    "KustomizeVersion": "v5.4.2 2024-05-22T15:19:38Z",
    "HelmVersion": "v3.15.2+g1a500d5",
    "KubectlVersion": "v0.29.6",
    "JsonnetVersion": "v0.20.0"
}
chengfang commented 1 week ago

It's not obvious which commits may cause this regression in v0.15.0. https://github.com/argoproj-labs/argocd-image-updater/pull/854 looks a bit suspicious.

Is it possible to filter out the parent app via the command line options --match-application-label or --match-application-name?

mehdicopter commented 1 week ago

I am trying to use those filters, but I am getting this error:

klf argocd-image-updater-5bdb94f977-56hcv
Error: unknown flag: --match-application-label app.company.com/name

What am I doing wrong? 😬

mehdicopter commented 1 week ago

I am using kustomize to patch the argocd-image-updater deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-image-updater
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-image-updater
  template:
    spec:
      volumes:
        - name: scripts
          configMap:
            name: argocd-image-updater-scripts
            defaultMode: 0777
      containers:
        - name: argocd-image-updater
          args:
            - run
            - --match-application-label app.company.com/name=myapp
          volumeMounts:
            - name: scripts
              mountPath: /scripts
jannfis commented 1 week ago

In your snippet, the command line switch and its value are being passed as a single argument.

To fix it, you can either use

          args:
            - run
            - --match-application-label=app.company.com/name=myapp

(note the equals sign between the switch and its value) or

          args:
            - run
            - --match-application-label
            - app.company.com/name=myapp
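
The "unknown flag" error reported earlier is consistent with a common long-flag parsing rule: everything between `--` and the first `=` is taken as the flag name. The following is a minimal Python sketch of that rule, assuming this simplified model; it is not the parser argocd-image-updater actually uses, and `parse_long_flags` and `KNOWN_FLAGS` are hypothetical names chosen for illustration.

```python
# Simplified model of long-flag parsing; NOT the real argocd-image-updater parser.
# KNOWN_FLAGS and parse_long_flags are hypothetical names for this illustration.
KNOWN_FLAGS = {"match-application-label", "match-application-name"}


def parse_long_flags(args):
    """Return {flag_name: value} for the "--flag=value" and "--flag value" forms."""
    parsed = {}
    i = 0
    while i < len(args):
        arg = args[i]
        if arg.startswith("--"):
            # The flag name is everything between "--" and the first "=".
            name, has_equals, value = arg[2:].partition("=")
            if name not in KNOWN_FLAGS:
                raise ValueError(f"unknown flag: --{name}")
            if not has_equals:
                # No "=" in this argument, so the value is the next argument.
                i += 1
                if i >= len(args):
                    raise ValueError(f"flag needs an argument: --{name}")
                value = args[i]
            parsed[name] = value
        i += 1
    return parsed


# One YAML list item containing a space: flag and value arrive as ONE argument.
single = ["run", "--match-application-label app.company.com/name=myapp"]
# The two suggested fixes.
with_equals = ["run", "--match-application-label=app.company.com/name=myapp"]
two_args = ["run", "--match-application-label", "app.company.com/name=myapp"]

for args in (single, with_equals, two_args):
    try:
        print(parse_long_flags(args))
    except ValueError as err:
        print(f"Error: {err}")
```

In this model the single-argument form fails because the text up to the first `=` (including the space and the label key) is looked up as a flag name, while both corrected forms yield the label selector as the value.
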
mehdicopter commented 1 week ago

Even with the matching label it does the same... look at the screenshot.

Screenshot 2024-10-25 at 00 43 33
mehdicopter commented 1 week ago

The root app-of-apps behaves as if it were the child app, so 2 Argo CD apps end up responsible for the same resources, which causes a SharedResourceWarning.

mehdicopter commented 1 week ago

According to the logs, it does not even update the root application.

time="2024-10-24T22:54:12Z" level=debug msg="Applications listed: 12"
time="2024-10-24T22:54:12Z" level=info msg="Starting image update cycle, considering 1 annotated application(s) for update"
time="2024-10-24T22:54:12Z" level=debug msg="Processing application argocd/myapp-staging"
time="2024-10-24T22:54:12Z" level=debug msg="Considering this image for update" alias=adserver application=myapp-staging image_name=xxxx/oci-xxx/adserver image_tag="sha256:1a15f767519b1ce8d73130cc7b6e6a8787c12482f3a61dbe4c6d7bbc5d5d6c27" registry=europe-west1-docker.pkg.dev
time="2024-10-24T22:54:12Z" level=debug msg="Using version constraint 'staging' when looking for a new tag" alias=adserver application=myapp-staging image_name=xxxx/oci-xxx/adserver image_tag="sha256:1a15f767519b1ce8d73130cc7b6e6a8787c12482f3a61dbe4c6d7bbc5d5d6c27" registry=europe-west1-docker.pkg.dev
time="2024-10-24T22:54:13Z" level=debug msg="found 1 from 1 tags eligible for consideration" image="europe-west1-docker.pkg.dev/xxxx/oci-xxx/adserver@sha256:1a15f767519b1ce8d73130cc7b6e6a8787c12482f3a61dbe4c6d7bbc5d5d6c27"
time="2024-10-24T22:54:13Z" level=info msg="Setting new image to europe-west1-docker.pkg.dev/xxxx/oci-xxx/adserver:staging@sha256:63f7bebb86c43d8a1a71a8394f6f576731acf08a97dd0279a830d4bba8406c36" alias=adserver application=myapp-staging image_name=xxxx/oci-xxx/adserver image_tag="sha256:1a15f767519b1ce8d73130cc7b6e6a8787c12482f3a61dbe4c6d7bbc5d5d6c27" registry=europe-west1-docker.pkg.dev
time="2024-10-24T22:54:13Z" level=info msg="Successfully updated image 'europe-west1-docker.pkg.dev/xxxx/oci-xxx/adserver@sha256:1a15f767519b1ce8d73130cc7b6e6a8787c12482f3a61dbe4c6d7bbc5d5d6c27' to 'europe-west1-docker.pkg.dev/xxxx/oci-xxx/adserver:staging@sha256:63f7bebb86c43d8a1a71a8394f6f576731acf08a97dd0279a830d4bba8406c36', but pending spec update (dry run=false)" alias=adserver application=myapp-staging image_name=xxxx/oci-xxx/adserver image_tag="sha256:1a15f767519b1ce8d73130cc7b6e6a8787c12482f3a61dbe4c6d7bbc5d5d6c27" registry=europe-west1-docker.pkg.dev
time="2024-10-24T22:54:13Z" level=debug msg="Using commit message: "
time="2024-10-24T22:54:13Z" level=info msg="Committing 1 parameter update(s) for application myapp-staging" application=myapp-staging
time="2024-10-24T22:54:13Z" level=debug msg="Getting application myapp-staging across all namespaces"
time="2024-10-24T22:54:13Z" level=debug msg="Applications listed: 39"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: argo-cd in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: autopilot-bootstrap in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: blackbox-exporter in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: cluster-resources-in-cluster in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: xxxx-ui-back-dev in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: xxxx-ui-front-dev in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: external-dns-xxxx-dev in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: external-dns-in-cluster in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: external-dns-xxx-preprod in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: external-dns-xxx-prod in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: external-dns-xxx-staging in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: external-dns-xxxx-prod in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: external-dns-xxxx-stng in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: external-dns-xxxx-test in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: external-secrets-xxxx-dev in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: external-secrets-gitlab-runner in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: external-secrets-in-cluster in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: external-secrets-xxx-preprod in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: external-secrets-xxx-prod in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: external-secrets-xxx-staging in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: external-secrets-xxxx-prod in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: external-secrets-xxxx-stng in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: external-secrets-xxxx-test in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: gitlab-runner in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: grafana in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: prometheus in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: myapp-preprod in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: myapp-prod in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: myapp-staging in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Application myapp-staging matches the pattern"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: xxx-backend-preprod in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: xxx-backend-prod in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: xxx-backend-staging in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: xxx-frontend-preprod in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: xxx-frontend-prod in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: xxx-frontend-staging in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: xxx-fusionauth-preprod in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: xxx-fusionauth-prod in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: xxx-fusionauth-staging in namespace argocd"
time="2024-10-24T22:54:13Z" level=debug msg="Found application: root in namespace argocd"
time="2024-10-24T22:54:13Z" level=info msg="Successfully updated the live application spec" application=myapp-staging
time="2024-10-24T22:54:13Z" level=info msg="Processing results: applications=1 images_considered=1 images_skipped=0 images_updated=1 errors=0"
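
When comparing debug logs like these across versions, it can help to reduce the lookup phase to three facts: which application matched, how many applications were listed, and which one was listed last. The sketch below assumes the log format shown in this comment; `summarize_lookup` is a hypothetical helper, and its output only summarizes the log, it does not prove where the bug is.

```python
import re

# Extract the quoted msg="..." value from a key=value log line.
MSG_RE = re.compile(r'msg="((?:[^"\\]|\\.)*)"')
FOUND_RE = re.compile(r"Found application: (\S+) in namespace (\S+)")
MATCH_RE = re.compile(r"Application (\S+) matches the pattern")


def summarize_lookup(log_text):
    """Summarize the 'Getting application ... across all namespaces' phase."""
    found = []
    matched = None
    for line in log_text.splitlines():
        msg = MSG_RE.search(line)
        if not msg:
            continue
        text = msg.group(1)
        hit = FOUND_RE.fullmatch(text)
        if hit:
            found.append(hit.group(1))
            continue
        hit = MATCH_RE.fullmatch(text)
        if hit:
            matched = hit.group(1)
    return {
        "matched": matched,
        "listed": len(found),
        "last_listed": found[-1] if found else None,
    }


# Three lines abbreviated from the log in this comment.
sample = "\n".join([
    'time="2024-10-24T22:54:13Z" level=debug msg="Found application: myapp-staging in namespace argocd"',
    'time="2024-10-24T22:54:13Z" level=debug msg="Application myapp-staging matches the pattern"',
    'time="2024-10-24T22:54:13Z" level=debug msg="Found application: root in namespace argocd"',
])
print(summarize_lookup(sample))
```
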
mehdicopter commented 3 days ago

Do the logs help you, @chengfang?

jannfis commented 3 days ago

What are the SharedResourceWarning's details?

mehdicopter commented 3 days ago

> What are the SharedResourceWarning's details?

time="2024-10-24T02:59:51Z" level=info msg="Normalized app spec: {\"status\":{\"conditions\":[{\"lastTransitionTime\":\"2024-10-24T02:59:50Z\",\"message\":\"ConfigMap/datawiz-ui-front is part of applications argocd/datawiz-ui-front-dev and root\",\"type\":\"SharedResourceWarning\"},{\"lastTransitionTime\":\"2024-10-24T02:59:50Z\",\"message\":\"Deployment/datawiz-ui-front is part of applications argocd/datawiz-ui-front-dev and root\",\"type\":\"SharedResourceWarning\"},{\"lastTransitionTime\":\"2024-10-24T02:59:50Z\",\"message\":\"Service/datawiz-ui-front is part of applications argocd/datawiz-ui-front-dev and root\",\"type\":\"SharedResourceWarning\"}]}}" app-namespace=argocd app-qualified-name=argocd/datawiz-ui-front-dev application=datawiz-ui-front-dev project=datawiz
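
The warning details are buried in an escaped JSON payload inside the log message. A minimal Python sketch for pulling out the shared resources follows, assuming the "Normalized app spec:" line format shown above; `shared_resources` is a hypothetical helper, and the unescaping step is deliberately simplified (it only strips the backslash in front of each escaped character).

```python
import json
import re

MSG_RE = re.compile(r'msg="((?:[^"\\]|\\.)*)"')
PREFIX = "Normalized app spec: "
SHARED_RE = re.compile(
    r"(?P<kind>[^/\s]+)/(?P<name>\S+) is part of applications "
    r"(?P<first>\S+) and (?P<second>\S+)"
)


def shared_resources(log_line):
    """Return (kind, name, [app, app]) tuples for SharedResourceWarning conditions."""
    found = MSG_RE.search(log_line)
    if not found:
        return []
    # Simplified unescape: drop the backslash before each escaped character.
    text = re.sub(r"\\(.)", r"\1", found.group(1))
    if not text.startswith(PREFIX):
        return []
    spec = json.loads(text[len(PREFIX):])
    results = []
    for cond in spec.get("status", {}).get("conditions", []):
        if cond.get("type") != "SharedResourceWarning":
            continue
        hit = SHARED_RE.fullmatch(cond.get("message", ""))
        if hit:
            results.append((hit["kind"], hit["name"], [hit["first"], hit["second"]]))
    return results


# One condition abbreviated from the log line above.
sample = (
    r'time="2024-10-24T02:59:51Z" level=info msg="Normalized app spec: '
    r'{\"status\":{\"conditions\":[{\"lastTransitionTime\":\"2024-10-24T02:59:50Z\",'
    r'\"message\":\"Deployment/datawiz-ui-front is part of applications '
    r'argocd/datawiz-ui-front-dev and root\",\"type\":\"SharedResourceWarning\"}]}}" '
    r'app-namespace=argocd application=datawiz-ui-front-dev'
)
for kind, name, apps in shared_resources(sample):
    print(f"{kind}/{name} is claimed by: {', '.join(apps)}")
```
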
jannfis commented 3 days ago

At this point, I highly doubt that this has to do with the Image Updater. Image Updater itself doesn't manage or manipulate resources such as ConfigMaps, or other types.

Are you using a mono repo for all your apps, including the root app by any chance?

mehdicopter commented 3 days ago

> At this point, I highly doubt that this has to do with the Image Updater. Image Updater itself doesn't manage or manipulate resources such as ConfigMaps, or other types.
>
> Are you using a mono repo for all your apps, including the root app by any chance?

OK, I understand. But then why does it work in 0.14.0? I am using the same cluster and the same repo, which contains all ArgoCD apps.

jannfis commented 3 days ago

Can you post your root application's spec here?

mehdicopter commented 3 days ago

> Can you post your root application's spec here?

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  finalizers:
    - resources-finalizer.argocd.argoproj.io
  labels:
    app.kubernetes.io/managed-by: argocd-autopilot
    app.kubernetes.io/name: root
  name: root
  namespace: argocd
spec:
  destination:
    namespace: argocd
    server: https://kubernetes.default.svc
  ignoreDifferences:
    - group: argoproj.io
      jsonPointers:
        - /status
      kind: Application
  project: default
  source:
    path: projects
    repoURL: https://gitlab.com/xxx/argocd.git
  syncPolicy:
    automated:
      allowEmpty: true
      prune: true
      selfHeal: true
    syncOptions:
      - allowEmpty=true
status:
  health: {}
  summary: {}
  sync:
    comparedTo:
      destination: {}
      source:
        repoURL: ""
    status: ""
jannfis commented 3 days ago

I assume that is the version stored in Git. I am more interested in the live resource in the cluster.

I suspect something (maybe image updater) might have fiddled with the source block there.

mehdicopter commented 3 days ago

> I assume that is the version stored in Git. I am more interested in the live resource in the cluster.
>
> I suspect something (maybe image updater) might have fiddled with the source block there.

I'll update to 0.15.0 and try to get the live version. I say "try" because when I update, the YAML becomes the same as another app's, so it will be difficult to get the live spec of the root app itself.

LGLN-LS commented 1 day ago

I have encountered a similar problem. But I am not using the "App of Apps" pattern.

My setup looks like this. I deploy multiple nginx servers with different Docker images/tags in different kubernetes namespaces. They all use the same helm chart for deployment, but with different values.yaml. Each application has its own ArgoCD application resource, which is applied manually to kubernetes.

| Application | Docker image | Kubernetes namespace |
|---|---|---|
| App-A-Development | app-a:dev | app-a-dev |
| App-A-Staging | app-a:staging | app-a-staging |
| App-B-Development | app-b:dev | app-b-dev |
| App-B-Staging | app-b:staging | app-b-staging |

With argocd-image-updater v0.14.0 everything works as intended.

After I updated argocd-image-updater to v0.15.0 something strange happened. Our monitoring issued an alert because no more metrics could be collected from App-B-Staging. I started to investigate and noticed that the namespace app-b-staging was empty. The deployment was gone. Then I checked argocd and I got a SharedResourceWarning for App-B-Staging.

ArgoCD was trying to deploy App-B-Staging to the app-b-dev namespace with the app-b:dev Docker image. Somehow the configurations must have gotten mixed up. I tried to delete and reinstall App-B-Staging and it worked. But then App-A-Staging had the exact same issue!

After that, I downgraded argocd-image-updater back to v0.14.0, reapplied the ArgoCD application resources, and everything worked as expected. I hope you can understand, and this helps to find the issue.

JeromeMSD commented 1 day ago

I have had the same issue for a week with v0.15.0. It breaks one app in a set of 125. The application specs of some apps that use argocd-image-updater appear to drift onto other applications that also use the argocd-image-updater annotations. Deleting the broken app just shifts the wrong configuration onto another one.

I tried a hard refresh, a rollout of most of ArgoCD's components, and even manually cleaning argocd-redis. This behavior only stops when argocd-image-updater is not running.

Note (update): Same as @LGLN-LS, downgrading argocd-image-updater to v0.14.0 fixes the issue.

jannfis commented 1 day ago

Thanks everyone! Appreciate the insights here. I assume y'all who are hitting this problem are using the default argocd update method, and not Git write-back, right?

mehdicopter commented 1 day ago

> Thanks everyone! Appreciate the insights here. I assume y'all who are hitting this problem are using the default argocd update method, and not Git write-back, right?

I am using both methods, but the one that fails is indeed the default one.

JeromeMSD commented 1 day ago

> Thanks everyone! Appreciate the insights here. I assume y'all who are hitting this problem are using the default argocd update method, and not Git write-back, right?

Indeed, it's the specs of recently auto-updated apps that do not use the Git write-back method that overwrite the specs of other apps. In my case, "recently auto-updated apps" refers to those that have received a new image since the upgrade to v0.15.0.

christian-schlichtherle commented 14 hours ago

I found this issue because version 0.15.0 overwrote my application, too. In our case, one of our own "Application" instances trumped its resources into another "Application" instance.

AmitBenAmi commented 8 hours ago

I observed the same behavior with the app-of-apps pattern in 2 different environments with 2 different ArgoCD versions. Unfortunately, argocd-image-updater was configured with the latest tag, so it was certainly running v0.15.0 or later.

I managed to see that, within the app of apps, the last application in the list (lexicographically) had its repoURL and path overwritten. This caused the application to point to the wrong GitHub repo and path, and eventually to take ownership of a different application entirely outside of the app of apps.

So eventually I had:

  1. Application called infrastructure with all of the really important resources
  2. Application called dummy app that took over the infrastructure resources

It also removed those resources at some point, effectively causing my cluster to be in a bad state.
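
One way to spot this kind of drift is to look for Application resources whose source and destination have become identical. The sketch below is a heuristic under stated assumptions: `find_duplicate_sources` is a hypothetical helper, the sample names and repository URL are made up, only the single `spec.source` form is handled, and two applications sharing a source and destination is not always an error.

```python
from collections import defaultdict


def find_duplicate_sources(applications):
    """Group Application manifests that share repoURL, path and destination."""
    groups = defaultdict(list)
    for app in applications:
        spec = app.get("spec", {})
        source = spec.get("source", {})
        dest = spec.get("destination", {})
        key = (
            source.get("repoURL"),
            source.get("path"),
            dest.get("server"),
            dest.get("namespace"),
        )
        groups[key].append(app["metadata"]["name"])
    # Keep only the keys claimed by more than one application.
    return {key: names for key, names in groups.items() if len(names) > 1}


def make_app(name, namespace):
    # Hypothetical manifests: the repository URL and chart path are made up.
    return {
        "metadata": {"name": name},
        "spec": {
            "source": {"repoURL": "https://example.com/charts.git", "path": "nginx"},
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": namespace,
            },
        },
    }


apps = [
    make_app("app-a-dev", "app-a-dev"),
    make_app("app-b-dev", "app-b-dev"),
    make_app("app-b-staging", "app-b-dev"),  # drifted: points at the dev namespace
]
for key, names in find_duplicate_sources(apps).items():
    print(f"{', '.join(names)} share {key}")
```

The input list could be taken from the `items` array of `kubectl get applications.argoproj.io -n argocd -o json`, assuming the applications live in the argocd namespace as in the specs posted in this thread.
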

It happened both with a regular app of apps and with an ApplicationSet setup.

I opened an issue with ArgoCD, https://github.com/argoproj/argo-cd/issues/20440, that describes some of the behavior I saw, but that was before I found this issue.