fluxcd / flux2

Open and extensible continuous delivery solution for Kubernetes. Powered by GitOps Toolkit.
https://fluxcd.io
Apache License 2.0
Image automation pushes a lot of commits (multi-cluster) #4941

Open ehrnst opened 3 weeks ago

ehrnst commented 3 weeks ago

Describe the bug

We have clusters in multiple regions, with dev and prod clusters in each region. Application deployments should be the same on the region 1 dev cluster and the region 2 dev cluster, which is why ImageUpdateAutomation is set up.

When building new images, the commit is pushed back to the origin git(hub) repo. However, it looks like the clusters might be playing ping-pong, as every minute we get a new commit.

What could potentially cause this?

For this application there is one ImageRepository and one GitRepository in the flux-system namespace. The ImageUpdateAutomation sits in the application/environment namespace.

Steps to reproduce

Not sure.

Expected behavior

Only one commit with the latest image should be pushed to Git, with no flip-flopping between the latest and the previous image.

Screenshots and recordings

No response

OS / Distro

Ubuntu

Flux version

2.1.2

Flux check

flux 2.1.2 <2.3.0 (new version is available, please upgrade)
✔ Kubernetes 1.29.4 >=1.25.0-0
► checking controllers
✔ fluxconfig-agent: deployment ready
► mcr.microsoft.com/azurek8sflux/fluxconfig-agent:1.11.1
► mcr.microsoft.com/azurek8sflux/fluent-bit-mdm:1.11.1
✔ fluxconfig-controller: deployment ready
► mcr.microsoft.com/azurek8sflux/fluxconfig-controller:1.11.1
► mcr.microsoft.com/azurek8sflux/fluent-bit-mdm:1.11.1
✔ helm-controller: deployment ready
► mcr.microsoft.com/oss/fluxcd/helm-controller:v1.0.1
✔ image-automation-controller: deployment ready
► mcr.microsoft.com/oss/fluxcd/image-automation-controller:v0.38.0
✔ image-reflector-controller: deployment ready
► mcr.microsoft.com/oss/fluxcd/image-reflector-controller:v0.32.0
✔ kustomize-controller: deployment ready
► mcr.microsoft.com/oss/fluxcd/kustomize-controller:v1.3.0
✔ notification-controller: deployment ready
► mcr.microsoft.com/oss/fluxcd/notification-controller:v1.3.0
✔ source-controller: deployment ready
► mcr.microsoft.com/oss/fluxcd/source-controller:v1.3.0
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta3
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ fluxconfigs.clusterconfig.azure.com/v1alpha1
✔ gitrepositories.source.toolkit.fluxcd.io/v1
✔ helmcharts.source.toolkit.fluxcd.io/v1
✔ helmreleases.helm.toolkit.fluxcd.io/v2
✔ helmrepositories.source.toolkit.fluxcd.io/v1
✔ imagepolicies.image.toolkit.fluxcd.io/v1beta2
✔ imagerepositories.image.toolkit.fluxcd.io/v1beta2
✔ imageupdateautomations.image.toolkit.fluxcd.io/v1beta2
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta3
✔ receivers.notification.toolkit.fluxcd.io/v1
✔ all checks passed

Git provider

GitHub

Container Registry provider

Azure Container Registry

Additional context

I know AKS (Azure) is not on the latest Flux version in all regions.

stefanprodan commented 3 weeks ago

You should be running the image automation for a path on a single cluster.

ehrnst commented 3 weeks ago

> You should be running the image automation for a path on a single cluster.

So if I understand correctly, we cannot have one overlay for dev that is deployed to two regions? Meaning we have to create overlays per cluster and environment? For example, we have one cluster where the application is deployed to both dev and QA in different namespaces. The structure is now:

├── deployment
│   ├── base
│   │   ├── **/*.yaml
│   ├── overlays
│   │   ├── dev
│   │   │   ├── **/*.yaml
│   │   ├── staging
│   │   │   ├── **/*.yaml
│   │   ├── prod

The image automation inside dev currently looks like this:

apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageUpdateAutomation
metadata:
  name: appname
spec:
  interval: 1m
  sourceRef:
    kind: GitRepository
    name: appname
    namespace: flux-system
  git:
    checkout:
      ref:
        branch: dev
    commit:
      messageTemplate: |
        Automated image update

        [skip ci]

        Automation name: {{ .AutomationObject }}

        Files:
        {{ range $filename, $_ := .Updated.Files -}}
        - {{ $filename }}
        {{ end -}}

        Objects:
        {{ range $resource, $_ := .Updated.Objects -}}
        - {{ $resource.Kind }} {{ $resource.Name }}
        {{ end -}}

        Images:
        {{ range .Updated.Images -}}
        - {{.}}
        {{ end -}} 
      author:
        email: fluxcdbot@users.noreply.github.com
        name: fluxcdbot
    push:
      branch: dev
  update:
    path: ./deployment/overlays
    strategy: Setters

makkes commented 3 weeks ago

You can't have two ImageUpdateAutomation resources running against the same path, it just doesn't make sense because they race against each other.
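
One possible way to keep a single writer without duplicating overlays is to leave the automation active on one cluster and set `spec.suspend: true` on the others. The following is only a sketch, assuming the dev overlay is applied on each cluster by a Flux Kustomization that can be edited per cluster; the Kustomization name, namespace, interval, and path are hypothetical and not taken from this issue.

```yaml
# Flux Kustomization on a cluster that should NOT push image update commits.
# metadata.name, interval and path are hypothetical placeholders.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: appname-dev
  namespace: flux-system
spec:
  interval: 10m
  prune: true
  sourceRef:
    kind: GitRepository
    name: appname
  path: ./deployment/overlays/dev
  patches:
    # Suspend the shared ImageUpdateAutomation on this cluster only.
    - target:
        kind: ImageUpdateAutomation
        name: appname
      patch: |
        - op: add
          path: /spec/suspend
          value: true
```

The cluster chosen as the writer applies the same overlay without the patch, so the manifests in Git stay identical for every region.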

ehrnst commented 3 weeks ago

> You can't have two ImageUpdateAutomation resources running against the same path, it just doesn't make sense because they race against each other.

I see. Earlier I found this issue, and I figured the statement "Flux will push a single commit no matter on how many clusters it runs, the fastest cluster will push the changes, then all others will see there is nothing to commit and do nothing." was the one saving me here. But as I experienced, and as you say, there is no single commit. Each cluster will commit what it thinks is the latest. And if some clusters have not yet synced their Git repository, their commits will be stale. Do I understand it correctly?

stefanprodan commented 3 weeks ago

> so if i understand correctly, we cannot have one overlay for dev which is deployed to two regions?

Yes you can, just scale to zero the image automation controllers on all regions except for one.
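
As a rough sketch of the scale-to-zero suggestion, assuming Flux was installed with `flux bootstrap` so that a `flux-system/kustomization.yaml` exists in the cluster's Git path: a patch like the one below could be added on every cluster except the one that should push commits. Managed installations such as the AKS Flux extension configure controllers differently, so this layout is an assumption and may not apply there.

```yaml
# flux-system/kustomization.yaml on a cluster that should not run image automation.
# Assumes the standard file layout generated by `flux bootstrap`.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  # Scale the image-automation-controller Deployment to zero replicas.
  - target:
      kind: Deployment
      name: image-automation-controller
    patch: |
      - op: replace
        path: /spec/replicas
        value: 0
```

With the controller scaled to zero, the ImageUpdateAutomation objects can remain in the shared overlay; they are simply never reconciled on that cluster.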

ehrnst commented 3 weeks ago

> > so if i understand correctly, we cannot have one overlay for dev which is deployed to two regions?
>
> Yes you can, just scale to zero the image automation controllers on all regions except for one.

I think that will bite us some time down the road. There will be a massive number of overlays per app here. We have to take blue/green clusters into account as well.

So this would be the correct structure, with all imageAutomation files exactly identical:

├── deployment
│   ├── base
│   │   ├── **/*.yaml
│   ├── overlays
│   │   ├── dev
│   │   │   ├── region 1
│   │   │   │   ├── blue
│   │   │   │   │   ├── imageAutomation.yaml
│   │   │   │   ├── green
│   │   │   │   │   ├── imageAutomation.yaml
│   │   │   ├── region 2
│   │   ├── staging
│   │   ├── prod