inline image policy marker comments merge problem #312

Open bob-rohan opened 2 years ago

bob-rohan commented 2 years ago

Supposing podinfo supported release branches (which I recommend against, but lots of places do use them), you might have a test environment set up with image automation defining inline image policy marker comments such as the following:

newTag: release-dcd9b99-1643898245 # {"$imagepolicy": "flux-system:podinfo-release:tag"}

To deploy to production, these image IDs would be merged from the release branch to main.

In addition to supporting release-branch-based deployments, podinfo maintainers would also like Flux to support pain-free hotfixes, with a view to using this for continuous delivery on other services. You might then have a production environment set up with image automation defining inline image policy marker comments such as the following:

newTag: main-dcd9b99-1643898245 # {"$imagepolicy": "flux-system:podinfo-main:tag"}

However, by merging the release branch to main, the inline image policy marker comment is erroneously updated from podinfo-main to podinfo-release. The goal is to have

newTag: release-dcd9b99-1643898245 # {"$imagepolicy": "flux-system:podinfo-main:tag"}

Note that the tag is now set to release-dcd9b99-1643898245, but the image policy continues to identify hotfixes from podinfo-main.

It would be better not to use inline comments.

stefanprodan commented 2 years ago

You can place these markers on the Flux Kustomization object that applies the app on an environment. The one in clusters/staging will use a different policy than the one in clusters/production, e.g.:


# clusters/staging
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: podinfo
  namespace: default
spec:
  images:
    - name: ghcr.io/stefanprodan/podinfo
      newName: ghcr.io/stefanprodan/podinfo # {"$imagepolicy": "flux-system:podinfo-release:name"}
      newTag: 5.0.0 # {"$imagepolicy": "flux-system:podinfo-release:tag"}


# clusters/production
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: podinfo
  namespace: default
spec:
  images:
    - name: ghcr.io/stefanprodan/podinfo
      newName: ghcr.io/stefanprodan/podinfo # {"$imagepolicy": "flux-system:podinfo-main:name"}
      newTag: 5.0.0 # {"$imagepolicy": "flux-system:podinfo-main:tag"}

bob-rohan commented 2 years ago

I believe in your example, you may have reduced the effectiveness of Flux by introducing additional overhead, or risk, or both - depending on what your strategy is for promotion.

In the example quoted there are a few material differences

Based on this feedback, if you could elaborate on the mechanism/process you see for promoting between environments, I can provide further clarity.

stefanprodan commented 2 years ago

Ok, so there are no overlays for each env, the staging cluster is identical to the production one, and nothing differs between the two branches but the image tags?

bob-rohan commented 2 years ago

I think the short answer is nothing relevant, but I'll include some detail in case you're driving at a point I don't yet understand.

In the fleet repo, both staging and production clusters declare several GitRepository kinds.

The Flux Kustomization YAML enacts the reconcile loop on image-ids. Both manifests and application-configuration are embedded via include, with kustomization.yaml referring to relative paths.

The GitRepository refspec is different per environment

A few additional things are fed through from the fleet repo to the manifests via the Flux Kustomization overlay. For example, the environment name for nonprod would be staging, as opposed to the uat environment that is also within the nonprod cluster. But your terminology of staging cluster is entirely sufficient to frame the issue.

In terms of the image-ids repo, there are multiple branches

Both will have the same structure, i.e. a kustomization.yaml file declaring a collection of the tenant's images. Both the image tags and the inline image policy marker comments will differ between branches.
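To make the layout concrete, here is a rough sketch of what that kustomization.yaml might look like on each branch of the image-ids repo. The tags and markers are taken from the issue description; the image name is borrowed from the Kustomization example earlier in the thread and is an assumption, as is the presence of a single image entry.

```yaml
# release branch (illustrative): tag and marker both point at the release policy
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
images:
  - name: ghcr.io/stefanprodan/podinfo # assumed image name
    newTag: release-dcd9b99-1643898245 # {"$imagepolicy": "flux-system:podinfo-release:tag"}
---
# main branch (illustrative): same structure, different tag and different marker
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
images:
  - name: ghcr.io/stefanprodan/podinfo # assumed image name
    newTag: main-dcd9b99-1643898245 # {"$imagepolicy": "flux-system:podinfo-main:tag"}
```

Because the tag and the marker sit on the same line, a git merge of release into main carries both across together, which is the root of the problem described at the top of the issue.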

stefanprodan commented 2 years ago

Ok, so you could use a single policy and swap the filter based on the target cluster. The marker will be the same (no merge issues), but the policy will act differently depending on the cluster.
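A minimal sketch of that idea, assuming an ImageRepository named podinfo already exists in flux-system and that tags follow the branch-sha-timestamp shape shown in the issue. The policy name podinfo and the filter patterns are illustrative, not taken from the thread.

```yaml
# clusters/staging: the shared policy name resolves to release-* builds here
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImagePolicy
metadata:
  name: podinfo # same name in every cluster (assumed)
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: podinfo # assumed ImageRepository
  filterTags:
    pattern: '^release-[a-f0-9]+-(?P<ts>[0-9]+)$' # hypothetical filter
    extract: '$ts'
  policy:
    numerical:
      order: asc # highest timestamp wins
---
# clusters/production: identical object except for the tag filter
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImagePolicy
metadata:
  name: podinfo
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: podinfo
  filterTags:
    pattern: '^main-[a-f0-9]+-(?P<ts>[0-9]+)$' # hypothetical filter
    extract: '$ts'
  policy:
    numerical:
      order: asc
```

With this arrangement the marker on every branch would read {"$imagepolicy": "flux-system:podinfo:tag"}, so merging branches no longer rewrites it. It only holds while each cluster needs exactly one policy per service.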

bob-rohan commented 2 years ago

Yes, I think I can see how that would work for the example above.

We have multiple policies in some clusters, but by refactoring the image automation CRDs from flux-system namespace to the tenant namespace, I think that could work also.

I will have a go. Thanks @stefanprodan

bob-rohan commented 2 years ago

Been a bit of a day going back and forth on this one. The main fumble point seems to be which namespace the various components logically reside in: Secret (for git), GitRepository, ecr-credentials-sync, ImagePolicy, ImageRepository, ImageUpdateAutomation, etc.

I think I'm just about where I want to be on this one but for one snag. I feel lucky this merges some cross-namespace functionality I'd like to use, for better or worse


While Flux CLI 0.26.2 (latest) includes image-automation-controller > 0.20.0 (see link above), it would seem the ImageUpdateAutomation schema defined by flux bootstrap does not set namespace as a valid arg.

If the above is correct, how long until that functionality is available, please?
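For readers following along, the sketch below shows the kind of manifest this is about, on the assumption that the field in question is spec.sourceRef.namespace on ImageUpdateAutomation (the thread does not name it). All object names, the branch, the interval, and the commit author are hypothetical.

```yaml
# Hypothetical ImageUpdateAutomation living in a tenant namespace while
# referencing a GitRepository in flux-system via sourceRef.namespace.
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageUpdateAutomation
metadata:
  name: podinfo # assumed name
  namespace: tenant-staging # assumed tenant namespace
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: image-ids # assumed GitRepository name
    namespace: flux-system # the cross-namespace reference (assumed to be the field at issue)
  git:
    checkout:
      ref:
        branch: release # assumed branch
    commit:
      author:
        name: fluxcdbot
        email: fluxcdbot@users.noreply.github.com
    push:
      branch: release
  update:
    path: ./
    strategy: Setters
```

If the installed CRD predates the field, the API server would reject or prune it, which would match the symptom described above.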

stefanprodan commented 2 years ago

> While Flux CLI 0.26.2 (latest) includes image-automation-controller > 0.20.0 (see link above), it would seem the ImageUpdateAutomation schema defined by flux bootstrap does not set namespace as a valid arg.

Have you upgraded your cluster? Can you post the output of kubectl get crd imageupdateautomations.image.toolkit.fluxcd.io -oyaml here?

bob-rohan commented 2 years ago

Yes, well, sort of. We dump the flux bootstrap results out to disk and store them in git for bootstrapping via a Terraform-initiated custom ECS task. This gives us a sensible way to PR and control rollout.

stefanprodan commented 2 years ago

Your cluster is way behind (app.kubernetes.io/version: v0.24.1); you need to update to v0.26.2.

bob-rohan commented 2 years ago

Yep, fair point. The upgrade did get me past that issue, thanks.

After much more refactoring, I can confirm that while your suggestion is very useful for those looking to run a single image policy per service per cluster, it does not work where there are many instances within the same cluster.

I had hoped that by refactoring the ImagePolicy, ImageRepository and ImageUpdateAutomation from the flux-system namespace to the tenant environment namespaces (<tenant>-staging, <tenant>-uat), I would be able to build on your idea of a common policy name.

However, a better mind than mine would have noted, without needing two days of refactoring, that the namespace is part of the inline image policy marker, and as such it falls to the same merge issue described above.
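A short illustration of why the common policy name does not help here. The namespace tenant-staging / tenant-uat stands in for <tenant>-staging / <tenant>-uat, and the policy name podinfo, the branch names, and the tag values are placeholders.

```yaml
# Branch feeding the staging environment (illustrative)
images:
  - name: ghcr.io/stefanprodan/podinfo # assumed image name
    newTag: release-dcd9b99-1643898245 # {"$imagepolicy": "tenant-staging:podinfo:tag"}
---
# Branch feeding the uat environment (illustrative)
images:
  - name: ghcr.io/stefanprodan/podinfo # assumed image name
    newTag: release-dcd9b99-1643898245 # {"$imagepolicy": "tenant-uat:podinfo:tag"}
```

The policy name is now identical, but the marker format is namespace:name:field, so the two lines still differ and a merge between the branches would overwrite one namespace with the other.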