fluxcd / helm-controller

The GitOps Toolkit Helm reconciler, for declarative Helming
https://fluxcd.io
Apache License 2.0

Helm controller has incorrect behavior when dealing with duplicate environment variables #539

Open daniel-anova opened 1 year ago

daniel-anova commented 1 year ago

The Helm controller allows creating a HelmRelease with a duplicate environment variable; it deploys with just a warning about the duplication.

However, when one of the duplicate values is removed, the Helm controller removes all copies of that environment variable.

So if the deployment has:

    env:
    - name: ihave
      value: "apple"
    - name: ihave
      value: "pen"

and values are set to:

  values:
    env:
    - name: ihave
      value: "apple"

The resulting deployment env becomes:

    env: []

The only way to restore the deployment is to delete the manifest and force reconciliation or add an additional environment variable.
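
For reference, a minimal sketch of that recovery path, assuming the HelmRelease is the bad-deployment example from the test case further down, lives in the default namespace, and is kept locally in a file named helmrelease.yaml (the namespace and file name are assumptions):

    # Recreate the HelmRelease, then ask Flux to reconcile it immediately
    kubectl delete helmrelease bad-deployment
    kubectl apply -f helmrelease.yaml
    flux reconcile helmrelease bad-deployment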

When trying to add duplicate values to an existing HelmRelease, however, the new duplicates are ignored.

    env:
    - name: ihave
      value: "apple"

and values are set to:

  values:
    env:
    - name: ihave
      value: "apple"
    - name: ihave
      value: "pen"

The resulting deployment env becomes:

    env:
    - name: ihave
      value: "apple"

The behavior should be consistent: the Helm controller should either reject duplicate values outright or handle them correctly on removal.
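
To see which env entries actually ended up in the rendered workload, something like the following can be used (a sketch, assuming the Deployment name bad-deployment-test from the test case below and the default namespace):

    # Print the env array of the first container in the Deployment
    kubectl get deployment bad-deployment-test \
      -o jsonpath='{.spec.template.spec.containers[0].env}'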

Test case Helm Release

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: bad-deployment
spec:
  releaseName: bad-deployment
  chart:
    spec:
      chart: bad-deployment
      sourceRef:
        kind: GitRepository
        name: bad-deployment
  interval: 5m
  install:
    remediation:
      retries: 3
  values:
    name: bad-deployment-test
    env:
    - name: ihave
      value: "apple"
    - name: ihave
      value: "pen"

Relevant part of Helm Chart template

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Values.name }}
  labels:
    app: {{ .Values.name }}
    release: {{ .Release.Name }}
spec:
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: {{ .Values.name }}
  template:
    metadata:
      labels:
        app: {{ .Values.name }}
        release: {{ .Release.Name }}
    spec:
      containers:
      - name: {{ .Values.name }}
        image: "busybox:latest"
        command: ["/bin/sleep", "10d"]
        env:
          {{- if .Values.env}}
          {{- toYaml .Values.env | nindent 10 }}
          {{- end}}
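
To compare what Helm itself renders from these values with what the controller ends up applying, the chart can be rendered locally; a sketch, assuming the chart sources live in ./bad-deployment and the values above are saved in values.yaml (both paths are assumptions):

    # Render the chart offline with the duplicate env values
    helm template bad-deployment ./bad-deployment --values values.yaml
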
kingdonb commented 1 year ago

Could you let us know what version of Flux is installed?

I'm looking at this issue and it reminds me of one that affected Flux's "new" Server Side Apply when it was still very new, but that has been resolved for some time. It might not be the same issue, but in the interest of ruling it out quickly, what is the output of flux check? (It includes the version information.)
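
For example, the version information can be gathered along these lines (the flux-system namespace is the default install location and is an assumption here):

    # Prints prerequisites and the installed controller versions
    flux check
    # Or read the helm-controller image tag directly
    kubectl -n flux-system get deployment helm-controller \
      -o jsonpath='{.spec.template.spec.containers[0].image}'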

kingdonb commented 1 year ago

Also, maybe this: was there any error when the HelmRelease change was not successfully applied? (It seems the changes were ignored; maybe an error actually prevented the reconciliation from completing.)

E.g., at this point,

and values are set to:

  values:
    env:
    - name: ihave
      value: "apple"

The resulting deployment env becomes:

    env: []

it seems like the change perhaps hasn't actually been applied. Can you check whether there is any error reconciling the HelmRelease or the parent Kustomization that might explain why this happened?
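
A few ways to look for such errors (a sketch, assuming the default flux-system namespace and the HelmRelease name from the test case above):

    # Status of all HelmReleases as seen by Flux
    flux get helmreleases --all-namespaces
    # Conditions and events for the specific HelmRelease
    kubectl describe helmrelease bad-deployment
    # Controller logs
    kubectl -n flux-system logs deployment/helm-controller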

This could be a bug in Helm's three-way merge, or something else. Thank you for reporting this! I think you've given us enough information here to reproduce it, though it may take some time as we are working on issues for GA as well as KubeCon (but if this is a bug, then obviously we'd like to squash it before GA).

daniel-anova commented 1 year ago

> Could you let us know what version of Flux is installed?
>
> I'm looking at this issue and it reminds me of one that affected Flux's "new" Server Side Apply when it was still very new, but that has been resolved for some time. It might not be the same issue, but in the interest of ruling it out quickly, what is the output of flux check? (It includes the version information.)

Currently on v0.24 of the Helm Controller.

> it seems like the change perhaps hasn't actually been applied. Can you check whether there is any error reconciling the HelmRelease or the parent Kustomization that might explain why this happened?

Test manifests were applied directly with kubectl, so the Kustomization Controller was not involved.

As for the Helm Controller log, it shows nothing relevant, simply that reconciliation finished:

| time | message |
| --- | --- |
| 2022-10-06 09:17:23 | msg="reconcilation finished in 57.883487ms, next run in 5m0s" |
| 2022-10-06 09:15:30 | msg="reconcilation finished in 45.435263ms, next run in 5m0s" |
| 2022-10-06 09:12:11 | msg="reconcilation finished in 9.175023225s, next run in 5m0s" |
| 2022-10-06 09:10:43 | msg="reconcilation finished in 12.716648708s, next run in 5m0s" |
| 2022-10-06 09:10:30 | msg="reconcilation finished in 83.638857ms, next run in 5m0s" |
kingdonb commented 1 year ago

I wonder if this is a bug in Flux at all, or if it's even a bug in Helm

We need to see if this behavior reproduces in the Helm CLI when you try a similar thing; I'm wondering if it's somehow part of the three-way merge behavior. You mentioned that you did kubectl apply and aren't using Kustomize controller – just to clarify, that could be part of the issue as well. If you did kubectl apply, then it's using a three-way merge, and you need to be quite sure the HelmRelease represented on the cluster says what you think it does. It might have merged the values in the array.

If you were using Kustomize controller here, it does not use kubectl apply with a three-way merge in quite the same way: Flux will actually wipe out any drift (unless you need a three-way merge, and there's a way to opt into it). In the kubectl CLI for Kubernetes, however, kubectl apply -f is a three-way merge by default, and you can get strange behavior from it. Broadly speaking, the server-side apply feature in Kustomize controller fixes this.

Can you confirm whether the YAML on the cluster has really been affected? One way to check is to use kubectl edit in between and verify the YAML looks how you expect; another way to mitigate the problem would be to see if it behaves the same when you do kubectl replace instead of kubectl apply.
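
For example, something along these lines (assuming the HelmRelease name from the test case and a local helmrelease.yaml file):

    # Verify what is actually stored on the cluster
    kubectl get helmrelease bad-deployment -o yaml
    # Retry the update without kubectl apply's three-way merge
    kubectl replace -f helmrelease.yaml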

Can you check on these avenues so we can verify this is a bug before we spend more time investigating further?

daniel-anova commented 1 year ago

The HelmRelease was correct with both kubectl and the Kustomize controller. I just retested the test case to be 100% sure: when removing the duplicate value, the HelmRelease shows the remaining env variable as expected, and the same behavior is seen when using the controller.

The Helm-rendered Deployment gets all env variable entries removed, and further reconciliation does not add the env variable still present in the HelmRelease back to the Deployment.
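
One way to see what Helm has actually recorded for the release at this point (a sketch, assuming the release name bad-deployment and the default namespace):

    # Values and manifest stored in the Helm release history
    helm get values bad-deployment
    helm get manifest bad-deployment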

I hope that clarifies things.

kingdonb commented 1 year ago

Thanks @daniel-anova – I hope someone can follow this up before too long after KubeCon.

I won't be able to look at it again before then myself.

smoke commented 1 year ago

I have stumbled on this in a slightly different way, but I can tell for sure that the Helm Controller does not comply with how Helm works.

Here is the case: basically, I am trying to leverage the Kubernetes rules for merging when defining the same variable in different places: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#environment-variables

Simplified, the precedence of environment variables is the following: if a single section (e.g. env) happens to contain duplicate names, the last one wins.

So having a deployment.yaml template that has:

...
          {{- if or .Values.env .Values.envOverride }}
          env:
          {{- end }}
          {{- if or .Values.env }}
          {{- toYaml .Values.env | nindent 10 }}
          {{- end }}
          {{- if or .Values.envOverride }}
          {{- toYaml .Values.envOverride | nindent 10 }}
          {{- end }}
...

given this values.yaml:

env:
  - name: DEBUG
    value: 'A'
envOverride:
  - name: DEBUG
    value: 'B'

Using the Helm Controller produces a Deployment containing only the first item of the list:

...
          env:
            - name: DEBUG
              value: 'A'
...

Helm itself produces a warning (W0603 16:56:14.612333 378861 warnings.go:70] spec.template.spec.containers[0].env[6].name: duplicate name "DEBUG") and a Deployment having both items of the list, as intended per the Kubernetes precedence described above:

...
          env:
            - name: DEBUG
              value: 'A'
            - name: DEBUG
              value: 'B'
...

So, when using the Helm Controller, the resulting value in the container will be DEBUG=A, which is not correct, whereas when using Helm the resulting value will be DEBUG=B, which is correct.
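
For anyone who wants to reproduce that comparison, a rough sketch with hypothetical names (release myapp, chart directory ./mychart, and a Deployment assumed to be named after the release), using the values.yaml shown earlier:

    # Apply the same chart and values with the Helm CLI
    helm upgrade --install myapp ./mychart -f values.yaml
    # Inspect the env of the resulting Deployment; with plain Helm both DEBUG
    # entries are present, and the last one (DEBUG=B) wins inside the container
    kubectl get deployment myapp -o jsonpath='{.spec.template.spec.containers[0].env}'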

f-ld commented 9 months ago

> I wonder if this is a bug in Flux at all, or if it's even a bug in Helm

Please check this Kubernetes issue (and this Helm one, closed as stale).