bitnami / charts

Bitnami Helm Charts
https://bitnami.com

[bitnami/rabbitmq] Helm diff always different when a new chart version release #10985

Closed Dunge closed 2 years ago

Dunge commented 2 years ago

Name and Version

bitnami/rabbitmq

What steps will reproduce the bug?

I'm still new to this, so this is probably a misunderstanding on my part and not a bug, but it's strange.

I'm using the helmfile project with helmfile apply to deploy a cluster, which uses the helm diff plugin internally to check for modifications before applying them.

This is a snippet of my helmfile:

repositories:
- name: bitnami
  url: https://charts.bitnami.com/bitnami

releases:
- name: jl-rabbitmq
  namespace: default
  chart: bitnami/rabbitmq
  labels:
    name: jl-rabbitmq
  version: ~10.1.8
  values:
    - ./values/rabbitmq.yaml

As you can see, I'm using the tilde (~) operator to accept any new "patch version" chart, so if I understand correctly it should update charts to any new release matching 10.1.X
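As a rough illustration of what the tilde constraint accepts, here is a minimal Python sketch. It assumes the usual semver convention that `~10.1.8` means `>= 10.1.8, < 10.2.0`. The helper names `parse_version` and `matches_tilde` are hypothetical (they are not part of Helm or helmfile), and pre-release tags are ignored.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release support)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def matches_tilde(constraint, candidate):
    """Return True if candidate satisfies a '~MAJOR.MINOR.PATCH' constraint.

    Assumed semantics: same major and minor, patch at or above the floor.
    """
    floor = parse_version(constraint.lstrip("~"))
    version = parse_version(candidate)
    return version[:2] == floor[:2] and version >= floor


# Chart versions mentioned in this thread, plus two outside the range.
for candidate in ["10.1.7", "10.1.10", "10.1.11", "10.1.12", "10.2.0"]:
    print(candidate, matches_tilde("~10.1.8", candidate))
```

Under that reading, every 10.1.x release from 10.1.8 onward is picked up automatically, while 10.2.0 is not.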

It worked fine at first, but once a new chart version was released (this has happened three times since I selected this version), every subsequent run of helmfile apply tells me there's a difference and that it needs to update, even if it already did so just seconds before:

Upgrading release=jl-rabbitmq, chart=bitnami/rabbitmq
Release "jl-rabbitmq" has been upgraded. Happy Helming!

\Kubernetes> helm history jl-rabbitmq

REVISION        UPDATED                         STATUS          CHART                   APP VERSION     DESCRIPTION
1               Wed Jun 29 15:25:59 2022        superseded      rabbitmq-10.1.10        3.10.5          Install complete
2               Thu Jun 30 12:37:01 2022        superseded      rabbitmq-10.1.11        3.10.5          Upgrade complete
3               Thu Jun 30 13:11:21 2022        superseded      rabbitmq-10.1.11        3.10.5          Upgrade complete
4               Thu Jun 30 13:41:32 2022        deployed        rabbitmq-10.1.11        3.10.5          Upgrade complete

The difference the "diff" gives is this:

-         checksum/secret: ecc2e11d3c50f7d5ffbe74056b4477a33f5cd303f36672aa808e8cce04249855
+         checksum/secret: 9b9a7129c10e4a0a5bf5db44184ac39c3c25b9cab2f2431e363bbc4e5eea73b0

which is under rabbitmq/templates/statefulset.yaml spec.template.metadata.annotations

Are you using any custom parameters or values?

Not sure if it matters, but here's my values file:

replicaCount: 3

auth:
  username: jl
  password: jl
  erlangCookie: BAxTTl4hvij3VkivFaBtYtzZLNlO7ttN

clustering:
  partitionHandling: pause_minority

extraSecrets:
  load-definition:
    load_definition.json: |
      {
        "permissions": [
          {
            "user": "jl",
            "vhost": "/",
            "configure": ".*",
            "write": ".*",
            "read": ".*"
          }
        ],
        "vhosts": [
          {
            "name": "/"
          }
        ],
        "policies": [
          {
            "name": "ha-all",
            "pattern": ".*",
            "vhost": "/",
            "definition": {
              "ha-mode": "all",
              "ha-sync-mode": "automatic",
              "max-length-bytes": 50000000
            }
          }
        ]
      }

loadDefinition:
  enabled: true
  existingSecret: load-definition

extraConfiguration: |-
  default_vhost = /
  default_permissions.configure = .*
  default_permissions.read = .*
  default_permissions.write = .*
  load_definitions = /app/load_definition.json

What is the expected behavior?

No change if I already updated the chart to the latest revision once

What do you see instead?

Always a new diff, always a new upgrade

carrodher commented 2 years ago

This does not appear to be an issue with the Bitnami RabbitMQ container image or Helm chart, but rather with how the application or environment is being used or configured.

For information regarding the application itself, customization of the content within the application, or questions about the use of technology or infrastructure, we highly recommend checking forums and user guides made available by the project behind the application or the technology.

That said, we will keep this ticket open until the stale bot closes it just in case someone from the community adds some valuable info.

dostalradim commented 2 years ago

Same problem for me. We have the rabbitmq chart as a dependency of our application, with the following configuration:

clusterDomain: "k8s-dev.example.com"
fullnameOverride: "stc-ng-rabbitmq"
clustering:
  enabled: false
metrics:
  enabled: true
service:
  labels:
    metrics: "true"
auth:
  username: "admin"
  password: "Password"
  erlangCookie: "Password"

When we install our application with helm upgrade -i ...., it performs the first installation and everything starts successfully. Then we change something in our application (which means a new docker image and a new helm chart version of our application), while still depending on the same rabbitmq chart version. When we call helm upgrade -i... again, the release already exists, so helm performs an upgrade, and during this first upgrade rabbitmq is restarted. During subsequent upgrades, rabbitmq is not restarted. I dumped the rabbitmq statefulset before and after the first upgrade and saw that the checksum changed between them, even though the version and configuration are the same. The secret file should be the same too, because it is the same rabbitmq chart version. I had no luck reproducing this problem with the rabbitmq chart on its own.

# kubectl get -n stc-ng-radim statefulset stc-ng-rabbitmq -o yaml > stc-ng-rabbitmq.yaml
# kubectl get -n stc-ng-radim statefulset stc-ng-rabbitmq -o yaml > stc-ng-rabbitmq-po.yaml
# diff stc-ng-rabbitmq.yaml stc-ng-rabbitmq-po.yaml 
---TRUNC---
<         checksum/secret: c5d70a3b3ca1a3cef6021f960378855917f1ebcef3a7434f2e39ac4cde60bde0
---
>         checksum/secret: 501bea9995b4eea40a6dbd8e3d05b13bc3f4435b8dbde5783f6b46ea016a358d 
---TRUNC---

Dunge commented 2 years ago

Seems pretty weird to say it's due to the application and to ask us to request information on their forums. Rabbitmq developers provide configuration via static .config files; they have no idea what this helm chart decided to store inside secrets or why they would always be generated differently. It also makes the whole thing unusable, because it's unacceptable to have rabbitmq services shutting down every time we have to change something else in our cluster.

carrodher commented 2 years ago

I was not only referring to the rabbitmq forums but also to the tool used to deploy this Helm chart (or the plugin used behind the scenes). In some resources or deployed objects there are fields like timestamps which can differ between two deployments even if the Helm chart and the values used are the same.

Dunge commented 2 years ago

I am still very new to all of this, but I don't believe that is the case here. Looking at the helm chart template, everything used in this checksum/secret part comes from the chart itself. There's no timestamp in this particular secret, and there's no extra plugin being used, just the most standard up-to-date kubectl, helm, and helmfile tools. From my understanding, secrets and config are supposed to be baked from the helm chart result after rendering the template, and should always render exactly the same way when using the same chart version. And that should be one of the core pillars of using helm charts; otherwise, what's the point?

Here's the relevant part of the chart template:

        checksum/secret: {{ include (print $.Template.BasePath "/secrets.yaml") . | sha256sum }}
        {{- end }}
        {{- if .Values.podAnnotations }}
        {{- include "common.tplvalues.render" (dict "value" .Values.podAnnotations "context" $) | nindent 8 }}
        {{- end }}
        {{- if and .Values.metrics.enabled .Values.metrics.podAnnotations }}
        {{- include "common.tplvalues.render" (dict "value" .Values.metrics.podAnnotations "context" $) | nindent 8 }}
        {{- end }}

Since in my case I did not set any additional podAnnotations value or enable metrics, and the defaults are empty and false, I think we can ignore the bottom portion. So the checksum seems to be computed only from the secrets.yaml file. Let's now look at that file. I can see what will be generated by using the --dry-run --debug arguments on helm.

It is split into a few secret resources. The first one is a secret named after the release and containing two values: rabbitmq-password and rabbitmq-erlang-cookie. Those values come from the provided values.yaml file and are the values I pass when upgrading. They are static and are the same as the ones I see when reading the secret on my active cluster, so I doubt they influence the sha256sum difference.

Then there's a line I don't quite understand:

> {{- range $key, $value := .Values.extraSecrets }}

But what I do see is that it seems to be related to a generated secret called "load-definition", containing the extra parameters required to create an HA cluster, again passed in the values.yaml file (see first post above), and again static and no different from one upgrade run to another.

And then there's a third section with a name defined as name: {{ ternary (printf "%s-%s" $.Release.Name $key) $key $.Values.extraSecretsPrependReleaseName }}, which I'm not sure about at all, but it doesn't seem to get rendered in the resulting installation.
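For readers puzzling over the same two template lines, here is a minimal Python sketch of what they appear to do. It assumes Sprig's `ternary` semantics (return the first argument when the condition is true, otherwise the second) and that `range` simply loops over each entry of `extraSecrets`. The function name `extra_secret_name` is hypothetical, and the secret content is abbreviated to a placeholder.

```python
def extra_secret_name(release_name, key, prepend_release_name):
    """Mimic: ternary (printf "%s-%s" release key) key prependFlag."""
    return f"{release_name}-{key}" if prepend_release_name else key


# Stand-in for .Values.extraSecrets from the values file earlier in the thread.
extra_secrets = {"load-definition": {"load_definition.json": "<JSON definitions>"}}

# Equivalent of: range $key, $value := .Values.extraSecrets
for key, value in extra_secrets.items():
    print(extra_secret_name("jl-rabbitmq", key, False))
    print(extra_secret_name("jl-rabbitmq", key, True))
```

On this reading, the "third section" is not a separate resource: it is the name of the extra secret itself, which comes out as plain `load-definition` when `extraSecretsPrependReleaseName` is false.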

All that said, I'm sorry @carrodher, but I'm pretty certain this has nothing to do with the infrastructure or tooling, and everything to do with the chart structure itself. Something doesn't render the file in a uniform way as it should. But I am unfortunately a bit out of my depth here, so that's why I'm asking for help from one of the chart creators, in case they have a better idea of what's going on, because I would really love to be able to use that chart.

carrodher commented 2 years ago

I installed the chart twice, using the same name and same values (the ones you provided) and the result is 100% the same in both deployments, there is not any change in the resulting templates:

Here you can find the differences (only the command itself and the deployment time).

Then, I did the same but installed a previous version of the Helm chart and "upgraded" it to the latest one. As expected, there are some differences now. One of the differences is in the helm.sh/chart label, which is based on the chart version. Since this label is included in the secrets, the checksum is going to change as well. Here you can find the differences.
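The effect described here can be sketched in a few lines of Python. This is only an illustration, assuming the annotation is the SHA-256 of the rendered secrets text, as the `sha256sum` call in the template quoted earlier suggests. The manifest below is a cut-down, hypothetical stand-in rather than the chart's real output.

```python
import hashlib

# Hypothetical, shortened stand-in for the rendered secrets.yaml.
SECRET_TEMPLATE = """\
apiVersion: v1
kind: Secret
metadata:
  name: jl-rabbitmq
  labels:
    helm.sh/chart: rabbitmq-{chart_version}
data:
  rabbitmq-password: "amw="
"""


def checksum(rendered_text):
    """SHA-256 hex digest of the rendered text, like piping it to sha256sum."""
    return hashlib.sha256(rendered_text.encode("utf-8")).hexdigest()


old = checksum(SECRET_TEMPLATE.format(chart_version="10.1.11"))
new = checksum(SECRET_TEMPLATE.format(chart_version="10.1.12"))
again = checksum(SECRET_TEMPLATE.format(chart_version="10.1.12"))

print("changes across chart versions:", old != new)
print("stable for the same version:", new == again)
```

So one checksum change per chart upgrade is expected. What remains unexplained at this point in the thread is a checksum that keeps differing when the chart version and values stay the same.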

From the tests above, unless you actually change something like the version, the secret is invariant between deployments using the given values. That's why I don't see any issue in the chart itself, but maybe something odd in the tool used to deploy it. Could you please check whether the tool you're using to deploy the Helm chart is doing something behind the scenes that differs from the tests above? Honestly, I don't know how the Helm chart can be improved to solve your use case.

Dunge commented 2 years ago

Thank you very much for taking the time to test run this yourself.

I have to say, doing the same operations as you did, I also get the same result. The generated deployment always tells me it will install the same thing (checksum 72d5029d2102c12efc1599a1b2b34bb78e6f118a8c045ff62702e8d7df4204c1 with the latest chart version).

The problem is that what actually gets installed does not match. When running the "helm diff" plugin afterwards, it always tells me the current checksum state is 45174dc63168081a3edd0c6c59899a1dc38e52d273bed3963eb8fe6727f92346 instead, and that it needs to upgrade to get it to match the one starting with 72. And no matter how many times I upgrade, it remains with the one starting with 45.

You can probably check this yourself if you still have your test cluster active: kubectl describe statefulset rabbit-rabbitmq

Somewhere in there you will find checksum/secret: 45174dc63168081a3edd0c6c59899a1dc38e52d273bed3963eb8fe6727f92346 (which is different from the one in the file you posted, starting with 72). I believe the diff plugin reads the latest deployed release state from the sh.helm.release.vX secrets and not the current cluster state though, so that means it was transformed during the installation process, and not by something else afterwards.
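For anyone wanting to inspect what Helm recorded in those release secrets, a minimal Python sketch follows. It assumes Helm 3's secret storage layout: the `release` field is JSON that Helm gzip-compresses and base64-encodes, and Kubernetes base64-encodes it once more in the Secret's `.data`. The payload below is synthetic, and `encode_release` and `decode_release` are hypothetical helpers, not Helm APIs.

```python
import base64
import gzip
import json


def decode_release(secret_data_release):
    """Decode the 'release' field as it appears in the Secret's .data (assumed layout)."""
    helm_encoded = base64.b64decode(secret_data_release)  # undo Kubernetes' base64
    compressed = base64.b64decode(helm_encoded)           # undo Helm's base64
    return json.loads(gzip.decompress(compressed))        # undo gzip, parse JSON


def encode_release(release):
    """Inverse of decode_release, used here only to build a synthetic example."""
    compressed = gzip.compress(json.dumps(release).encode("utf-8"))
    return base64.b64encode(base64.b64encode(compressed)).decode("ascii")


# Synthetic stand-in for a stored release; a real one carries many more fields.
sample = {
    "name": "jl-rabbitmq",
    "version": 4,
    "manifest": "checksum/secret: <hash recorded by Helm>\n",
}
decoded = decode_release(encode_release(sample))
print(decoded["manifest"])
```

With a real cluster, the input string would be the `.data.release` value of the relevant sh.helm.release secret. The decoded `manifest` field could then be compared against the output of a fresh render.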

Oh, and again, if I just start from scratch and install 10.1.12, it doesn't do that. I need to install a previous version (10.1.11), upgrade to 10.1.12, and then it stays "different" for every subsequent upgrade call. As if the upgrade doesn't update the checksum or something. Could it have something to do with the "existingSecret" value?

rafariossaa commented 2 years ago

Hi, I would suggest the same as @carrodher: check if there is something going on behind the scenes. Regarding the existingSecret, have you checked whether it is changing somehow?

Dunge commented 2 years ago

I see the label with the version on the existingSecret getting updated when changing versions, but the content itself doesn't seem to ever change. And no, nothing changes after subsequent calls to upgrade.

Do you have any hint as to what to check "behind the scene"? I believe I have the most basic and standard cluster infrastructure there is. I also believe anyone trying the same thing would get the same result, no matter their setup.

What could be causing helm to register a different installed resource content in its history than the one that gets generated by that chart?

rafariossaa commented 2 years ago

I am not sure what it could be. Maybe you could ask in helm forums, I think they could provide a better hint on this.

Dunge commented 2 years ago

I got additional information. When using helmfile diff --show-secrets it shows another resource that is different, the secret containing the auth:

-   rabbitmq-password: amw=
+   rabbitmq-password: "amw="

-   rabbitmq-erlang-cookie: QkF4VFRsNGh2aWozVmtpdkZhQnRZdHpaTE5sTzd0dE4=
+   rabbitmq-erlang-cookie: "QkF4VFRsNGh2aWozVmtpdkZhQnRZdHpaTE5sTzd0dE4="

One version has quotes, the other doesn't.

I thought my error could have been that in my values.yaml file, the auth.password and auth.erlangCookie values were not written as strings with quotation marks. As you can see in the default values for the chart, they should be, while the username shouldn't.

But no, even if I do, it still causes the same issue. One version of the diff doesn't have quotation marks, the other does. There's a rendering mismatch between some subsystems of the helm ecosystem here.
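A small Python sketch shows why the quoting matters for a text-based diff or checksum even though the value is the same. The `scalar_value` helper is a hypothetical, deliberately tiny stand-in for a YAML parser. The two lines are the password entries from the diff above.

```python
import base64
import hashlib

unquoted = 'rabbitmq-password: amw=\n'
quoted = 'rabbitmq-password: "amw="\n'


def scalar_value(line):
    """Take the text after ': ' and drop surrounding double quotes."""
    return line.split(": ", 1)[1].strip().strip('"')


def digest(text):
    """SHA-256 hex digest of the raw text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Both spellings carry the same value ...
print(scalar_value(unquoted) == scalar_value(quoted))
# ... which base64-decodes to the password from the values file.
print(base64.b64decode(scalar_value(quoted)).decode("utf-8"))
# The raw text differs, so a checksum over the rendered text differs too.
print(digest(unquoted) == digest(quoted))
```

This does not say which component drops the quotes. It only illustrates that two renderings with identical data can still produce different checksums.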

rafariossaa commented 2 years ago

If I run the following, I get the password and cookie quoted:

$ helm template myrabbit -f values.yaml -f p1.yaml . > rendered.yaml
$ grep -e rabbitmq-pass -e rabbitmq-erlang-cookie rendered.yaml 
  rabbitmq-password: "amw="
  rabbitmq-erlang-cookie: "QkF4VFRsNGh2aWozVmtpdkZhQnRZdHpaTE5sTzd0dE4="

The values.yaml file used was the default one, and p1.yaml was your changes.

Dunge commented 2 years ago

Thanks @rafariossaa, but I'm not sure what you are trying to say. How can I make sure the history saved in the helm deployment will be the same (always quoted?) as the one getting rendered for real?

My feeling is that there's an issue in the secrets.yaml definition of the chart, on this line, which may not be handling quotes correctly:

  rabbitmq-password: {{ include "common.secrets.passwords.manage" (dict "secret" (include "common.names.fullname" .) "key" "rabbitmq-password" "length" 16 "providedValues" (list "auth.password") "context" $) }}

rafariossaa commented 2 years ago

Hi, I am not sure about that. I would suggest asking in the helm tool forums; I think they will be able to provide a better answer, as they know the internals of the helm tool.

github-actions[bot] commented 2 years ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] commented 2 years ago

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

alex-bes-ccl commented 3 months ago

Has anyone managed to work around this?

Dunge commented 3 months ago

@alex-bes-ccl I believe the PR above in 2022 fixed this for me. Strange you still have an issue

alex-bes-ccl commented 3 months ago

Thank you very much for your reply @Dunge! Yeah, the snippets I have been playing around with are way too old! This does not seem to be a problem with the latest chart version! Thank you!