fluxcd / flux2

Open and extensible continuous delivery solution for Kubernetes. Powered by GitOps Toolkit.
https://fluxcd.io
Apache License 2.0
6.17k stars 574 forks source link

Error while running post render on files when using HelmRelease with local alias dependencies #4368

Closed kgreczka-iteo closed 2 weeks ago

kgreczka-iteo commented 9 months ago

Describe the bug

When using a HelmRelease with Chart downloaded from Git Source, that have local alias dependencies we are experiencing strange behaviour. The outcome error is helm-controller reconciliation failed: Helm install failed: error while running post render on files: may not add resource with an already registered id:.

We could not find the root cause of the issue. While helm install and helm template then kubectl apply works perfectly, Flux cant make it work. There is some config that helps mitigating it, but that is a very wide workaround:

  1. Adding {{- if .Values.enabled }} statement to each resource in template.

    Note: It cannot be a simple if true - the value must be evaluated using external values

  2. Adding the "enabled" value to the subchart values.yaml, set to false.
  3. Aliasing the subchart and adding the "enabled" value to the parent Helm chart, once again must be false.
  4. Overriding the "enabled" value in a flux HelmRelease values.

The "enabled" value can only be overridden (i.e., set to true) in the HelmRelease step. If this value is enabled at any step closer to the subchart, it will trigger the error.

Steps to reproduce

  1. Create a HelmChart that have local dependencies referenced using aliases:
    dependencies:
    - alias: mychart1
    name: mysubchart
    version: ">0.0.0"
    - alias: mychart2
    name: mysubchart
    version: ">0.0.0"
  2. Deploy the chart to cluster using HelmRelease and GitRepository as source.

Expected behavior

The chart should be processed just like helm template or helm install does. There are no duplicates or resources with exact same id.

Screenshots and recordings

No response

OS / Distro

EKS

Flux version

v2.1.1

Flux check

flux check ► checking prerequisites ✗ flux 2.1.1 <2.1.2 (new version is available, please upgrade) ✔ Kubernetes 1.27.6-eks-f8587cb >=1.25.0-0 ► checking controllers ✔ helm-controller: deployment ready ► ghcr.io/fluxcd/helm-controller:v0.35.0 ✔ image-automation-controller: deployment ready ► ghcr.io/fluxcd/image-automation-controller:v0.35.0 ✔ image-reflector-controller: deployment ready ► ghcr.io/fluxcd/image-reflector-controller:v0.29.1 ✔ kustomize-controller: deployment ready ► ghcr.io/fluxcd/kustomize-controller:v1.0.1 ✔ notification-controller: deployment ready ► ghcr.io/fluxcd/notification-controller:v1.0.0 ✔ source-controller: deployment ready ► ghcr.io/fluxcd/source-controller:v1.0.1 ► checking crds ✔ alerts.notification.toolkit.fluxcd.io/v1beta2 ✔ buckets.source.toolkit.fluxcd.io/v1beta2 ✔ gitrepositories.source.toolkit.fluxcd.io/v1 ✔ helmcharts.source.toolkit.fluxcd.io/v1beta2 ✔ helmreleases.helm.toolkit.fluxcd.io/v2beta1 ✔ helmrepositories.source.toolkit.fluxcd.io/v1beta2 ✔ imagepolicies.image.toolkit.fluxcd.io/v1beta2 ✔ imagerepositories.image.toolkit.fluxcd.io/v1beta2 ✔ imageupdateautomations.image.toolkit.fluxcd.io/v1beta1 ✔ kustomizations.kustomize.toolkit.fluxcd.io/v1 ✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2 ✔ providers.notification.toolkit.fluxcd.io/v1beta2 ✔ receivers.notification.toolkit.fluxcd.io/v1 ✔ all checks passed

Git provider

GitHub

Container Registry provider

No response

Additional context

No response

Code of Conduct

hiddeco commented 9 months ago

I suspect that something like helm template ... | yq '(.kind + "/" + .metadata.name)' | uniq -c will tell you there is at least one resource with a duplicate declaration.

kgreczka-iteo commented 9 months ago

I suspect that something like helm template ... | yq '(.kind + "/" + .metadata.name)' | uniq -c will tell you there is at least one resource with a duplicate declaration.

Nope, please dont make me repeat myself. I said There are no duplicates or resources with exact same id.. helm template gives output with no duplicates. Moreover helm install works perfectly.

Whats more following steps I described in Describe the bug section, without modyfing anything else makes it work.

To make sure there are not duplicates I went with the command you proposed. Here is the output:

   1 PodDisruptionBudget/myservice-sidekiq-default
   1 ---
   1 PodDisruptionBudget/myservice-sidekiq-urgent
   1 ---
   1 PodDisruptionBudget/myservice-web
   1 ---
   1 Service/myservice-web
   1 ---
   1 Deployment/myservice-sidekiq-default
   1 ---
   1 Deployment/myservice-sidekiq-urgent
   1 ---
   1 Deployment/myservice-web
   1 ---
   1 CronJob/myservice-cronjob
   1 ---
   1 Ingress/myservice-web
   1 ---
   1 ExternalSecret/myservice-postgres-access
   1 ---
   1 ExternalSecret/myservice-rails-key
   1 ---
   1 ExternalSecret/myservice-redis-access
   1 ---
   1 ExternalSecret/metrics-password
   1 ---
   1 PrometheusRule/myservice-web
   1 ---
   1 ScaledObject/myservice-sidekiq-default
   1 ---
   1 ScaledObject/myservice-sidekiq-urgent
   1 ---
   1 ServiceMonitor/myservice-web
hiddeco commented 9 months ago

The reason I am asking this, is because historically any bug reported like this ended up being some mistake in the chart, and not a problem with the helm-controller.

The fact that helm template and helm install work a expected does not really change anything. Given the helm-controller uses Kustomize to inject labels to track the resources which are part of the HelmRelease, which makes it more strict in terms of parsing the YAML produced by the chart.

One other thing that comes to mind is that you may have some empty document defined in the final helm template output, e.g.

---
apiVersion: v1
...
---
# empty
---
apiVersion: v1
...

If that is not the case either, I can not provide more help without being able to reproduce the issue myself (using e.g. your chart, or the output from helm template).

kgreczka-iteo commented 9 months ago

Ok, so I created a repo for this purpose with sample source files.

Combining it together gives following error:

helm-controller  reconciliation failed: Helm install failed: error while running post render on files: may not add resource with an already registered id: Pod.v1.[noGrp]/pod1.[noNs]

while its clear that there are only two resources and none of them is a duplicate:

helm template .
---
# Source: sample-flux-fail/charts/just-one-pod1/templates/pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
    ports:
    - containerPort: 80
---
# Source: sample-flux-fail/charts/just-one-pod2/templates/pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod2
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
    ports:
    - containerPort: 80
kgreczka-iteo commented 8 months ago

Are there any estimations for this bug?

jpetazzo commented 1 month ago

Hi!

I have exactly the same issue, with the following flux versions:

$ flux version
flux: v2.3.0
distribution: flux-v2.3.0
helm-controller: v1.0.1
kustomize-controller: v1.3.0
notification-controller: v1.3.0
source-controller: v1.3.0

I have a chart using a subchart, and the subchart is instanciated multiple times with aliases - exactly like @kgreczka-iteo mentioned.

My gut feeling is that Flux generates unique IDs for every resource, but when a Helm chart has subcharts with aliases, it uses the subchart name instead of its alias when generating the ID, thus leading into a conflict.

I don't have a fix (yet) but I wanted to confirm that this wasn't an isolated issue :)

guipace commented 3 weeks ago

+1 on this issue. I have the same versions as @jpetazzo

matheuscscp commented 3 weeks ago

@kgreczka-iteo thanks for the repo, I was able to reproduce the issue locally. The bug is (somewhere) in source-controller, the chart artifact produced after consuming the GitRepository source looks very different from the original, check the screenshots below.

Original chart in the git repo:

Screenshot from 2024-07-05 17-19-08

Output from helm template ...:

Click to expand YAML ```yaml --- # Source: sample-flux-fail/charts/just-one-pod1/templates/pod.yaml apiVersion: v1 kind: Pod metadata: name: pod1 spec: containers: - name: nginx image: nginx:1.14.2 ports: - containerPort: 80 --- # Source: sample-flux-fail/charts/just-one-pod2/templates/pod.yaml apiVersion: v1 kind: Pod metadata: name: pod2 spec: containers: - name: nginx image: nginx:1.14.2 ports: - containerPort: 80 ```

Chart produced by source-controller after consuming the git repo:

Screenshot from 2024-07-05 17-04-17

Output from helm template ...:

Click to expand YAML ```yaml --- # Source: sample-flux-fail/charts/just-one-pod1/charts/just-one-pod1/templates/pod.yaml apiVersion: v1 kind: Pod metadata: name: pod1 spec: containers: - name: nginx image: nginx:1.14.2 ports: - containerPort: 80 --- # Source: sample-flux-fail/charts/just-one-pod1/charts/just-one-pod2/templates/pod.yaml apiVersion: v1 kind: Pod metadata: name: pod2 spec: containers: - name: nginx image: nginx:1.14.2 ports: - containerPort: 80 --- # Source: sample-flux-fail/charts/just-one-pod1/templates/pod.yaml apiVersion: v1 kind: Pod metadata: name: pod1 spec: containers: - name: nginx image: nginx:1.14.2 ports: - containerPort: 80 --- # Source: sample-flux-fail/charts/just-one-pod2/charts/just-one-pod1/templates/pod.yaml apiVersion: v1 kind: Pod metadata: name: pod1 spec: containers: - name: nginx image: nginx:1.14.2 ports: - containerPort: 80 --- # Source: sample-flux-fail/charts/just-one-pod2/charts/just-one-pod2/templates/pod.yaml apiVersion: v1 kind: Pod metadata: name: pod2 spec: containers: - name: nginx image: nginx:1.14.2 ports: - containerPort: 80 --- # Source: sample-flux-fail/charts/just-one-pod2/templates/pod.yaml apiVersion: v1 kind: Pod metadata: name: pod2 spec: containers: - name: nginx image: nginx:1.14.2 ports: - containerPort: 80 ```

The pods are duplicated indeed

matheuscscp commented 3 weeks ago

By the way, there is a small bug in the example @kgreczka-iteo, the name of the subchart is just-one-pod in the Chart.yaml but the path is just-the-pod. Helm requires those two to be equal: https://github.com/helm/helm/blob/588041f6a55a8f23113f3c080d1733e65176fa0a/pkg/downloader/manager.go#L275-L278

In any case, the source-controller bug still happens even after fixing this.

matheuscscp commented 3 weeks ago

I found the issue, the chart name is not taken into account for resolving a local dependency:

https://github.com/fluxcd/source-controller/blob/8d8e7cc982cd51ca05dbb900bf7652dc04b1248f/internal/helm/chart/dependency_manager.go#L299-L306

johannwagner commented 2 weeks ago

We ran into exact the same issue yesterday, will report back, if this fixes it!