fluxcd / flux2

Open and extensible continuous delivery solution for Kubernetes. Powered by GitOps Toolkit.
https://fluxcd.io
Apache License 2.0
6.36k stars 592 forks source link

Helm upgrade failed for release, <HELM CHART> has no deployed releases #4614

Open Nathan-Nesbitt opened 7 months ago

Nathan-Nesbitt commented 7 months ago

Describe the bug

Here is the config file that I have for creating the rook-ceph-cluster as defined here. I can deploy using almost the same layout for the rook-ceph helm chart but the rook-ceph-cluster chart fails.

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: rook-ceph-cluster
  namespace: rook-ceph
spec:
  chart:
    spec:
      chart: rook-ceph-cluster
      version: 1.13.x
      sourceRef:
        kind: HelmRepository
        name: rook-ceph
        namespace: flux-system
  interval: 30m
  timeout: 10m
  install:
    remediation:
      retries: 3
  upgrade:
    remediation:
      retries: -1
    crds: CreateReplace
  releaseName: rook-ceph-cluster
  values:
    # ALL THE VALUES FROM values.yaml

This is the helm repository file:

apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: HelmRepository
metadata:
  name: rook-ceph
  namespace: flux-system
spec:
  interval: 15m
  url: https://charts.rook.io/release

When I go to deploy, I get the following error, which I cannot for the life of me figure out why.

Helm upgrade failed for release rook-ceph/rook-ceph-cluster with chart rook-ceph-cluster@v1.13.4: "rook-ceph-cluster" has no deployed releases

I can create it manually from the cli using helm by running the following:

helm repo add rook https://charts.rook.io/release
helm install my-rook-ceph-cluster rook/rook-ceph-cluster --version 1.13.4

So it's not an issue with the helm repository, or with rook.

Any ideas?

Steps to reproduce

  1. Create a K8s cluster
  2. Create a setup based on the example git repo
  3. Try to install rook-ceph using the above config (1/2 files, there is a lot of setup for this, but the other files do not cause an error)

Expected behavior

It should just run as expected 🀷

Screenshots and recordings

No response

OS / Distro

Ubuntu -- Jammy

Flux version

flux version 2.2.3

Flux check

β–Ί checking prerequisites βœ” Kubernetes 1.27.10+rke2r1 >=1.26.0-0 β–Ί checking version in cluster βœ” distribution: flux-v2.2.3 βœ” bootstrapped: true β–Ί checking controllers βœ” helm-controller: deployment ready β–Ί ghcr.io/fluxcd/helm-controller:v0.37.4 βœ” kustomize-controller: deployment ready β–Ί ghcr.io/fluxcd/kustomize-controller:v1.2.2 βœ” notification-controller: deployment ready β–Ί ghcr.io/fluxcd/notification-controller:v1.2.4 βœ” source-controller: deployment ready β–Ί ghcr.io/fluxcd/source-controller:v1.2.4 β–Ί checking crds βœ” alerts.notification.toolkit.fluxcd.io/v1beta3 βœ” buckets.source.toolkit.fluxcd.io/v1beta2 βœ” gitrepositories.source.toolkit.fluxcd.io/v1 βœ” helmcharts.source.toolkit.fluxcd.io/v1beta2 βœ” helmreleases.helm.toolkit.fluxcd.io/v2beta2 βœ” helmrepositories.source.toolkit.fluxcd.io/v1beta2 βœ” kustomizations.kustomize.toolkit.fluxcd.io/v1 βœ” ocirepositories.source.toolkit.fluxcd.io/v1beta2 βœ” providers.notification.toolkit.fluxcd.io/v1beta3 βœ” receivers.notification.toolkit.fluxcd.io/v1 βœ” all checks passed

Git provider

No response

Container Registry provider

No response

Additional context

No response

Code of Conduct

Nathan-Nesbitt commented 7 months ago

The only thing I found was I was having similar issues with another helm repo, and the only way I could fix it was to nuke the entire flux bootstrap and restart. Saying that, that's not really practical for anything beyond setup... is there some other way to force reset helm (if that is actually the issue?)

This hasn't worked for this chart, it has been stuck like this for a while now.

98jan commented 7 months ago

Hi,

what I can see is that the HelmRepository manifest is named "rook-ceph", but in the HelmRelease the referenced HelmRepository name ist "rook".

What helps me often is to run following command: kubectl describe HelmRelease rook-ceph-cluster -n rook-ceph

Nathan-Nesbitt commented 7 months ago

@98jan thanks for the reply, I must have changed that last night while playing around, unfortunately it's still stuck with the error.

Nathan-Nesbitt commented 7 months ago

The interesting part to me is that, the error has to do with releases not the chart itself, which to me looks like the release is the issue as it finds v1.13.14 in the helm repository which is not specified.

To me this looks like the release name is somehow wrong/non-existent? Even tho it is specified and should be the same as the name of the HelmRelease from above?

I think this has to do with the application of the values to the releaseName.

What is also interesting is any time I try to force a reconcile it results in the following:

β–Ί annotating HelmRelease rook-ceph-cluster in rook-ceph namespace βœ” HelmRelease annotated β—Ž waiting for HelmRelease reconciliation βœ— context deadline exceeded

Nathan-Nesbitt commented 6 months ago

Yeah there's something very unusual going on, nuked the node and recreated everything and it's now working:

True    Helm install succeeded for release rook-ceph/rook-ceph-cluster.v1 with chart rook-ceph-cluster@v1.13.4

No change to the charts or anything, just started working after I deleted everything...

Is there any general advice on how to debug issues like this? Seems like general flakeyness but it's not clear why this worked or even what the actual original issue was.

mark-oconnor-fidelity commented 3 months ago

I encountered this problem today. The flux "HelmRelease" resource was failing to install.

$ kubectl get hr velero
NAME     AGE     READY   STATUS
velero   2m58s   False   Helm upgrade failed for release velero with chart velero@6.0.0: "velero" has no deployed releases

What I couldn't understand was why the HelmRelease was performing an upgrade action, when the helm ls command reported no releases:

$ helm history velero

Root cause

There had been a previous installation of velero, which had not been properly cleaned up.

$ helm history velero
REVISION        UPDATED                         STATUS          CHART           APP VERSION     DESCRIPTION                              
1               Fri Jun  7 10:13:05 2024        uninstalling    velero-6.6.0  1.13.2          Deletion in progress (or silently failed)

@Nathan-Nesbitt Perhaps this could be your issue? It would explain why nuking your setup fixed the problem.

berlincount commented 3 months ago

try kubectl get secret in the respective namespace where you are managing everything. nuke any sh.helm.release.v1.whatever secret that's no actually applying. it'll try an install instead of an upgrade.

good luck!