fluxcd / helm-controller

The GitOps Toolkit Helm reconciler, for declarative Helming
https://fluxcd.io
Apache License 2.0

Ready status when single pod can't start #81

Closed phillebaba closed 3 weeks ago

phillebaba commented 4 years ago

I have found some unexpected behavior while testing how HelmRelease reports its status.

The following setup should deploy the Helm charts podinfo and redis, both of which should fail as the tag foo does not exist for any of the images.

apiVersion: source.toolkit.fluxcd.io/v1alpha1
kind: HelmRepository
metadata:
  name: podinfo
  namespace: gitops-system
spec:
  url: https://stefanprodan.github.io/podinfo
  interval: 10m
---
apiVersion: helm.toolkit.fluxcd.io/v2alpha1
kind: HelmRelease
metadata:
  name: frontend
  namespace: gitops-system
spec:
  targetNamespace: webapp
  interval: 5m
  chart:
    spec:
      chart: podinfo
      version: '>=4.0.0 <5.0.0'
      sourceRef:
        kind: HelmRepository
        name: podinfo
      interval: 1m
  values:
    image:
      tag: foo
---
apiVersion: source.toolkit.fluxcd.io/v1alpha1
kind: HelmRepository
metadata:
  name: stable
  namespace: gitops-system
spec:
  url: https://kubernetes-charts.storage.googleapis.com/
  interval: 10m
---
apiVersion: helm.toolkit.fluxcd.io/v2alpha1
kind: HelmRelease
metadata:
  name: redis
  namespace: gitops-system
spec:
  targetNamespace: webapp
  interval: 5m
  chart:
    spec:
      chart: redis
      sourceRef:
        kind: HelmRepository
        name: stable
      interval: 1m
  values:
    image:
      tag: foo

Both result in pods in an ImagePullBackOff state.

NAME                                       READY   STATUS             RESTARTS   AGE
webapp-frontend-podinfo-6694fbcbc4-rvjcn   0/1     ImagePullBackOff   0          6m32s
webapp-redis-master-0                      0/1     ImagePullBackOff   0          4m59s
webapp-redis-slave-0                       0/1     ImagePullBackOff   0          4m59s

Yet the podinfo HelmRelease (frontend) ends up in a ready state, while the redis HelmRelease does not.

NAME       READY   STATUS                                                     AGE
frontend   True    release reconciliation succeeded                           7m15s
redis      False   Helm install failed: timed out waiting for the condition   5m45s

I would expect neither HelmRelease to be in a ready state.

hiddeco commented 4 years ago

This is likely due to how Helm's --wait flag behaves when a Deployment has only a single replica. See: https://github.com/helm/helm/issues/5814#issuecomment-567130226
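
To make the suspected mechanism concrete, here is a minimal sketch. It assumes the wait check treats a Deployment as ready once its ready replicas reach `replicas - maxUnavailable`, which is my reading of the linked Helm issue and not a copy of Helm's code. The function name is hypothetical, and the assumption that the podinfo Deployment allows one unavailable pod is not confirmed anywhere in this thread.

```python
def deployment_counts_as_ready(replicas: int, ready_replicas: int, max_unavailable: int) -> bool:
    """Simplified model (an assumption, not Helm's actual code) of a wait
    check that tolerates up to max_unavailable missing pods."""
    expected_ready = replicas - max_unavailable
    return ready_replicas >= expected_ready


# Single replica, one unavailable pod tolerated: zero ready pods is enough.
print(deployment_counts_as_ready(replicas=1, ready_replicas=0, max_unavailable=1))

# Single replica, no unavailable pods tolerated: the pod must actually be ready.
print(deployment_counts_as_ready(replicas=1, ready_replicas=0, max_unavailable=0))
```

Under this model the first call prints True and the second prints False, which would match a release being reported as ready while its only pod sits in ImagePullBackOff.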

seaneagan commented 4 years ago

Proposed fix here:

https://github.com/helm/helm/pull/8671

Note: you may be able to work around it by setting maxUnavailable differently (or unsetting it).

phillebaba commented 4 years ago

Is it worth fixing this on our side before a Helm release with this fix is available? Health checks that now use kstatus depend on the status being set properly.

https://github.com/fluxcd/kustomize-controller/pull/101

stefanprodan commented 4 years ago

@phillebaba there is no fix for this that we can do in fluxcd; this needs to be fixed upstream. We should document the Helm bug in our docs.

stefanprodan commented 3 weeks ago

Helm rejected the fix in https://github.com/helm/helm/pull/10831, so there is nothing we can do about it.