fluxcd / helm-controller

The GitOps Toolkit Helm reconciler, for declarative Helming
https://fluxcd.io
Apache License 2.0

Unable to detect server capabilities #936

Closed: b4nst closed this issue 3 months ago

b4nst commented 3 months ago

When trying to deploy the Loki chart, we ran into an issue with server capabilities detection.

The relevant part of the chart is https://github.com/grafana/loki/blob/0084262269f4e2cb94d04e0cc0d40e9666177f06/production/helm/loki/templates/read/hpa.yaml#L2-L8

When deploying with autoscaling enabled, we get:

no matches for kind "HorizontalPodAutoscaler" in version "autoscaling/v2beta1" ensure CRDs are installed first

However, our control plane does serve autoscaling/v2 and reports it as the preferred version:

kubectl get --raw /apis/autoscaling
{
  "kind": "APIGroup",
  "apiVersion": "v1",
  "name": "autoscaling",
  "versions": [
    {
      "groupVersion": "autoscaling/v2",
      "version": "v2"
    },
    {
      "groupVersion": "autoscaling/v1",
      "version": "v1"
    }
  ],
  "preferredVersion": {
    "groupVersion": "autoscaling/v2",
    "version": "v2"
  }
}
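The discovery document above can be checked mechanically. Below is a minimal Go sketch, using only the standard library, that decodes this APIGroup response and tests whether a given groupVersion is served. The type and function names (apiGroup, serves, parseGroup) are hypothetical, not part of any Kubernetes client library. It shows that autoscaling/v2beta1, the version named in the error, is simply not offered by this server.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// groupVersion mirrors one entry of the "versions" array in an APIGroup
// discovery response (hypothetical local type, not client-go's).
type groupVersion struct {
	GroupVersion string `json:"groupVersion"`
	Version      string `json:"version"`
}

// apiGroup holds the subset of the APIGroup document used here.
type apiGroup struct {
	Name             string         `json:"name"`
	Versions         []groupVersion `json:"versions"`
	PreferredVersion groupVersion   `json:"preferredVersion"`
}

// serves reports whether the group advertises the given groupVersion.
func serves(g apiGroup, gv string) bool {
	for _, v := range g.Versions {
		if v.GroupVersion == gv {
			return true
		}
	}
	return false
}

// discovery is the response to `kubectl get --raw /apis/autoscaling` from this issue.
const discovery = `{
  "kind": "APIGroup",
  "apiVersion": "v1",
  "name": "autoscaling",
  "versions": [
    {"groupVersion": "autoscaling/v2", "version": "v2"},
    {"groupVersion": "autoscaling/v1", "version": "v1"}
  ],
  "preferredVersion": {"groupVersion": "autoscaling/v2", "version": "v2"}
}`

// parseGroup decodes a raw APIGroup JSON document.
func parseGroup(raw string) (apiGroup, error) {
	var g apiGroup
	err := json.Unmarshal([]byte(raw), &g)
	return g, err
}

func main() {
	g, err := parseGroup(discovery)
	if err != nil {
		panic(err)
	}
	fmt.Println("preferred:", g.PreferredVersion.GroupVersion)
	fmt.Println("serves autoscaling/v2:", serves(g, "autoscaling/v2"))
	fmt.Println("serves autoscaling/v2beta1:", serves(g, "autoscaling/v2beta1"))
}
```

Running it prints the preferred version autoscaling/v2, then true for autoscaling/v2 and false for autoscaling/v2beta1.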

We're using Flux v2.2.3.


For reference, this has also been discussed in https://github.com/fluxcd/flux2/discussions/4234

stefanprodan commented 3 months ago

Can you please try ghcr.io/fluxcd/helm-controller:preview-65b54580? This preview image uses the latest Helm SDK, v3.14.3.

hiddeco commented 3 months ago

Is this an installation or an upgrade? If it's the latter, you may be running into something that typically requires https://github.com/helm/helm-mapkubeapis

b4nst commented 3 months ago

It's a new installation. I'll give the preview image a shot. Thanks for the quick response!

b4nst commented 3 months ago

Still facing the same issue with preview-65b54580. I tried removing and recreating the chart from scratch just in case, but no joy.

stefanprodan commented 3 months ago

Are you using impersonation in your HelmRelease? Do you have .spec.serviceAccountName or .spec.kubeConfig set?

souleb commented 3 months ago

Can you reproduce this with helm install? The capabilities check is performed by the same code in the Helm SDK. Do you see anything in the logs?

b4nst commented 3 months ago

Sorry for the delay. I don't use impersonation (no .serviceAccountName or .kubeConfig in the HelmRelease spec), and I cannot reproduce this with a manual helm install. The manual helm install correctly creates an autoscaling/v2 HPA.

stefanprodan commented 3 months ago

@b4nst are you using the same server and the same values with the Helm CLI as with helm-controller? In Flux we call the same install function as the CLI; the capabilities are collected by the Helm SDK itself: https://github.com/helm/helm/blob/14d0c13e9eefff5b4a1b511cf50643529692ec94/pkg/action/install.go#L286
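Roughly speaking, and as an assumption about Helm SDK internals rather than a copy of them, the API versions the SDK collects from discovery combine each served groupVersion with its resource kinds, so a template can ask for either "autoscaling/v2" or "autoscaling/v2/HorizontalPodAutoscaler". The Go sketch below reproduces that idea with hypothetical stand-in types (resourceList, versionSet) and the autoscaling versions from the discovery output earlier in this thread.

```go
package main

import (
	"fmt"
	"sort"
)

// resourceList is a hypothetical, trimmed-down stand-in for a discovery
// APIResourceList: one groupVersion plus the kinds it serves.
type resourceList struct {
	GroupVersion string
	Kinds        []string
}

// versionSet holds "group/version" and "group/version/Kind" strings.
type versionSet map[string]struct{}

// Has reports whether the set contains the given API version string.
func (s versionSet) Has(apiVersion string) bool {
	_, ok := s[apiVersion]
	return ok
}

// buildVersionSet adds every groupVersion and every groupVersion/Kind pair.
func buildVersionSet(lists []resourceList) versionSet {
	s := versionSet{}
	for _, l := range lists {
		s[l.GroupVersion] = struct{}{}
		for _, k := range l.Kinds {
			s[l.GroupVersion+"/"+k] = struct{}{}
		}
	}
	return s
}

func main() {
	// Served autoscaling versions taken from the issue's discovery output.
	lists := []resourceList{
		{GroupVersion: "autoscaling/v2", Kinds: []string{"HorizontalPodAutoscaler"}},
		{GroupVersion: "autoscaling/v1", Kinds: []string{"HorizontalPodAutoscaler"}},
	}
	s := buildVersionSet(lists)

	// Print the set in a stable order.
	keys := make([]string, 0, len(s))
	for k := range s {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys)

	fmt.Println(s.Has("autoscaling/v2/HorizontalPodAutoscaler"))      // true
	fmt.Println(s.Has("autoscaling/v2beta1/HorizontalPodAutoscaler")) // false
}
```

Whatever the real SDK does in detail, a set built from this server can only answer "yes" for versions the server actually serves.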

b4nst commented 3 months ago

Yes, same values, same cluster. I'll check the helm-controller installation; maybe it lacks permissions?

stefanprodan commented 3 months ago

Can you post your HelmRepository/HelmRelease YAML here so I can try it on a fresh cluster, please?

b4nst commented 3 months ago

Sure. I removed some sensitive info, but that shouldn't affect the test.

---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: grafana
  namespace: flux-system
spec:
  interval: 1h
  url: https://grafana.github.io/helm-charts
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: loki
  namespace: default
spec:
  interval: 5m
  chart:
    spec:
      chart: loki
      version: "3.0.0"
      sourceRef:
        kind: HelmRepository
        name: grafana
        namespace: flux-system
      interval: 1m
  values:
    loki:
      storage:
        type: gcs
        bucketNames:
          chunks: foo-loki-chunks
          ruler: foo-loki-ruler
          admin: foo-loki-admin
      schema_config:
        configs:
          - from: 2020-01-01
            store: tsdb
            object_store: gcs
            schema: v13
            index:
              prefix: index_
              period: 24h
    loki_canary:
      enabled: false
    gateway:
      enabled: true
      autoscaling:
        enabled: true
    monitoring:
      selfMonitoring:
        enabled: false
        grafanaAgent:
          installOperator: false

stefanprodan commented 3 months ago

I'm getting the same error on Kubernetes 1.29.3:

Helm install failed for release loki/loki with chart loki@3.0.0: unable to build kubernetes objects from release manifest: resource mapping not found for name: "loki-gateway" namespace: "" from "": no matches for kind "HorizontalPodAutoscaler" in version "autoscaling/v2beta1" ensure CRDs are installed first

What's really strange is that Helm capabilities work for other charts. I tried Flagger: https://github.com/fluxcd/flagger/blob/main/charts/flagger/templates/pdb.yaml#L2

{{- if .Values.podDisruptionBudget.enabled }}
{{- if .Capabilities.APIVersions.Has "policy/v1/PodDisruptionBudget" -}}
apiVersion: policy/v1
{{- else }}
apiVersion: policy/v1beta1
{{- end }}
{{- /* ... rest of the PodDisruptionBudget manifest ... */}}
{{- end }}
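Helm templates are Go text/template templates, so the capability check above can be reproduced outside Helm. The sketch below renders the Flagger snippet with plain text/template; apiVersions, capabilities and render are hypothetical stand-ins for the objects Helm injects (the real SDK also adds Sprig functions, which this snippet does not need). It illustrates that the template can only emit a version the chart author wrote into it, so a chart that predates autoscaling/v2 will never render it, whatever the server reports.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// apiVersions is a hypothetical stand-in for Helm's capability list.
type apiVersions []string

// Has reports whether apiVersion appears in the list.
func (a apiVersions) Has(apiVersion string) bool {
	for _, v := range a {
		if v == apiVersion {
			return true
		}
	}
	return false
}

// capabilities mimics the shape of .Capabilities used by the template.
type capabilities struct {
	APIVersions apiVersions
}

// pdbTemplate is the Flagger snippet quoted above, closed off with an end.
const pdbTemplate = `{{- if .Values.podDisruptionBudget.enabled }}
{{- if .Capabilities.APIVersions.Has "policy/v1/PodDisruptionBudget" -}}
apiVersion: policy/v1
{{- else }}
apiVersion: policy/v1beta1
{{- end }}
{{- end }}`

// render executes the template against a given set of served API versions.
func render(versions apiVersions) (string, error) {
	tmpl, err := template.New("pdb").Parse(pdbTemplate)
	if err != nil {
		return "", err
	}
	data := map[string]interface{}{
		"Values": map[string]interface{}{
			"podDisruptionBudget": map[string]interface{}{"enabled": true},
		},
		"Capabilities": capabilities{APIVersions: versions},
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, data); err != nil {
		return "", err
	}
	return strings.TrimSpace(b.String()), nil
}

func main() {
	for _, vs := range []apiVersions{
		{"policy/v1", "policy/v1/PodDisruptionBudget"},
		{"policy/v1beta1", "policy/v1beta1/PodDisruptionBudget"},
	} {
		out, err := render(vs)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```

The first run prints apiVersion: policy/v1 and the second falls back to apiVersion: policy/v1beta1.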

stefanprodan commented 3 months ago

OK, so you are using Loki chart 3.0.0, which is very old; that one doesn't have autoscaling/v2. Using the latest, 6.2.0, works:

NAME           REFERENCE                 TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
loki-gateway   Deployment/loki-gateway   <unknown>/60%   1         3         0          6s

@b4nst did you by any chance use the latest chart with the Helm CLI, while in Flux you've set 3.0.0?

stefanprodan commented 3 months ago

Ah, you looked at the Loki version instead of the chart version here: https://artifacthub.io/packages/helm/grafana/loki

Note that Helm charts carry two versions: the chart version (what Flux needs under chart.spec.version) and appVersion, which is purely informational. Here the app version is 3.0.0 while the chart version is 6.2.0.
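To make the distinction concrete, here is a small Go sketch that maps an application version to the chart versions packaging it. chartEntry and chartVersionsFor are hypothetical names, and the only data used is the pairing stated above (chart 6.2.0, appVersion 3.0.0); in practice the entries would come from the repository index, which this sketch does not fetch.

```go
package main

import "fmt"

// chartEntry is a hypothetical, minimal view of one chart release in a Helm
// repository index: the chart version and the informational appVersion.
type chartEntry struct {
	Version    string // what Flux expects under chart.spec.version
	AppVersion string // version of the packaged application, informational only
}

// chartVersionsFor returns the chart versions that package the given app version.
func chartVersionsFor(entries []chartEntry, appVersion string) []string {
	var out []string
	for _, e := range entries {
		if e.AppVersion == appVersion {
			out = append(out, e.Version)
		}
	}
	return out
}

func main() {
	// Only the pairing confirmed in this thread: chart 6.2.0 ships Loki 3.0.0.
	entries := []chartEntry{{Version: "6.2.0", AppVersion: "3.0.0"}}
	fmt.Println(chartVersionsFor(entries, "3.0.0")) // [6.2.0]
}
```

Looking up by appVersion and then pinning the resulting chart version avoids the mix-up in this issue.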

b4nst commented 3 months ago

Ah wow, indeed I confused appVersion and chart version 🤦. Thanks for being more thorough than me! Locally I just used the latest one and didn't specify the chart version. Sorry for all the noise; the one I was interested in is indeed 6.2.0.

stefanprodan commented 3 months ago

If you'd posted the HelmRelease YAML from the start, it would've saved lots of time :D