fluxcd / flux2

Open and extensible continuous delivery solution for Kubernetes. Powered by GitOps Toolkit.
https://fluxcd.io
Apache License 2.0

Bootstrap fails with "context deadline exceeded" #4172

Open comminutus opened 1 year ago

comminutus commented 1 year ago

Describe the bug

After bootstrapping from a Gitea repository with flux bootstrap git, the bootstrap fails with:

◎ waiting for Kustomization "flux-system/flux-system" to be reconciled
✗ Get "https://k3s.local/apis/kustomize.toolkit.fluxcd.io/v1/namespaces/flux-system/kustomizations/flux-system": context deadline exceeded
► confirming components are healthy
✔ helm-controller: deployment ready
✔ kustomize-controller: deployment ready
✔ notification-controller: deployment ready
✔ source-controller: deployment ready
✔ all components are healthy
✗ bootstrap failed with 1 health check failure(s)

This was working fine until I recently added a SealedSecret for my Cert Manager issuer. The initial commit for adding the Issuer and Sealed Secret worked fine within flux. I re-created the k3s cluster with my Terraform deployment because I wanted to disable the k3s helm controller. After my cluster came up, I tried to bootstrap flux and discovered this problem.

From what I can discern, the kustomize-controller is trying to build the flux-system kustomization and for some reason it is looking for the SealedSecrets CRD, which isn't installed yet (see the additional context section). Shouldn't the dependsOn clause cause Sealed Secrets to be installed first, so that this error goes away?

For what it's worth, if I remove the cert-manager portion of my repository it deploys without any errors.

Steps to reproduce

  1. Bootstrap a fresh cluster from a git repository that already defines deployments for Cert Manager, a SealedSecret, and an Issuer which references the Sealed Secret (a rough example of the bootstrap command is shown below)
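
For reference, the bootstrap command looks roughly like this (the URL, branch, and path match my setup; the private key path is a placeholder):

flux bootstrap git --url=ssh://git@k3s.local/k3s/k3s --branch=master --path=./cluster --private-key-file=<path-to-deploy-key>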

Expected behavior

The bootstrap should succeed

Screenshots and recordings

No response

OS / Distro

Fedora CoreOS

Flux version

2.0.1

Flux check

► checking prerequisites
✔ Kubernetes 1.27.4+k3s1 >=1.24.0-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.35.0
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v1.0.1
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v1.0.0
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v1.0.1
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta2
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ gitrepositories.source.toolkit.fluxcd.io/v1
✔ helmcharts.source.toolkit.fluxcd.io/v1beta2
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta1
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta2
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta2
✔ receivers.notification.toolkit.fluxcd.io/v1
✔ all checks passed

Git provider

Gitea (plain git)

Container Registry provider

No response

Additional context

flux get ks -A reveals:

NAMESPACE   NAME        REVISION    SUSPENDED   READY   MESSAGE                                                                                                                                                                                                                   
flux-system flux-system             False       False   SealedSecret/cert-manager/k3s.ca dry-run failed: failed to get API group resources: unable to retrieve the complete list of server APIs: bitnami.com/v1alpha1: the server could not find the requested resource 

flux logs -A --level error reveals:

2023-08-20T16:08:16.946Z error Kustomization/flux-system.flux-system - Reconciliation failed after 948.112357ms, next try in 10m0s SealedSecret/cert-manager/k3s.ca dry-run failed: failed to get API group resources: unable to retrieve the complete list of server APIs: bitnami.com/v1alpha1: the server could not find the requested resource
CustomResourceDefinition/alerts.notification.toolkit.fluxcd.io configured
CustomResourceDefinition/buckets.source.toolkit.fluxcd.io configured
CustomResourceDefinition/gitrepositories.source.toolkit.fluxcd.io configured
CustomResourceDefinition/helmcharts.source.toolkit.fluxcd.io configured
CustomResourceDefinition/helmreleases.helm.toolkit.fluxcd.io configured
CustomResourceDefinition/helmrepositories.source.toolkit.fluxcd.io configured
CustomResourceDefinition/kustomizations.kustomize.toolkit.fluxcd.io configured
CustomResourceDefinition/ocirepositories.source.toolkit.fluxcd.io configured
CustomResourceDefinition/providers.notification.toolkit.fluxcd.io configured
CustomResourceDefinition/receivers.notification.toolkit.fluxcd.io configured
Namespace/cert-manager created
Namespace/flux-system configured

and...

2023-08-20T16:18:13.626Z error Kustomization/flux-system.flux-system - Reconciliation failed after 468.302081ms, next try in 10m0s SealedSecret/cert-manager/k3s.ca dry-run failed: failed to get API group resources: unable to retrieve the complete list of server APIs: bitnami.com/v1alpha1: the server could not find the requested resource

Here is the Sealed Secret:

apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  creationTimestamp: null
  name: k3s.ca
  namespace: cert-manager
spec:
  encryptedData:
    tls.key: [redacted]
  template:
    metadata:
      creationTimestamp: null
      name: k3s.ca
      namespace: cert-manager
    type: kubernetes.io/tls

Here is the kustomization that contains all of the cert-manager resources (Issuer, Sealed Secret, Namespace, HelmRelease, and HelmRepository):

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cert-manager
  namespace: flux-system
spec:
  interval: 1h
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
  - name: kube-prometheus-stack
  - name: sealed-secrets
  path: ./cluster/infrastructure/cert-manager
  prune: true
  wait: true

Here is the Sealed Secrets kustomization:

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: sealed-secrets
  namespace: flux-system
spec:
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./cluster/infrastructure/sealed-secrets
  interval: 1h
  prune: true
  wait: true

kubectl get pods -A:

NAMESPACE     NAME                                      READY   STATUS    RESTARTS   AGE
flux-system   helm-controller-74b5f87d94-ppwtp          1/1     Running   0          70m
flux-system   kustomize-controller-59d4cb8bc6-zjkpp     1/1     Running   0          70m
flux-system   notification-controller-b7d8566b7-8mkk9   1/1     Running   0          70m
flux-system   source-controller-645656595b-nldph        1/1     Running   0          70m
kube-system   coredns-77ccd57875-cjw6w                  1/1     Running   0          71m
kube-system   local-path-provisioner-957fdf8bc-p65rs    1/1     Running   0          71m
kube-system   metrics-server-648b5df564-v2h2t           1/1     Running   0          71m

kubectl get events -n flux-system --field-selector type=Warning:

LAST SEEN   TYPE      REASON                 OBJECT                      MESSAGE
39s         Warning   ReconciliationFailed   kustomization/flux-system   SealedSecret/cert-manager/k3s.ca dry-run failed: failed to get API group resources: unable to retrieve the complete list of server APIs: bitnami.com/v1alpha1: the server could not find the requested resource

kubectl get gitrepositories.source.toolkit.fluxcd.io -A:

NAMESPACE     NAME          URL                                        AGE   READY   STATUS
flux-system   flux-system   ssh://git@k3s.local/k3s/k3s   72m   True    stored artifact for revision 'master@sha1:3d4fdf8d3ce535f59d6b5fce7fd51514b80e4d52'

flux get sources all -A:

NAMESPACE   NAME                        REVISION                SUSPENDED   READY   MESSAGE                                             
flux-system gitrepository/flux-system   master@sha1:3d4fdf8d    False       True    stored artifact for revision 'master@sha1:3d4fdf8d'


comminutus commented 1 year ago

OK, I found the issue: I forgot that SealedSecrets are tied to a specific controller installation, and after I re-created the cluster, the new cluster didn't have the old sealed-secrets keys.

comminutus commented 1 year ago

Re-opening this since I don't think it's related to the sealed secret resource not being present. To mitigate the latter, I save the sealed-secrets keys and re-apply them prior to bootstrapping Flux (per https://github.com/bitnami-labs/sealed-secrets#how-can-i-do-a-backup-of-my-sealedsecrets). I also changed my dependencies a bit to make sure they were correct, separating the config from the controller as suggested in https://github.com/fluxcd/flux2/issues/1980#issuecomment-949307972, and then re-created the cluster. This seems related to #1980 .
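
For completeness, the key backup/restore is roughly the following (per the sealed-secrets README; my controller runs in kube-system because of the targetNamespace shown further down, and the file name is just an example):

kubectl get secret -n kube-system -l sealedsecrets.bitnami.com/sealed-secrets-key -o yaml > sealed-secrets-keys.yaml

then, on the new cluster and before bootstrapping Flux:

kubectl apply -f sealed-secrets-keys.yaml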

Here is the cert-manager-config kustomization:

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cert-manager-config
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
  - name: cert-manager-controller
    namespace: flux-system
  - name: sealed-secrets
    namespace: flux-system
  path: ./cluster/infrastructure/cert-manager/config
  prune: true
  wait: true

Here is the cert-manager-controller kustomization:

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cert-manager-controller
  namespace: flux-system
spec:
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./cluster/infrastructure/cert-manager/controller
  dependsOn:
  - name: kube-prometheus-stack
  interval: 1h
  retryInterval: 1m
  prune: true
  wait: true

If I delete the cert-manager-config kustomization, commit, and push my changes, the flux-system kustomization eventually becomes ready. If I add the cert-manager-config kustomization back after that, it becomes ready as well.

Therefore, it seems like the flux-system kustomization isn't respecting the dependencies described. The only work-around I can think of is to delete all kustomizations that contain sealed secrets from the repository, push the changes, and re-apply them afterwards. This seems pretty painful for anyone who has to rebuild their cluster.

makkes commented 1 year ago

The issue likely is that you're making Flux try to create a SealedSecret while the CRD isn't there yet. How are you deploying the SealedSecret CRDs? As part of a HelmRelease? If so, you need to create a Kustomization for that HelmRelease and make your other Kustomization (the one that creates SealedSecret resources) depend on that.
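
A minimal sketch of that relationship (names and paths here are illustrative, not taken from your repo):

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: my-secrets
  namespace: flux-system
spec:
  dependsOn:
    - name: sealed-secrets     # the Kustomization that applies the HelmRelease installing the CRDs
  interval: 1h
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./secrets              # the directory containing the SealedSecret resources
  prune: true
  wait: true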

comminutus commented 1 year ago

@makkes Thanks for your reply. I do already have a Flux Kustomization for Sealed Secrets:

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: sealed-secrets
  namespace: flux-system
spec:
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./cluster/infrastructure/sealed-secrets
  interval: 1h
  retryInterval: 1m
  prune: true
  wait: true

./cluster/infrastructure/sealed-secrets/repository.yaml:

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: sealed-secrets
  namespace: flux-system
spec:
  url: https://bitnami-labs.github.io/sealed-secrets 
  interval: 24h

./cluster/infrastructure/sealed-secrets/release.yaml:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: sealed-secrets
  namespace: flux-system
spec:
  chart:
    spec:
      chart: sealed-secrets
      reconcileStrategy: ChartVersion
      sourceRef:
        kind: HelmRepository
        name: sealed-secrets
  targetNamespace: kube-system
  interval: 24h
  values:
    fullnameOverride: sealed-secrets-controller

The cert-manager-config kustomization depends on the sealed-secrets kustomization already.

I think I might have found the problem though. In ./cluster, where I bootstrap flux, I have one file at the moment: infrastructure.yaml:

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./cluster/infrastructure
  interval: 1h
  retryInterval: 1m
  prune: true
  wait: true

I think the problem with this is that, since there is no kustomization.yaml file defined here, the kustomize-controller automatically generates one and includes all of my YAML files inside infrastructure as part of the flux-system kustomization.
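
For illustration, the generated file would be roughly equivalent to something like this (the cert-manager file names are made up, and I've left out the flux-system entries):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - infrastructure.yaml
  - infrastructure/sealed-secrets/release.yaml
  - infrastructure/sealed-secrets/repository.yaml
  - infrastructure/cert-manager/controller/release.yaml
  - infrastructure/cert-manager/config/sealed-secret.yaml
  - infrastructure/cert-manager/config/issuer.yaml

That would explain why the SealedSecret ends up in the flux-system server-side dry-run.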

I added a kustomization.yaml file at ./cluster/kustomization.yaml:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - infrastructure.yaml

This makes the kustomize build consider only infrastructure.yaml as part of the flux-system kustomization, so the build succeeds and the rest of the dependencies are reconciled correctly.

I think this gotcha should perhaps be documented more clearly in the getting started guides. The only thing that led me to think about this problem was a small blurb buried in the FAQ: https://fluxcd.io/flux/faq/#can-i-use-repositories-with-plain-yamls .

I started off using the example repository as a guide for structuring my own repository, but I wanted to vary it slightly because I have other source at the root and want to define all of my cluster resources (including Flux) under ./cluster .

The problem is resolved by adding that kustomization.yaml file as I described above. Should I close this or keep it open as a documentation issue?

stefanprodan commented 1 year ago

Please see the recommended repo structure here: https://github.com/fluxcd/flux2-kustomize-helm-example#infrastructure

As explained in the docs, you should have at least two Flux Kustomizations, one for controllers (CRDs) and one for configs (CRs) with a dependsOn relationship.

comminutus commented 1 year ago

@stefanprodan I do indeed have that (one named cert-manager-controller and another named cert-manager-config). The problem still manifested regardless, because of what I mentioned about the kustomization.yaml file at ./cluster . In the example repository, the infrastructure directory is separate from the cluster directory. I think that if the infrastructure directory were moved underneath the cluster directory, the same problem would manifest.

stefanprodan commented 1 year ago

In the clusters/my-cluster dir you should only have Flux Kustomizations without a kustomization.yaml. In the Flux Kustomizations you should refer to overlays at the top root level. This is the structure we recommend to users, but you are free to do whatever.
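
A rough sketch of what that looks like, following the example repo layout (names and paths are illustrative):

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infra-controllers
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/controllers   # overlay at the repo root, outside clusters/my-cluster
  prune: true
  wait: true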

comminutus commented 1 year ago

@stefanprodan I realize that's what works for this kind of structure. However, I thought one of the benefits of Flux was that developers are free to create a looser structure. Instead of placing the cluster dir as a separate directory, my cluster dir is the Flux repository root (not the actual root, since I'm using --path ./cluster with flux bootstrap). If one wants this kind of structure, then a kustomization.yaml file in the cluster dir seems to be a requirement. It was pretty tedious to track this problem down, since the way Flux treats directories without a kustomization.yaml file is buried in the FAQ.
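
To make the layout concrete, my repository looks roughly like this (other root-level directories omitted):

cluster/
  kustomization.yaml          # the file I had to add
  infrastructure.yaml
  flux-system/
    gotk-components.yaml
    gotk-sync.yaml
    kustomization.yaml
  infrastructure/
    sealed-secrets/
      release.yaml
      repository.yaml
    cert-manager/
      controller/
      config/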

I'm happy since I got it to work; however, I wonder if others might benefit from a note in the getting started guide about how Flux treats the directory it's bootstrapped with. If there are any YAML files in other directories under the bootstrapped path (other than flux-system), they will get included in the flux-system kustomization, which is probably not what the end user wants.