fluxcd / flux2

Open and extensible continuous delivery solution for Kubernetes. Powered by GitOps Toolkit.
https://fluxcd.io
Apache License 2.0

Bootstrapping new cluster fails on k3s v1.20 #1344

Closed: argamanza closed this issue 2 years ago

argamanza commented 3 years ago

I have a k3s cluster running on a Raspberry Pi connected to my home network. I tried to bootstrap a new GOTK repo using the following command:

flux bootstrap github \
--owner=$GITHUB_USER \
--repository=$CONFIG_REPO \
--branch=master \
--path=./clusters/my-cluster \
--personal \
--kubeconfig=/etc/rancher/k3s/k3s.yaml

The output of the bootstrap command (note the "context deadline exceeded" after "waiting for Kustomization "flux-system/flux-system" to be reconciled"):

► connecting to github.com
► cloning branch "master" from Git repository "https://github.com/argamanza/raspberry-pi-flux-config.git"
✔ cloned repository
► generating component manifests
✔ generated component manifests
✔ component manifests are up to date
► installing toolkit.fluxcd.io CRDs
◎ waiting for CRDs to be reconciled
✔ CRDs reconciled successfully
► installing components in "flux-system" namespace
✔ installed components
✔ reconciled components
► determining if source secret "flux-system/flux-system" exists
✔ source secret up to date
► generating sync manifests
✔ generated sync manifests
✔ sync manifests are up to date
► applying sync manifests
✔ reconciled sync configuration
◎ waiting for Kustomization "flux-system/flux-system" to be reconciled
✗ context deadline exceeded
► confirming components are healthy
✔ source-controller: deployment ready
✔ kustomize-controller: deployment ready
✔ helm-controller: deployment ready
✔ notification-controller: deployment ready
✔ all components are healthy
✗ bootstrap failed with 1 health check failure(s)

The logs for the Kustomize Controller expose what the issue might be:

{"level":"info","ts":"2021-04-24T20:55:51.200Z","logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":"2021-04-24T20:55:51.202Z","logger":"setup","msg":"starting manager"}
I0424 20:55:51.206769       7 leaderelection.go:243] attempting to acquire leader lease flux-system/kustomize-controller-leader-election...
{"level":"info","ts":"2021-04-24T20:55:51.307Z","msg":"starting metrics server","path":"/metrics"}
I0424 20:56:30.436269       7 leaderelection.go:253] successfully acquired lease flux-system/kustomize-controller-leader-election
{"level":"info","ts":"2021-04-24T20:56:30.436Z","logger":"controller.kustomization","msg":"Starting EventSource","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","source":"kind source: /, Kind="}
{"level":"info","ts":"2021-04-24T20:56:30.437Z","logger":"controller.kustomization","msg":"Starting EventSource","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","source":"kind source: /, Kind="}
{"level":"info","ts":"2021-04-24T20:56:30.538Z","logger":"controller.kustomization","msg":"Starting EventSource","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","source":"kind source: /, Kind="}
{"level":"info","ts":"2021-04-24T20:56:30.639Z","logger":"controller.kustomization","msg":"Starting Controller","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization"}
{"level":"info","ts":"2021-04-24T20:56:30.639Z","logger":"controller.kustomization","msg":"Starting workers","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","worker count":4}
{"level":"info","ts":"2021-04-24T20:56:47.576Z","logger":"controller.kustomization","msg":"Kustomization applied in 2.713132582s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"flux-system","namespace":"flux-system","output":{"clusterrole.rbac.authorization.k8s.io/crd-controller-flux-system":"unchanged","clusterrolebinding.rbac.authorization.k8s.io/cluster-reconciler-flux-system":"unchanged","clusterrolebinding.rbac.authorization.k8s.io/crd-controller-flux-system":"unchanged","customresourcedefinition.apiextensions.k8s.io/alerts.notification.toolkit.fluxcd.io":"configured","customresourcedefinition.apiextensions.k8s.io/buckets.source.toolkit.fluxcd.io":"configured","customresourcedefinition.apiextensions.k8s.io/gitrepositories.source.toolkit.fluxcd.io":"configured","customresourcedefinition.apiextensions.k8s.io/helmcharts.source.toolkit.fluxcd.io":"configured","customresourcedefinition.apiextensions.k8s.io/helmreleases.helm.toolkit.fluxcd.io":"configured","customresourcedefinition.apiextensions.k8s.io/helmrepositories.source.toolkit.fluxcd.io":"configured","customresourcedefinition.apiextensions.k8s.io/kustomizations.kustomize.toolkit.fluxcd.io":"configured","customresourcedefinition.apiextensions.k8s.io/providers.notification.toolkit.fluxcd.io":"configured","customresourcedefinition.apiextensions.k8s.io/receivers.notification.toolkit.fluxcd.io":"configured","deployment.apps/helm-controller":"configured","deployment.apps/kustomize-controller":"configured","deployment.apps/notification-controller":"configured","deployment.apps/source-controller":"configured","gitrepository.source.toolkit.fluxcd.io/flux-system":"unchanged","kustomization.kustomize.toolkit.fluxcd.io/flux-system":"unchanged","namespace/flux-system":"unchanged","networkpolicy.networking.k8s.io/allow-egress":"unchanged","networkpolicy.networking.k8s.io/allow-scraping":"unchanged","networkpolicy.networking.k8s.io/allow-webhooks":"unchanged","service/notification-controller":"unchanged","service/source-controller":"unchanged","service/webhook-receiver":"unchanged","serviceaccount/helm-controller":"unchanged","serviceaccount/kustomize-controller":"unchanged","serviceaccount/notification-controller":"unchanged","serviceaccount/source-controller":"unchanged"}}
{"level":"error","ts":"2021-04-24T20:56:47.609Z","logger":"controller.kustomization","msg":"unable to update status after reconciliation","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"flux-system","namespace":"flux-system","error":"Kustomization.kustomize.toolkit.fluxcd.io \"flux-system\" is invalid: status.snapshot.entries.namespace: Invalid value: \"null\": status.snapshot.entries.namespace in body must be of type string: \"null\""}
{"level":"error","ts":"2021-04-24T20:56:47.609Z","logger":"controller.kustomization","msg":"Reconciler error","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"flux-system","namespace":"flux-system","error":"Kustomization.kustomize.toolkit.fluxcd.io \"flux-system\" is invalid: status.snapshot.entries.namespace: Invalid value: \"null\": status.snapshot.entries.namespace in body must be of type string: \"null\""}
{"level":"info","ts":"2021-04-24T20:56:53.835Z","logger":"controller.kustomization","msg":"Kustomization applied in 2.470822475s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"flux-system","namespace":"flux-system","output":{"clusterrole.rbac.authorization.k8s.io/crd-controller-flux-system":"unchanged","clusterrolebinding.rbac.authorization.k8s.io/cluster-reconciler-flux-system":"unchanged","clusterrolebinding.rbac.authorization.k8s.io/crd-controller-flux-system":"unchanged","customresourcedefinition.apiextensions.k8s.io/alerts.notification.toolkit.fluxcd.io":"configured","customresourcedefinition.apiextensions.k8s.io/buckets.source.toolkit.fluxcd.io":"configured","customresourcedefinition.apiextensions.k8s.io/gitrepositories.source.toolkit.fluxcd.io":"configured","customresourcedefinition.apiextensions.k8s.io/helmcharts.source.toolkit.fluxcd.io":"configured","customresourcedefinition.apiextensions.k8s.io/helmreleases.helm.toolkit.fluxcd.io":"configured","customresourcedefinition.apiextensions.k8s.io/helmrepositories.source.toolkit.fluxcd.io":"configured","customresourcedefinition.apiextensions.k8s.io/kustomizations.kustomize.toolkit.fluxcd.io":"configured","customresourcedefinition.apiextensions.k8s.io/providers.notification.toolkit.fluxcd.io":"configured","customresourcedefinition.apiextensions.k8s.io/receivers.notification.toolkit.fluxcd.io":"configured","deployment.apps/helm-controller":"configured","deployment.apps/kustomize-controller":"configured","deployment.apps/notification-controller":"configured","deployment.apps/source-controller":"configured","gitrepository.source.toolkit.fluxcd.io/flux-system":"unchanged","kustomization.kustomize.toolkit.fluxcd.io/flux-system":"unchanged","namespace/flux-system":"unchanged","networkpolicy.networking.k8s.io/allow-egress":"unchanged","networkpolicy.networking.k8s.io/allow-scraping":"unchanged","networkpolicy.networking.k8s.io/allow-webhooks":"unchanged","service/notification-controller":"unchanged","service/source-controller":"unchanged","service/webhook-receiver":"unchanged","serviceaccount/helm-controller":"unchanged","serviceaccount/kustomize-controller":"unchanged","serviceaccount/notification-controller":"unchanged","serviceaccount/source-controller":"unchanged"}}
{"level":"error","ts":"2021-04-24T20:56:53.863Z","logger":"controller.kustomization","msg":"unable to update status after reconciliation","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"flux-system","namespace":"flux-system","error":"Kustomization.kustomize.toolkit.fluxcd.io \"flux-system\" is invalid: status.snapshot.entries.namespace: Invalid value: \"null\": status.snapshot.entries.namespace in body must be of type string: \"null\""}

From the logs I can tell that status.snapshot.entries.namespace shouldn't be null for the flux-system kustomization. After testing the same bootstrap procedure on a local machine, against a cluster I provisioned with kind, I can see that the kustomization is indeed missing the status.snapshot data on the k3s cluster, while on my local kind cluster it exists:

K3S@RaspberryPi:

kubectl describe kustomization flux-system -n flux-system

Name:         flux-system
Namespace:    flux-system
Labels:       kustomize.toolkit.fluxcd.io/checksum=1d4c5beef02b0043768a476cc3fed578aa3ed6f0
              kustomize.toolkit.fluxcd.io/name=flux-system
              kustomize.toolkit.fluxcd.io/namespace=flux-system
Annotations:  <none>
API Version:  kustomize.toolkit.fluxcd.io/v1beta1
Kind:         Kustomization
Metadata:
  Creation Timestamp:  2021-04-24T19:42:50Z
  Finalizers:
    finalizers.fluxcd.io
  Generation:  1
...
...
Status:
  Conditions:
    Last Transition Time:  2021-04-24T19:43:30Z
    Message:               reconciliation in progress
    Reason:                Progressing
    Status:                Unknown
    Type:                  Ready
Events:
  Type    Reason  Age   From                  Message
  ----    ------  ----  ----                  -------
  Normal  info    57m   kustomize-controller  customresourcedefinition.apiextensions.k8s.io/buckets.source.toolkit.fluxcd.io configured
...

kind@local:

kubectl describe kustomization flux-system -n flux-system

Name:         flux-system
Namespace:    flux-system
Labels:       kustomize.toolkit.fluxcd.io/checksum=1d4c5beef02b0043768a476cc3fed578aa3ed6f0
              kustomize.toolkit.fluxcd.io/name=flux-system
              kustomize.toolkit.fluxcd.io/namespace=flux-system
Annotations:  <none>
API Version:  kustomize.toolkit.fluxcd.io/v1beta1
Kind:         Kustomization
Metadata:
  Creation Timestamp:  2021-04-25T12:35:37Z
  Finalizers:
    finalizers.fluxcd.io
  Generation:  1
...
...
Status:
  Conditions:
    Last Transition Time:   2021-04-25T12:37:02Z
    Message:                Applied revision: master/dbce13415e4118bb071b58dab20d1f2bec527a14
    Reason:                 ReconciliationSucceeded
    Status:                 True
    Type:                   Ready
  Last Applied Revision:    master/dbce13415e4118bb071b58dab20d1f2bec527a14
  Last Attempted Revision:  master/dbce13415e4118bb071b58dab20d1f2bec527a14
  Observed Generation:      1
  Snapshot:
    Checksum:  1d4c5beef02b0043768a476cc3fed578aa3ed6f0
    Entries:
      Kinds:
        /v1, Kind=Namespace:                                     Namespace
        apiextensions.k8s.io/v1, Kind=CustomResourceDefinition:  CustomResourceDefinition
        rbac.authorization.k8s.io/v1, Kind=ClusterRole:          ClusterRole
        rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding:   ClusterRoleBinding
      Namespace:
      Kinds:
        /v1, Kind=Service:                                        Service
        /v1, Kind=ServiceAccount:                                 ServiceAccount
        apps/v1, Kind=Deployment:                                 Deployment
        kustomize.toolkit.fluxcd.io/v1beta1, Kind=Kustomization:  Kustomization
        networking.k8s.io/v1, Kind=NetworkPolicy:                 NetworkPolicy
        source.toolkit.fluxcd.io/v1beta1, Kind=GitRepository:     GitRepository
      Namespace:                                                  flux-system
Events:
  Type    Reason  Age    From                  Message
  ----    ------  ----   ----                  -------
  Normal  info    3m53s  kustomize-controller  customresourcedefinition.apiextensions.k8s.io/buckets.source.toolkit.fluxcd.io configured
...

This is also where my debugging process hit a dead end, as I couldn't find a reason why status.snapshot isn't populated on K3S@RaspberryPi while it is on kind@local using the same bootstrap process.
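
For anyone comparing the two clusters, the snapshot status can also be read directly instead of scanning the full describe output. A minimal check, assuming the default resource names (on the broken cluster it prints nothing):

kubectl -n flux-system get kustomization flux-system -o jsonpath='{.status.snapshot}'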

I believe the fact that the issue only occurs on my Raspberry Pi implies that it might be a networking issue of some kind that prevents the kustomize controller from getting status updates from GitHub, and that I need to handle port forwarding or something similar, but I'm not sure.
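
A quick way to rule basic cluster egress in or out is a one-off pod that resolves and reaches github.com from inside the cluster. A sketch, not from the thread; the image and pod name are arbitrary:

kubectl run nettest --rm -ti --image=busybox:1.36 --restart=Never -- nslookup github.com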

flux --version
flux version 0.13.1

flux check
► checking prerequisites
✔ kubectl 1.20.6+k3s1 >=1.18.0-0
✔ Kubernetes 1.20.6+k3s1 >=1.16.0-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.10.0
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.11.1
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.13.0
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.12.1
✔ all checks passed

stefanprodan commented 3 years ago

Duplicate of: https://github.com/fluxcd/kustomize-controller/issues/320

stefanprodan commented 3 years ago

This is a bug in k3s; hopefully it will be fixed once k3s catches up with Kubernetes 1.21.

Jenishk56 commented 3 years ago

brew upgrade flux resolved it for me. I guess it has been fixed in version 0.13.2 (as reported by the brew installation). Not sure whether there are two different kinds of versioning out there for flux.

I used brew install fluxcd/tap/flux for the installation on macOS.
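
For anyone following along, upgrading a tap-installed flux would look roughly like this; a sketch, assuming Homebrew and the fluxcd tap:

brew update
brew upgrade fluxcd/tap/flux
flux --version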

jamessewell commented 3 years ago

Is there a workaround? I'm seeing it on 1.19.5+k3s2 but I can't upgrade due to other components not supporting higher versions.

jamessewell commented 3 years ago

Upgraded to 1.19.10+k3s1 (as far as I can go) and still not working - odd, as people above seem to have had luck?

Scratch that - k3os with k3s 1.19.10 works :)

jpconstantineau commented 3 years ago

I am seeing a similar issue bootstrapping flux on a k3s 1.21.0 cluster running on Raspberry Pis with 64-bit Ubuntu Server 20 (and 21).

What I get is gitrepository/flux-system waiting to be reconciled. I bootstrapped from a machine remote to the cluster, and GitHub successfully added the SSH key. Unfortunately, it doesn't seem that the key got used by the cluster to read the repo. Is this an indication that the cluster cannot get to GitHub? I can ping from the nodes, but I suspect it's the internal networking of the cluster that cannot reach out... Any idea on resolving this?

stefanprodan commented 3 years ago

it doesn't seem that the key got used by the cluster to be able to read the repo

Can you be more explicit? What’s the status of your sources? Please post here:

flux get all

jpconstantineau commented 3 years ago

flux get all
NAME                            READY   MESSAGE                         REVISION        SUSPENDED 
gitrepository/flux-system       False   waiting to be reconciled                        False    

NAME                            READY   MESSAGE                                 REVISION        SUSPENDED 
kustomization/flux-system       False   Source is not ready, artifact not found                 False 

Logs at level = error did not return anything. This did:

flux logs 
2021-06-19T04:53:33.301Z info Kustomization/flux-system.flux-system - Source is not ready, artifact not found 
2021-06-19T05:03:33.341Z info Kustomization/flux-system.flux-system - Source is not ready, artifact not found 
2021-06-19T05:13:33.376Z info Kustomization/flux-system.flux-system - Source is not ready, artifact not found 
2021-06-19T05:23:33.421Z info Kustomization/flux-system.flux-system - Source is not ready, artifact not found 
2021-06-19T05:33:33.448Z info Kustomization/flux-system.flux-system - Source is not ready, artifact not found 
2021-06-19T05:43:33.497Z info Kustomization/flux-system.flux-system - Source is not ready, artifact not found 

Doing a check gives me this:

flux check
► checking prerequisites
✗ flux 0.15.0 <0.15.2 (new version is available, please upgrade)
✔ kubectl 1.21.1 >=1.18.0-0
✔ Kubernetes 1.21.1+k3s1 >=1.16.0-0
► checking controllers
✗ source-controller: deployment not ready
► ghcr.io/fluxcd/source-controller:v0.14.0
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.13.0
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.15.0
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.11.0

stefanprodan commented 3 years ago

source-controller: deployment not ready

The controller that does Git operations is crashing on your cluster. I guess you’re using ARM64; you need to upgrade to flux 0.15.2 to fix the crash loop.
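
A crash loop like this shows up in the pod list. A minimal way to confirm it; a sketch, assuming the default flux-system layout and the app labels from the stock manifests:

kubectl -n flux-system get pods
kubectl -n flux-system describe pod -l app=source-controller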

jpconstantineau commented 3 years ago

Updating did resolve the crash loop. I do still get networking errors:

flux logs --level=error
2021-06-19T05:54:13.586Z error GitRepository/flux-system.flux-system - unable to send event POST http://notification-controller/ giving up after 5 attempt(s): Post "http://notification-controller/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021-06-19T05:54:13.587Z error GitRepository/flux-system.flux-system - Reconciler error unable to clone 'ssh://git@github.com/jpconstantineau/flux-homelab', error: dial tcp: lookup github.com on 10.43.0.10:53: read udp 10.42.3.8:55144->10.43.0.10:53: i/o timeout
2021-06-19T05:55:48.815Z error GitRepository/flux-system.flux-system - unable to send event POST http://notification-controller/ giving up after 5 attempt(s): Post "http://notification-controller/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021-06-19T05:55:48.815Z error GitRepository/flux-system.flux-system - Reconciler error unable to clone 'ssh://git@github.com/jpconstantineau/flux-homelab', error: dial tcp: lookup github.com on 10.43.0.10:53: read udp 10.42.3.8:45960->10.43.0.10:53: i/o timeout
2021-06-19T05:57:23.936Z error GitRepository/flux-system.flux-system - unable to send event POST http://notification-controller/ giving up after 5 attempt(s): Post "http://notification-controller/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021-06-19T05:57:23.936Z error GitRepository/flux-system.flux-system - Reconciler error unable to clone 'ssh://git@github.com/jpconstantineau/flux-homelab', error: dial tcp: lookup github.com on 10.43.0.10:53: read udp 10.42.3.8:40379->10.43.0.10:53: i/o timeout
2021-06-19T05:58:59.061Z error GitRepository/flux-system.flux-system - unable to send event POST http://notification-controller/ giving up after 5 attempt(s): Post "http://notification-controller/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021-06-19T05:58:59.062Z error GitRepository/flux-system.flux-system - Reconciler error unable to clone 'ssh://git@github.com/jpconstantineau/flux-homelab', error: dial tcp: lookup github.com on 10.43.0.10:53: read udp 10.42.3.8:42335->10.43.0.10:53: i/o timeout
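
The lookups are timing out against the in-cluster resolver at 10.43.0.10. A one-off pod can confirm whether DNS works at all from inside the cluster; a sketch, with an arbitrary image and pod name:

kubectl run dnstest --rm -ti --image=busybox:1.36 --restart=Never -- nslookup github.com 10.43.0.10
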
stefanprodan commented 3 years ago

Guess now your k3s CNI or CoreDNS is broken… try bootstrapping with --network-policy=false; if that doesn’t work, then you should investigate why the pods on your cluster can’t reach the DNS.
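
The flag just gets appended to the bootstrap invocation; a sketch reusing the command from the top of this issue:

flux bootstrap github \
--owner=$GITHUB_USER \
--repository=$CONFIG_REPO \
--branch=master \
--path=./clusters/my-cluster \
--personal \
--network-policy=false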

jpconstantineau commented 3 years ago

I just tried re-running bootstrap with the --network-policy=false option and the same error messages came back. Looks like I'll need to look into CoreDNS and figure out how to get DNS resolution outside the cluster... (configure the upstream name servers).
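
In a stock k3s cluster the upstream resolvers come from the forward directive in the CoreDNS Corefile, so inspecting that config is a reasonable first step. A sketch, assuming the default kube-system deployment:

kubectl -n kube-system get configmap coredns -o yaml
# look for a line such as: forward . /etc/resolv.conf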

Bregor commented 3 years ago

Try removing the old network policies manually first:

$ kubectl delete netpol -n flux-system --all

Bregor commented 3 years ago

K3s uses Flannel as its CNI, which does not support network policies at all.

jpconstantineau commented 3 years ago

I'll give it a try and see if that works. If so, I'll rebuild the test cluster (there is nothing in it) with the network policy turned off.

I'll report back as I am sure others in a similar situation will benefit...
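
Turning the embedded network policy controller off at rebuild time is a k3s server flag; a sketch, assuming the official install script:

curl -sfL https://get.k3s.io | sh -s - --disable-network-policy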

jpconstantineau commented 3 years ago

Quick update: the latest release resolved the crash on DNS lookup failure. Thanks for that!

I then looked at troubleshooting why my cluster was having DNS problems. The instructions at Rancher were helpful in testing whether specific host and cluster setup steps were problematic. Every time I renamed the hostname on the Raspberry Pi image for Ubuntu 21, there were DNS problems.

I am not exactly sure what resolved it, but upgrading to the latest k3s, released 3 days ago, fixed the issue.
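
For reference, re-running the official install script is the documented way to upgrade a k3s node in place; a sketch, with the channel pin as an arbitrary example:

curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=latest sh -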

stefanprodan commented 2 years ago

Closing this as it seems resolved upstream in k3s.

nareshmaharaj-consultant commented 2 years ago

Cannot get this to work in kind. Seeing the same issues:

✔ reconciled sync configuration
◎ waiting for Kustomization "flux-system/flux-system" to be reconciled
✗ client rate limiter Wait returned an error: context deadline exceeded
► confirming components are healthy
✗ helm-controller: deployment not ready
✗ kustomize-controller: deployment not ready
✗ notification-controller: deployment not ready
✗ source-controller: deployment not ready
✗ bootstrap failed with 2 health check failure(s)

flux --version
flux version 0.31.0

stefanprodan commented 2 years ago

We use Kubernetes Kind for all our e2e testing; we can't release Flux if those tests fail. To see why it fails for you, inspect the pods.
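
Something along these lines usually surfaces the reason; a sketch, assuming the default flux-system namespace:

kubectl -n flux-system get pods
kubectl -n flux-system describe pods
kubectl -n flux-system get events --sort-by=.metadata.creationTimestamp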

nareshmaharaj-consultant commented 2 years ago

Worked after recreating the kind cluster - thanks