fluxcd / flux2

Open and extensible continuous delivery solution for Kubernetes. Powered by GitOps Toolkit.
https://fluxcd.io
Apache License 2.0

flux bootstrap command failing #990

Closed donofriov closed 3 years ago

donofriov commented 3 years ago

Describe the bug

After completing all of the prerequisites and running the flux bootstrap command, I get ✗ install failed even though the Kubernetes resources (deployments, pods, secrets, etc.) are up and running without any errors.

I'm trying to migrate from Flux v1 to Flux v2, so Flux v1 is already installed on the cluster in the flux namespace. I've created a new repository for Flux v2 and attempted to bootstrap it with no workloads, so as not to sync v1 and v2 together. The only thing in the v2 repo is the gotk-components.yaml file from the "Add flux v0.8.2 components manifests" commit.

To Reproduce

Steps to reproduce the behaviour:

❯ brew install fluxcd/tap/flux
Warning: fluxcd/tap/flux 0.8.2 is already installed and up-to-date.

❯ flux check --pre

► checking prerequisites
✔ kubectl 1.20.2 >=1.18.0-0
✔ Kubernetes 1.18.9-eks-d1db3c >=1.16.0-0
✔ prerequisites checks passed

❯ export GITHUB_TOKEN=<<REDACTED>>

❯ flux bootstrap github \
  --owner=soulcycle \
  --repository=eks-clusters-flux \
  --branch=master \
  --team=ops \
  --path=clusters/eks-useast1-nonprod-02
► connecting to github.com
✔ repository cloned
✚ generating manifests
✔ components manifests pushed
► installing components in flux-system namespace
namespace/flux-system created
customresourcedefinition.apiextensions.k8s.io/alerts.notification.toolkit.fluxcd.io created
customresourcedefinition.apiextensions.k8s.io/buckets.source.toolkit.fluxcd.io created
customresourcedefinition.apiextensions.k8s.io/gitrepositories.source.toolkit.fluxcd.io created
customresourcedefinition.apiextensions.k8s.io/helmcharts.source.toolkit.fluxcd.io created
customresourcedefinition.apiextensions.k8s.io/helmreleases.helm.toolkit.fluxcd.io created
customresourcedefinition.apiextensions.k8s.io/helmrepositories.source.toolkit.fluxcd.io created
customresourcedefinition.apiextensions.k8s.io/kustomizations.kustomize.toolkit.fluxcd.io created
customresourcedefinition.apiextensions.k8s.io/providers.notification.toolkit.fluxcd.io created
customresourcedefinition.apiextensions.k8s.io/receivers.notification.toolkit.fluxcd.io created
serviceaccount/helm-controller created
serviceaccount/kustomize-controller created
serviceaccount/notification-controller created
serviceaccount/source-controller created
clusterrole.rbac.authorization.k8s.io/crd-controller-flux-system created
clusterrolebinding.rbac.authorization.k8s.io/cluster-reconciler-flux-system created
clusterrolebinding.rbac.authorization.k8s.io/crd-controller-flux-system created
service/notification-controller created
service/source-controller created
service/webhook-receiver created
deployment.apps/helm-controller created
deployment.apps/kustomize-controller created
deployment.apps/notification-controller created
deployment.apps/source-controller created
networkpolicy.networking.k8s.io/allow-scraping created
networkpolicy.networking.k8s.io/allow-webhooks created
networkpolicy.networking.k8s.io/deny-ingress created
◎ verifying installation
✗ install failed

Expected behavior

I'd expect the bootstrap to complete without any errors.

Additional context

Running flux check three times in quick succession printed three different results:

❯ flux check
► checking prerequisites
✔ kubectl 1.20.2 >=1.18.0-0
✔ Kubernetes 1.18.9-eks-d1db3c >=1.16.0-0
► checking controllers
► ghcr.io/fluxcd/helm-controller:v0.7.0
► ghcr.io/fluxcd/kustomize-controller:v0.8.1
✔ notification-controller: healthy
► ghcr.io/fluxcd/notification-controller:v0.8.0
► ghcr.io/fluxcd/source-controller:v0.8.1

❯ flux check
► checking prerequisites
✔ kubectl 1.20.2 >=1.18.0-0
✔ Kubernetes 1.18.9-eks-d1db3c >=1.16.0-0
► checking controllers
► ghcr.io/fluxcd/helm-controller:v0.7.0
✔ kustomize-controller: healthy
► ghcr.io/fluxcd/kustomize-controller:v0.8.1
► ghcr.io/fluxcd/notification-controller:v0.8.0
✔ source-controller: healthy
► ghcr.io/fluxcd/source-controller:v0.8.1

❯ flux check
► checking prerequisites
✔ kubectl 1.20.2 >=1.18.0-0
✔ Kubernetes 1.18.9-eks-d1db3c >=1.16.0-0
► checking controllers
► ghcr.io/fluxcd/helm-controller:v0.7.0
✔ kustomize-controller: healthy
► ghcr.io/fluxcd/kustomize-controller:v0.8.1
✔ notification-controller: healthy
► ghcr.io/fluxcd/notification-controller:v0.8.0
✔ source-controller: healthy
► ghcr.io/fluxcd/source-controller:v0.8.1
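
One way to cross-check results like these independently of flux check is to ask kubectl for each deployment's rollout status. This is only a minimal sketch, assuming the four controller deployments and the flux-system namespace shown in this issue; the KUBECTL variable and the check_controllers function are hypothetical names introduced for illustration.

```shell
# Cross-check controller readiness with kubectl rather than `flux check`.
# KUBECTL is a hypothetical override so another kubectl binary can be used.
KUBECTL="${KUBECTL:-kubectl}"

# check_controllers NAMESPACE DEPLOYMENT...
# Prints one line per deployment and returns 1 if any rollout is not ready.
check_controllers() {
  local ns name failed
  ns="$1"
  shift
  failed=0
  for name in "$@"; do
    # `rollout status` exits non-zero if the rollout does not finish in time.
    if "$KUBECTL" -n "$ns" rollout status "deployment/$name" --timeout=60s >/dev/null 2>&1; then
      echo "ok: $name"
    else
      echo "not ready: $name"
      failed=1
    fi
  done
  return "$failed"
}

# Example call against a real cluster (not executed here):
# check_controllers flux-system helm-controller kustomize-controller \
#   notification-controller source-controller
```

If every deployment reports ok here while flux check still omits a controller, the problem is more likely in the CLI's status polling than in the cluster.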

Here is the describe output for the helm-controller deployment, whose health status flux check never reports:

Name:                   helm-controller
Namespace:              flux-system
CreationTimestamp:      Thu, 25 Feb 2021 11:50:14 +0100
Labels:                 app.kubernetes.io/instance=flux-system
                        app.kubernetes.io/version=v0.8.2
                        control-plane=controller
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=helm-controller
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=helm-controller
  Annotations:      prometheus.io/port: 8080
                    prometheus.io/scrape: true
  Service Account:  helm-controller
  Containers:
   manager:
    Image:       ghcr.io/fluxcd/helm-controller:v0.7.0
    Ports:       9440/TCP, 8080/TCP
    Host Ports:  0/TCP, 0/TCP
    Args:
      --events-addr=http://notification-controller/
      --watch-all-namespaces=true
      --log-level=info
      --log-encoding=json
      --enable-leader-election
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:      100m
      memory:   64Mi
    Liveness:   http-get http://:healthz/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:healthz/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      RUNTIME_NAMESPACE:   (v1:metadata.namespace)
    Mounts:
      /tmp from temp (rw)
  Volumes:
   temp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   helm-controller-669f89cffc (1/1 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  2m35s  deployment-controller  Scaled up replica set helm-controller-669f89cffc to 1

❯ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-14T05:15:04Z", GoVersion:"go1.15.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.9-eks-d1db3c", GitCommit:"d1db3c46e55f95d6a7d3e5578689371318f95ff9", GitTreeState:"clean", BuildDate:"2020-10-20T22:18:07Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

Below please provide the output of the following commands:

❯ flux --version
flux version 0.8.2

❯ flux check
► checking prerequisites
✔ kubectl 1.20.2 >=1.18.0-0
✔ Kubernetes 1.18.9-eks-d1db3c >=1.16.0-0
► checking controllers
► ghcr.io/fluxcd/helm-controller:v0.7.0
✔ kustomize-controller: healthy
► ghcr.io/fluxcd/kustomize-controller:v0.8.1
✔ notification-controller: healthy
► ghcr.io/fluxcd/notification-controller:v0.8.0
✔ source-controller: healthy
► ghcr.io/fluxcd/source-controller:v0.8.1

❯ kubectl -n flux-system get all
NAME                                           READY   STATUS    RESTARTS   AGE
pod/helm-controller-669f89cffc-spv27           1/1     Running   0          29m
pod/kustomize-controller-895dfd98d-6dt2w       1/1     Running   0          39m
pod/notification-controller-67494b8886-8p88w   1/1     Running   0          39m
pod/source-controller-5658bf8f46-26hxr         1/1     Running   0          39m

NAME                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/notification-controller   ClusterIP   172.20.5.126     <none>        80/TCP    39m
service/source-controller         ClusterIP   172.20.222.206   <none>        80/TCP    39m
service/webhook-receiver          ClusterIP   172.20.131.212   <none>        80/TCP    39m

NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/helm-controller           1/1     1            1           29m
deployment.apps/kustomize-controller      1/1     1            1           39m
deployment.apps/notification-controller   1/1     1            1           39m
deployment.apps/source-controller         1/1     1            1           39m

NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/helm-controller-669f89cffc           1         1         1       29m
replicaset.apps/kustomize-controller-895dfd98d       1         1         1       39m
replicaset.apps/notification-controller-67494b8886   1         1         1       39m
replicaset.apps/source-controller-5658bf8f46         1         1         1       39m

❯ kubectl -n flux-system logs deploy/source-controller
{"level":"info","ts":"2021-02-25T10:40:49.088Z","logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":"2021-02-25T10:40:49.090Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:49.090Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:49.090Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:49.091Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:49.091Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:49.091Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:49.091Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:49.091Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:49.091Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:49.091Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:49.091Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:49.091Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:49.091Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:49.091Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:49.091Z","logger":"setup","msg":"starting manager"}
I0225 10:40:49.091424       6 leaderelection.go:243] attempting to acquire leader lease flux-system/305740c0.fluxcd.io...
{"level":"info","ts":"2021-02-25T10:40:49.092Z","msg":"starting metrics server","path":"/metrics"}
I0225 10:40:49.111090       6 leaderelection.go:253] successfully acquired lease flux-system/305740c0.fluxcd.io
{"level":"info","ts":"2021-02-25T10:40:49.111Z","logger":"setup","msg":"starting file server"}
{"level":"info","ts":"2021-02-25T10:40:49.192Z","logger":"controller.bucket","msg":"Starting EventSource","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"Bucket","source":"kind source: /, Kind="}
{"level":"info","ts":"2021-02-25T10:40:49.192Z","logger":"controller.gitrepository","msg":"Starting EventSource","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","source":"kind source: /, Kind="}
{"level":"info","ts":"2021-02-25T10:40:49.192Z","logger":"controller.helmchart","msg":"Starting EventSource","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","source":"kind source: /, Kind="}
{"level":"info","ts":"2021-02-25T10:40:49.192Z","logger":"controller.helmrepository","msg":"Starting EventSource","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmRepository","source":"kind source: /, Kind="}
{"level":"info","ts":"2021-02-25T10:40:49.194Z","logger":"controller.helmchart","msg":"Starting EventSource","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","source":"kind source: /, Kind="}
{"level":"info","ts":"2021-02-25T10:40:49.194Z","logger":"controller.helmrepository","msg":"Starting Controller","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmRepository"}
{"level":"info","ts":"2021-02-25T10:40:49.194Z","logger":"controller.helmchart","msg":"Starting EventSource","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","source":"kind source: /, Kind="}
{"level":"info","ts":"2021-02-25T10:40:49.294Z","logger":"controller.bucket","msg":"Starting Controller","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"Bucket"}
{"level":"info","ts":"2021-02-25T10:40:49.294Z","logger":"controller.bucket","msg":"Starting workers","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"Bucket","worker count":2}
{"level":"info","ts":"2021-02-25T10:40:49.294Z","logger":"controller.gitrepository","msg":"Starting Controller","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository"}
{"level":"info","ts":"2021-02-25T10:40:49.294Z","logger":"controller.gitrepository","msg":"Starting workers","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","worker count":2}
{"level":"info","ts":"2021-02-25T10:40:49.294Z","logger":"controller.helmrepository","msg":"Starting workers","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmRepository","worker count":2}
{"level":"info","ts":"2021-02-25T10:40:49.294Z","logger":"controller.helmchart","msg":"Starting EventSource","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","source":"kind source: /, Kind="}
{"level":"info","ts":"2021-02-25T10:40:49.294Z","logger":"controller.helmchart","msg":"Starting Controller","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart"}
{"level":"info","ts":"2021-02-25T10:40:49.294Z","logger":"controller.helmchart","msg":"Starting workers","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","worker count":2}

❯ kubectl -n flux-system logs deploy/kustomize-controller
{"level":"info","ts":"2021-02-25T10:40:47.534Z","logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":"2021-02-25T10:40:47.534Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:47.534Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:47.534Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:47.534Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:47.534Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:47.534Z","logger":"controller-runtime.injectors-warning","msg":"Injectors are deprecated, and will be removed in v0.10.x"}
{"level":"info","ts":"2021-02-25T10:40:47.534Z","logger":"setup","msg":"starting manager"}
{"level":"info","ts":"2021-02-25T10:40:47.534Z","msg":"starting metrics server","path":"/metrics"}
I0225 10:40:47.535625       6 leaderelection.go:243] attempting to acquire leader lease flux-system/7593cc5d.fluxcd.io...
I0225 10:40:47.575119       6 leaderelection.go:253] successfully acquired lease flux-system/7593cc5d.fluxcd.io
{"level":"info","ts":"2021-02-25T10:40:47.634Z","logger":"controller.kustomization","msg":"Starting EventSource","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","source":"kind source: /, Kind="}
{"level":"info","ts":"2021-02-25T10:40:47.635Z","logger":"controller.kustomization","msg":"Starting EventSource","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","source":"kind source: /, Kind="}
{"level":"info","ts":"2021-02-25T10:40:47.735Z","logger":"controller.kustomization","msg":"Starting EventSource","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","source":"kind source: /, Kind="}
{"level":"info","ts":"2021-02-25T10:40:47.835Z","logger":"controller.kustomization","msg":"Starting Controller","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization"}
{"level":"info","ts":"2021-02-25T10:40:47.835Z","logger":"controller.kustomization","msg":"Starting workers","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","worker count":4}

hiddeco commented 3 years ago

I suspect this has to do with the cache (enabled here), which received a bug fix upstream a while ago: https://github.com/kubernetes-sigs/cli-utils/pull/279.

Can you please try the binary in the attachment? This has UseCache set to false.

flux-patched.tar.gz

donofriov commented 3 years ago

I'm able to successfully bootstrap with the patched version:

❯ ~/Downloads/bin/flux-darwin-patched check
► checking prerequisites
✔ kubectl 1.20.2 >=1.18.0-0
✔ Kubernetes 1.18.9-eks-d1db3c >=1.16.0-0
► checking controllers
✔ helm-controller: healthy
► ghcr.io/fluxcd/helm-controller:v0.7.0
✔ kustomize-controller: healthy
► ghcr.io/fluxcd/kustomize-controller:v0.8.1
✔ notification-controller: healthy
► ghcr.io/fluxcd/notification-controller:v0.8.0
✔ source-controller: healthy
► ghcr.io/fluxcd/source-controller:v0.8.1
✔ all checks passed

❯ flux bootstrap...
◎ verifying installation
✔ install completed
► configuring deploy key
✔ deploy key configured
► generating sync manifests
✔ sync manifests pushed
► applying sync manifests
◎ waiting for cluster sync
✔ bootstrap finished

❯ kubectl get secrets
NAME                                  TYPE                                  DATA   AGE
default-token-dfzgb                   kubernetes.io/service-account-token   3      61m
flux-system                           Opaque                                3      2m49s
helm-controller-token-6ntwx           kubernetes.io/service-account-token   3      61m
kustomize-controller-token-7hzpr      kubernetes.io/service-account-token   3      61m
notification-controller-token-g5pxh   kubernetes.io/service-account-token   3      61m
source-controller-token-9wpcr         kubernetes.io/service-account-token   3      61m

hiddeco commented 3 years ago

To confirm that the PR above is indeed the right fix, can you try once more with the binary from the attached tarball? If that comes back 🟢, we are good to go.

Thank you! 🌻

flux-patched.tar.gz

donofriov commented 3 years ago

Looks good 🎉

  1. Ran flux uninstall -s, then bootstrapped with v0.8.2: the install is broken. With the release patch binary, bootstrap installs ✅
  2. Ran flux uninstall -s, deleted everything from the Git repo, then bootstrapped from scratch with the release patch binary: install successful ✅
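
The retest cycle described above could be scripted roughly as follows. This is a sketch, not the exact commands that were run: the FLUX variable and the retest_bootstrap function are hypothetical names, and the bootstrap arguments are passed through unchanged from the caller.

```shell
# FLUX is a hypothetical override for pointing at a patched binary,
# e.g. FLUX=~/Downloads/bin/flux-darwin-patched
FLUX="${FLUX:-flux}"

# retest_bootstrap ARGS...
# Uninstalls Flux, bootstraps again with the given `flux bootstrap github`
# arguments, then runs the health checks. Stops at the first failing step.
retest_bootstrap() {
  "$FLUX" uninstall -s || return 1
  "$FLUX" bootstrap github "$@" || return 1
  "$FLUX" check
}

# Example call (not executed here), using the flags from this issue:
# retest_bootstrap --owner=soulcycle --repository=eks-clusters-flux \
#   --branch=master --team=ops --path=clusters/eks-useast1-nonprod-02
```

Running it once with the released binary and once with the patched binary gives a direct comparison on the same cluster.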