fluxcd / flux

Successor: https://github.com/fluxcd/flux2
https://fluxcd.io
Apache License 2.0

Fresh flux installation - panic: runtime error: invalid memory address or nil pointer dereference #3195

Closed bpinter closed 4 years ago

bpinter commented 4 years ago

On AWS EKS, I've just tried to install the flux and helm operator. The flux pod starts then it dies immediately with the error: panic: runtime error: invalid memory address or nil pointer dereference

To Reproduce

Steps to reproduce the behaviour:

  1. Provision AWS EKS with Terraform, with node_groups and fargate profile (fargate profile is not yet in use)
  2. Install FluxCD and Helm Operator based on the get started document: https://github.com/fluxcd/helm-operator-get-started

     helm upgrade -i flux \
       --set image.pullSecret=regcred \
       --set registry.automationInterval=1m \
       --set git.pollInterval=1m \
       --set git.url=git@gitlab.com:[repo].git \
       --set git.branch=develop \
       --set git.path="fluxcd/helm\,fluxcd/namespaces\,fluxcd/releases/dev" \
       --set git.label=[project]-dev \
       --set prometheus.enabled=true \
       --namespace fluxcd \
       fluxcd/flux

Expected behaviour

Flux and HelmOperator are running and listening on GitLab changes

Logs

bpint@mac-01-52:~/Developer/bs/k8s-cluster$ kc logs -f flux-5f78d75468-jbt94
Flag --git-verify-signatures has been deprecated, changed to --git-verify-signatures-mode, use that instead
ts=2020-07-16T10:05:33.818586156Z caller=main.go:259 version=1.20.0
ts=2020-07-16T10:05:33.818634392Z caller=main.go:412 msg="using kube config: \"/root/.kube/config\" to connect to the cluster"
ts=2020-07-16T10:05:33.843227855Z caller=main.go:492 component=cluster identity=/etc/fluxd/ssh/identity
ts=2020-07-16T10:05:33.843282699Z caller=main.go:493 component=cluster identity.pub="ssh-rsa [ssh key - didn't want to share, so, removed from the log]"
ts=2020-07-16T10:05:33.843314734Z caller=main.go:498 host=https://10.100.0.1:443 version=kubernetes-v1.16.8-eks-fd1ea7
ts=2020-07-16T10:05:33.843369625Z caller=main.go:510 kubectl=/usr/local/bin/kubectl
ts=2020-07-16T10:05:33.844074742Z caller=main.go:527 ping=true
ts=2020-07-16T10:05:33.847280738Z caller=main.go:666 url=ssh://git@gitlab.com/bimspot/k8s-cluster.git user="Weave Flux" email=support@weave.works signing-key= verify-signatures-mode=none sync-tag=bimspot-dev state=git readonly=false registry-disable-scanning=false notes-ref=bimspot-dev set-author=false git-secret=false sops=false
ts=2020-07-16T10:05:33.84783731Z caller=main.go:772 upstream="no upstream URL given"
ts=2020-07-16T10:05:33.848180111Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-07-16T10:05:33.848217612Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-07-16T10:05:33.848422907Z caller=loop.go:108 component=sync-loop err="loading last-synced resources: git repo not ready: git repo has not been cloned yet"
ts=2020-07-16T10:05:33.849280735Z caller=main.go:795 addr=:3030
ts=2020-07-16T10:05:34.330334888Z caller=checkpoint.go:24 component=checkpoint msg="up to date" latest=1.20.0
ts=2020-07-16T10:05:46.507824549Z caller=loop.go:134 component=sync-loop event=refreshed url=ssh://git@gitlab.com/bimspot/k8s-cluster.git branch=develop HEAD=12d643d4debaeabefab20f956e4a66232ce76b94
ts=2020-07-16T10:05:46.514036608Z caller=sync.go:60 component=daemon info="trying to sync git changes to the cluster" old= new=12d643d4debaeabefab20f956e4a66232ce76b94
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x176c94f]

goroutine 82 [running]:
github.com/fluxcd/flux/pkg/cluster/kubernetes/resource.Load.func1(0xc0000b8750, 0x26, 0x0, 0x0, 0x20f5120, 0xc0006ef0b0, 0x0, 0xc0006e6870)
    /home/circleci/go/src/github.com/fluxcd/flux/pkg/cluster/kubernetes/resource/load.go:32 +0x6f
path/filepath.Walk(0xc0000b8750, 0x26, 0xc00072f618, 0x0, 0x0)
    /usr/local/go/src/path/filepath/path.go:402 +0x6a
github.com/fluxcd/flux/pkg/cluster/kubernetes/resource.Load(0xc001d542c0, 0x1a, 0xc0006e6780, 0x3, 0x3, 0xc001d54500, 0x20, 0x20, 0x40c1d6)
    /home/circleci/go/src/github.com/fluxcd/flux/pkg/cluster/kubernetes/resource/load.go:31 +0x19b
github.com/fluxcd/flux/pkg/cluster/kubernetes.(*manifests).LoadManifests(0xc000648930, 0xc001d542c0, 0x1a, 0xc0006e6780, 0x3, 0x3, 0xc0000b8840, 0x0, 0x20)
    /home/circleci/go/src/github.com/fluxcd/flux/pkg/cluster/kubernetes/manifests.go:122 +0x67
github.com/fluxcd/flux/pkg/manifests.(*rawFiles).GetAllResourcesByID(0xc00197e540, 0x213dda0, 0xc0000bc058, 0x3990981b40c935b4, 0x39d263b326feea65, 0xc00072f760)
    /home/circleci/go/src/github.com/fluxcd/flux/pkg/manifests/rawfiles.go:96 +0x60
github.com/fluxcd/flux/pkg/daemon.doSync(0x213dda0, 0xc0000bc058, 0x21257a0, 0xc00197e540, 0x2158440, 0xc000190400, 0xc0000b8840, 0x2b, 0x20f2f20, 0xc00009fc80, ...)
    /home/circleci/go/src/github.com/fluxcd/flux/pkg/daemon/sync.go:221 +0x66
github.com/fluxcd/flux/pkg/daemon.(*Daemon).Sync(0xc000361e60, 0x213dda0, 0xc0000bc058, 0x1e45a32e, 0xed6a21d7a, 0x0, 0xc000500b70, 0x28, 0x21256a0, 0xc0002cc4c0, ...)
    /home/circleci/go/src/github.com/fluxcd/flux/pkg/daemon/sync.go:71 +0x404
github.com/fluxcd/flux/pkg/daemon.(*Daemon).Loop(0xc000361e60, 0xc00009a2a0, 0xc00034ca00, 0x20f2f20, 0xc00009fec0)
    /home/circleci/go/src/github.com/fluxcd/flux/pkg/daemon/loop.go:103 +0x525
created by main.main
    /home/circleci/go/src/github.com/fluxcd/flux/cmd/fluxd/main.go:777 +0x5990

squaremo commented 4 years ago

If you're able to, can you try the image fluxcd/flux-prerelease:master-64092ddd, which (I allege) has a fix for this. Hence or otherwise, I believe this is caused by supplying a --git-path argument that doesn't correspond to a directory in the repo, so until there's a released fix, you could look into that.

farant commented 4 years ago

Fixing a typo in a --git-path directory name resolved this issue for me.

dataplex commented 4 years ago

This appears to still be an issue even with that prerelease.

└─[$] kubectl describe -n flux deploy flux                                                                                          [0:22:42]
Name:                   flux
Namespace:              flux
CreationTimestamp:      Sat, 18 Jul 2020 22:15:37 -0500
Labels:                 app=flux
                        app.kubernetes.io/managed-by=Helm
                        chart=flux-1.4.0
                        heritage=Helm
                        release=flux
Annotations:            deployment.kubernetes.io/revision: 6
                        meta.helm.sh/release-name: flux
                        meta.helm.sh/release-namespace: flux
Selector:               app=flux,release=flux
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=flux
                    release=flux
  Service Account:  flux
  Containers:
   flux:
    Image:      docker.io/fluxcd/flux-prerelease:master-64092ddd
    Port:       3030/TCP
    Host Port:  0/TCP
    Args:
      --log-format=fmt
      --ssh-keygen-dir=/var/fluxd/keygen
      --ssh-keygen-format=RFC4716
      --k8s-secret-name=flux-git-deploy
      --memcached-hostname=flux-memcached
      --sync-state=secret
      --memcached-service=
      --git-url=git@github.com:dataplex/gitops-istio
      --git-branch=master
      --git-path=flux_root
      --git-readonly=false
      --git-user=Weave Flux
      --git-email=support@weave.works
      --git-verify-signatures=false
      --git-set-author=false
      --git-poll-interval=1m
      --git-timeout=20s
      --sync-interval=1m
      --git-ci-skip=false
      --automation-interval=1m
      --registry-rps=200
      --registry-burst=125
      --registry-trace=false
      --sync-garbage-collection=true
    Requests:
      cpu:      50m
      memory:   64Mi
    Liveness:   http-get http://:3030/api/flux/v6/identity.pub delay=5s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get http://:3030/api/flux/v6/identity.pub delay=5s timeout=5s period=10s #success=1 #failure=3
    Environment:
      KUBECONFIG:  /root/.kubectl/config
    Mounts:
      /etc/fluxd/ssh from git-key (ro)
      /root/.kubectl from kubedir (rw)
      /var/fluxd/keygen from git-keygen (rw)
  Volumes:
   kubedir:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      flux-kube-config
    Optional:  false
   git-key:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  flux-git-deploy
    Optional:    false
   git-keygen:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  flux-7dc59879dd (1/1 replicas created)
NewReplicaSet:   <none>
Events:
  Type    Reason             Age                From                   Message
  ----    ------             ----               ----                   -------
  Normal  ScalingReplicaSet  47m                deployment-controller  Scaled down replica set flux-6866668986 to 0
  Normal  ScalingReplicaSet  39m (x2 over 48m)  deployment-controller  Scaled up replica set flux-9cd77b5b to 1
  Normal  ScalingReplicaSet  35m                deployment-controller  Scaled up replica set flux-bf6895777 to 1
  Normal  ScalingReplicaSet  35m (x2 over 40m)  deployment-controller  Scaled down replica set flux-9cd77b5b to 0
  Normal  ScalingReplicaSet  23m                deployment-controller  Scaled up replica set flux-6f46d695d7 to 1
  Normal  ScalingReplicaSet  11m                deployment-controller  Scaled down replica set flux-bf6895777 to 0
  Normal  ScalingReplicaSet  9m37s              deployment-controller  Scaled up replica set flux-7dc59879dd to 1
  Normal  ScalingReplicaSet  9m21s              deployment-controller  Scaled down replica set flux-6f46d695d7 to 0
Flag --git-verify-signatures has been deprecated, changed to --git-verify-signatures-mode, use that instead
ts=2020-07-19T05:13:28.366952151Z caller=main.go:259 version=master-64092ddd
ts=2020-07-19T05:13:28.366995671Z caller=main.go:412 msg="using kube config: \"/root/.kube/config\" to connect to the cluster"
ts=2020-07-19T05:13:28.384131745Z caller=main.go:492 component=cluster identity=/etc/fluxd/ssh/identity
ts=2020-07-19T05:13:28.384168868Z caller=main.go:493 component=cluster identity.pub="ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCoBxvgesyv49+sBXGTpBvWzNDr9+jJMNLnI224knBJPYHZTgqjfSEKk2BrFlHkD7PqppXYYE4+Ei9G4EwPNfUFbCX2KFSWWHn8KFvp/utV1YwlCFXTQkKkzJBH33UaJVIrfJNcaZS+z0NxxECCJmUEEQiPqsZjuOINwSEg3Q5CpW+cHrFYzBl251U4PqO2y7Dly8mH4LqqBYgEGyYBCTVOaJuUAWR8Ru1lRDsro12ZoznjRR9IGBzycrOcqBz4BXb3th2jwtmu/x+hDQ1lABbwPD+fhDD5S5Ls2SKFNH8JV4p2OJALvC90pLVOyDTMbYQysKtxKp4aMndaXIvqWKjDQB4t6PTwgXMpCjrlQmzGi2ajliuy0P3u+FY3ihL2yYiBKNzuq6TP/C9dO/dNfo5d0lPPIMeYOoi/qGPtjhTzDDaVacSPyp3f8wR06KA96gI+B34kadCbB/GETniqP++ybdtU6qrOEX68rIkKCxoXJNDEK1KfVxGJeiTqAHYyIwE= root@flux-6675c954d4-zdwst"
ts=2020-07-19T05:13:28.384206014Z caller=main.go:498 host=https://10.100.0.1:443 version=kubernetes-v1.16.8-eks-fd1ea7
ts=2020-07-19T05:13:28.384265748Z caller=main.go:510 kubectl=/usr/local/bin/kubectl
ts=2020-07-19T05:13:28.384968602Z caller=main.go:527 ping=true
ts=2020-07-19T05:13:28.385846213Z caller=main.go:666 url=ssh://git@github.com/dataplex/gitops-istio user="Weave Flux" email=support@weave.works signing-key= verify-signatures-mode=none sync-tag=flux-sync state=secret readonly=false registry-disable-scanning=false notes-ref=flux set-author=false git-secret=false sops=false
ts=2020-07-19T05:13:28.390140296Z caller=main.go:772 upstream="no upstream URL given"
ts=2020-07-19T05:13:28.3906428Z caller=main.go:795 addr=:3030
ts=2020-07-19T05:13:28.393671662Z caller=loop.go:108 component=sync-loop err="loading last-synced resources: reading the repository checkout: cloning repo: git repo not ready: git repo has not been cloned yet"
ts=2020-07-19T05:13:28.393713822Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-07-19T05:13:28.393724148Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-07-19T05:13:28.463752003Z caller=aws.go:151 component=aws info="detected cluster region" source="EC2 metadata service" region=us-east-2
ts=2020-07-19T05:13:28.463788108Z caller=aws.go:117 component=aws info="restricting ECR registry scans" regions=[us-east-2] include-ids=[] exclude-ids="[602401143452 918309763551]"
ts=2020-07-19T05:13:28.713510395Z caller=checkpoint.go:24 component=checkpoint msg="up to date" latest=1.20.0
ts=2020-07-19T05:13:30.560451503Z caller=warming.go:198 component=warmer info="refreshing image" image=docker.io/fluxcd/flux-prerelease tag_count=415 to_update=415 of_which_refresh=0 of_which_missing=415
ts=2020-07-19T05:13:35.61563336Z caller=loop.go:134 component=sync-loop event=refreshed url=ssh://git@github.com/dataplex/gitops-istio branch=master HEAD=4bed6299a21fc9f474343758f2c8bcf929813103
ts=2020-07-19T05:13:35.658483698Z caller=loop.go:108 component=sync-loop err="loading last-synced resources: loading resources from repo: unable to read root path \"/tmp/flux-working722538269/flux_root\": stat /tmp/flux-working722538269/flux_root: no such file or directory"
ts=2020-07-19T05:13:36.299077816Z caller=warming.go:206 component=warmer updated=docker.io/fluxcd/flux-prerelease successful=415 attempted=415
ts=2020-07-19T05:13:36.299189942Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-07-19T05:13:36.575650737Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-07-19T05:14:36.048166347Z caller=loop.go:108 component=sync-loop err="loading last-synced resources: loading resources from repo: unable to read root path \"/tmp/flux-working227094871/flux_root\": stat /tmp/flux-working227094871/flux_root: no such file or directory"
ts=2020-07-19T05:14:36.049232335Z caller=loop.go:134 component=sync-loop event=refreshed url=ssh://git@github.com/dataplex/gitops-istio branch=master HEAD=4bed6299a21fc9f474343758f2c8bcf929813103
ts=2020-07-19T05:14:36.575809494Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-07-19T05:14:36.825617767Z caller=images.go:27 component=sync-loop msg="no automated workloads"

In the last commit to my repo, I moved files from the root directory into a subdirectory because fluxcd was scanning other YAML files I have in there.

commit 4bed6299a21fc9f474343758f2c8bcf929813103 (HEAD -> master, origin/master, origin/HEAD)
Author: Benjamin Floyd <benjamin.floyd@cyberark.com>
Date:   Sat Jul 18 23:32:15 2020 -0500

    Big change coming with this...moving flux monitoring to a flux_root directory

diff --git a/cert-manager/cert-manager.crds.yaml b/flux_root/cert-manager/cert-manager.crds.yaml
similarity index 100%
rename from cert-manager/cert-manager.crds.yaml
rename to flux_root/cert-manager/cert-manager.crds.yaml
diff --git a/flagger/flagger-crds.yaml b/flux_root/flagger/flagger-crds.yaml
similarity index 100%
rename from flagger/flagger-crds.yaml
rename to flux_root/flagger/flagger-crds.yaml
diff --git a/flagger/flagger-grafana.yaml b/flux_root/flagger/flagger-grafana.yaml
similarity index 100%
rename from flagger/flagger-grafana.yaml
rename to flux_root/flagger/flagger-grafana.yaml
diff --git a/flagger/flagger.yaml b/flux_root/flagger/flagger.yaml
similarity index 100%

camorra-skk commented 4 years ago

I am also getting the same issue on a Rancher RKE cluster. The directory structure for my Kubernetes manifest files is [repo]/gitops/namespaces,[repo]/gitops/releases/dev.

helm upgrade -i flux fluxcd/flux --wait --namespace fluxcd --set git.url=git@github.com:$github_username/$github_repo --set git.branch=dev-gitops --set git.path="/gitops/namespaces,/gitops/releases/dev"

After running the above command I am getting the same error as above.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x176c94f]

goroutine 110 [running]:
github.com/fluxcd/flux/pkg/cluster/kubernetes/resource.Load.func1(0xc0003da7b0, 0x2c, 0x0, 0x0, 0x20f5120, 0xc001907a70, 0x0, 0xc001906150)
    /home/circleci/go/src/github.com/fluxcd/flux/pkg/cluster/kubernetes/resource/load.go:32 +0x6f
path/filepath.Walk(0xc0003da7b0, 0x2c, 0xc0003eb6b0, 0x0, 0x0)
    /usr/local/go/src/path/filepath/path.go:402 +0x6a
github.com/fluxcd/flux/pkg/cluster/kubernetes/resource.Load(0xc003a5c860, 0x1a, 0xc00010c5c0, 0x1, 0x1, 0xc00010c500, 0xc000275230, 0xc003a5c860, 0xc0003eb790)
    /home/circleci/go/src/github.com/fluxcd/flux/pkg/cluster/kubernetes/resource/load.go:31 +0x19b
github.com/fluxcd/flux/pkg/cluster/kubernetes.(*manifests).LoadManifests(0xc000275230, 0xc003a5c860, 0x1a, 0xc00010c5c0, 0x1, 0x1, 0x42e0aa, 0x0, 0xc001880120)
    /home/circleci/go/src/github.com/fluxcd/flux/pkg/cluster/kubernetes/manifests.go:122 +0x67
github.com/fluxcd/flux/pkg/manifests.(*rawFiles).GetAllResourcesByID(0xc0018ce1c0, 0x213dda0, 0xc000042080, 0xc001a28990, 0x28, 0x21257a0)
    /home/circleci/go/src/github.com/fluxcd/flux/pkg/manifests/rawfiles.go:96 +0x60
github.com/fluxcd/flux/pkg/daemon.(*Daemon).getLastResources(0xc0002458c0, 0x213dda0, 0xc000042080, 0x21256a0, 0xc000044080, 0x0, 0x0, 0x0)
    /home/circleci/go/src/github.com/fluxcd/flux/pkg/daemon/sync.go:140 +0x176
github.com/fluxcd/flux/pkg/daemon.(*Daemon).Sync(0xc0002458c0, 0x213dda0, 0xc000042080, 0x34604f6b, 0xed6a77432, 0x0, 0xc001a28810, 0x28, 0x21256a0, 0xc000044080, ...)
    /home/circleci/go/src/github.com/fluxcd/flux/pkg/daemon/sync.go:49 +0x82
github.com/fluxcd/flux/pkg/daemon.(*Daemon).Loop(0xc0002458c0, 0xc000092240, 0xc0004e7370, 0x20f2f20, 0xc0004cd470)
    /home/circleci/go/src/github.com/fluxcd/flux/pkg/daemon/loop.go:103 +0x525
created by main.main
    /home/circleci/go/src/github.com/fluxcd/flux/cmd/fluxd/main.go:777 +0x5990

matthewbrahms commented 4 years ago

I am getting the same error. After an hour of looking at it, I found that if I have more than one item in --git-path=, it throws this error. Having only one item there lets it work. I'm on EKS and GKE as well. All errors are the same as previously mentioned. Using 1.20.0, if that helps.

I have a main repo called flux, and inside of that I have a folder called clusters. Inside clusters is a folder called global (where I want to place things that should be installed in every cluster), plus one folder per cluster name where workload/application YAML files will go.

flux
|__clusters
    |__global
    |__gke-cluster-1
    |__eks-cluster-1
    |__etc...

I was expecting the line - --git-path=clusters/global,clusters/eks-cluster-1 to work. Previously I was able to use two separate --git-path= lines with each folder separately, but that no longer works either.

Any questions, happy to provide more details for replication.

gautamr commented 4 years ago

I already tried with 1.20.0 and am having the same issue.

gautamr commented 4 years ago

I was getting the error even for a single path on GKE with 1.20.0; with no path set, it seems to work. This is worrying because we are trying to go to production with this version.

jimsgreen commented 4 years ago

Been struggling with this. Noticed this issue doesn't happen with 1.19.0.

marratj commented 4 years ago

We noticed the same issue.

Supplying a --git-path leads fluxd to crash each time it tries to sync in 1.20.0

In 1.19.0 it works as expected.

squaremo commented 4 years ago

I can reproduce the panic in the original report, with flux 1.20.0, by supplying a path that doesn't exist in the repository.

However, the latter is not ideal either, since it will stop a sync proceeding. This is detailed in bug #3184 which is separate but interacts with this one: on startup, flux will try to construct the state of the last sync by looking at the repository at the high water mark (sync tag); if a path is missing, it'll either panic (due to this bug) or log an error and fail to sync (in the pre-release build).
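
One way to check for the interaction described above is to ask git whether a path exists at both the sync tag and the branch head. The sketch below assumes a local clone and a `git` binary on the PATH; `pathExistsAt` is a hypothetical helper, and the tag name `flux-sync` and the path `clusters/global` are example values taken from earlier comments in this thread, not something Flux exposes.

```go
package main

import (
	"fmt"
	"os/exec"
)

// pathExistsAt reports whether path exists in revision rev of the repository
// at repoDir, by asking git whether the object "rev:path" exists.
// `git cat-file -e` exits with status zero only when the object exists.
func pathExistsAt(repoDir, rev, path string) bool {
	cmd := exec.Command("git", "-C", repoDir, "cat-file", "-e", rev+":"+path)
	return cmd.Run() == nil
}

func main() {
	// Example values only: compare the sync tag against the branch head.
	for _, rev := range []string{"flux-sync", "HEAD"} {
		fmt.Printf("%s has clusters/global: %v\n", rev, pathExistsAt(".", rev, "clusters/global"))
	}
}
```

If the path is reported as present at `HEAD` but absent at the sync tag, that would match the case where a newly added path is missing from the last-synced revision.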

@dataplex I think you're seeing #3184 -- not a panic, but no syncing either -- does that seem right to you?

@camorra-skk Try using paths without the leading slash, e.g., gitops/namespaces -- you may just be hitting the "missing path -> panic" problem of the original report

@matthewbrahms I think you're getting the combo-bug, with the new path not being in the old revision, triggering a panic.

@gautamr @marratj Possibly you're seeing one of the above situations too. Same recommendation goes: try the pre-release, and check that the git-paths exist in the repo.

Thanks for the reports everyone. I think the fix will rest largely with mitigating #3184 so I'm going to concentrate there to start with.

squaremo commented 4 years ago

The problem of a missing path causing a panic is fixed in #3193. See also #3223 for a related fix.