crossplane-contrib / provider-gcp

Crossplane GCP provider
Apache License 2.0
125 stars 100 forks

Importing GKE cluster with a maintenance policy enabled causes provider-gcp pod to crash #480

Open ghost opened 1 year ago

ghost commented 1 year ago

What happened?

When we tried to import an existing GKE cluster that has a maintenance policy enabled (ours was a weekly window on Saturday and Sunday), the cluster never syncs, and the provider-gcp pod crashes with the errors below. Note: I verified that my GCP provider is configured correctly by creating a new cluster via Crossplane.

Below is the output from kubectl logs for the provider-gcp pod:

Getting pod logs for provider-gcp-897a5469eb94-564586475-lzkhg
I1025 14:30:58.246524       1 request.go:665] Waited for 1.033749861s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/node.k8s.io/v1?timeout=32s
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x19134b8]

goroutine 833 [running]:
github.com/crossplane/provider-gcp/pkg/clients/cluster.LateInitializeSpec(_, {0xc00023c070, 0xc000dd70e0, 0x0, 0xc00023c150, 0x0, {0xc0003035f0, 0xc}, {0x0, 0x0, ...}, ...})
        /home/runner/work/provider-gcp/provider-gcp/pkg/clients/cluster/cluster.go:830 +0x2678
github.com/crossplane/provider-gcp/pkg/controller/container.(*clusterExternal).Observe(0xc0009a1830, {0x232aec0, 0xc0008ce9c0}, {0x2399950, 0xc00005ac00})
        /home/runner/work/provider-gcp/provider-gcp/pkg/controller/container/cluster.go:117 +0x56a
github.com/crossplane/crossplane-runtime/pkg/reconciler/managed.(*Reconciler).Reconcile(0xc00025cf20, {0x232aef8, 0xc000669b60}, {{{0x0, 0x0}, {0xc00093ed38, 0x14}}})
        /home/runner/work/provider-gcp/provider-gcp/vendor/github.com/crossplane/crossplane-runtime/pkg/reconciler/managed/reconciler.go:767 +0x230b
github.com/crossplane/crossplane-runtime/pkg/ratelimiter.(*Reconciler).Reconcile(0xc000354690, {0x232aef8, 0xc000669b60}, {{{0x0, 0x1f3eaa0}, {0xc00093ed38, 0x30}}})
        /home/runner/work/provider-gcp/provider-gcp/vendor/github.com/crossplane/crossplane-runtime/pkg/ratelimiter/reconciler.go:54 +0x16b
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc00025cfd0, {0x232aef8, 0xc000669b30}, {{{0x0, 0x1f3eaa0}, {0xc00093ed38, 0x413a94}}})
        /home/runner/work/provider-gcp/provider-gcp/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114 +0x26f
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00025cfd0, {0x232ae50, 0xc00078a580}, {0x1d7e1a0, 0xc0009ffc60})
        /home/runner/work/provider-gcp/provider-gcp/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311 +0x33e
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00025cfd0, {0x232ae50, 0xc00078a580})
        /home/runner/work/provider-gcp/provider-gcp/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /home/runner/work/provider-gcp/provider-gcp/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
        /home/runner/work/provider-gcp/provider-gcp/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x357
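The panic originates in LateInitializeSpec at cluster.go:830, which suggests the late-initialization code dereferences a nested maintenance-policy field without a nil check. A weekly Saturday/Sunday window is expressed in the GKE API as a recurring window rather than a daily one, so a field the code assumes is always set may be nil. The sketch below is a hypothetical, simplified illustration of that pattern; the struct and function names are invented for the example and are not the provider's actual types.

```go
package main

import "fmt"

// Illustrative stand-ins for the GKE API's maintenance-policy types.
// In the real API, a policy has either a daily window or a recurring
// window, so one of these pointers is typically nil.
type DailyWindow struct{ StartTime string }
type RecurringWindow struct{ Recurrence string }
type MaintenanceWindow struct {
	DailyMaintenanceWindow *DailyWindow
	RecurringWindow        *RecurringWindow
}
type MaintenancePolicy struct{ Window *MaintenanceWindow }

// unguarded mimics the suspected bug: it assumes the daily window is
// always present, which panics with a nil pointer dereference for
// clusters that use a recurring (e.g. weekly) window instead.
func unguarded(p *MaintenancePolicy) string {
	return p.Window.DailyMaintenanceWindow.StartTime
}

// guarded checks every pointer on the path before dereferencing, so a
// recurring-window cluster is handled without panicking.
func guarded(p *MaintenancePolicy) string {
	if p == nil || p.Window == nil || p.Window.DailyMaintenanceWindow == nil {
		return ""
	}
	return p.Window.DailyMaintenanceWindow.StartTime
}

func main() {
	// A policy shaped like the reporter's cluster: weekly on Sat & Sun.
	weekly := &MaintenancePolicy{Window: &MaintenanceWindow{
		RecurringWindow: &RecurringWindow{Recurrence: "FREQ=WEEKLY;BYDAY=SA,SU"},
	}}

	// The unguarded path reproduces the crash; recover so we can show it.
	func() {
		defer func() {
			if r := recover(); r != nil {
				fmt.Println("unguarded panics:", r)
			}
		}()
		fmt.Println(unguarded(weekly))
	}()

	// The guarded path degrades gracefully to an empty value.
	fmt.Printf("guarded returns: %q\n", guarded(weekly))
}
```

A fix along these lines would make late-initialization skip maintenance-window fields that the observed cluster does not set, instead of dereferencing them unconditionally.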

How can we reproduce it?

  1. Create a basic GKE cluster (e.g. gcloud container clusters create testcluster --region=us-west1).
  2. Modify the newly created testcluster to give it a maintenance policy window. I set a weekly window on Saturday and Sunday via the Google Cloud console UI.
  3. Use the YAML below to attempt to import testcluster into Crossplane, and observe that the provider-gcp pod crashes immediately.
    apiVersion: container.gcp.crossplane.io/v1beta2
    kind: Cluster
    metadata:
      name: testcluster
      annotations:
        crossplane.io/external-name: testcluster
    spec:
      deletionPolicy: Orphan
      forProvider:
        location: us-west1
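For step 2, the same weekly maintenance window can be set from the CLI instead of the console UI. The commands below are a sketch assuming the gcloud maintenance-window flags for recurring windows; the timestamps and recurrence rule are illustrative values, not taken from the original report.

```shell
# Create the cluster, then attach a recurring weekly maintenance window
# covering Saturday and Sunday (times are example values).
gcloud container clusters create testcluster --region=us-west1

gcloud container clusters update testcluster --region=us-west1 \
  --maintenance-window-start="2023-01-07T00:00:00Z" \
  --maintenance-window-end="2023-01-07T06:00:00Z" \
  --maintenance-window-recurrence="FREQ=WEEKLY;BYDAY=SA,SU"
```

After this, applying the Cluster manifest above should trigger the crash described in this issue.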

What environment did it happen in?

wofr commented 7 months ago

Ran into the same issue! Did you find a workaround, or did you switch to a new GCP provider?