SovereignCloudStack / cluster-stack-operator

The SCS Cluster Stack Operator takes care of lifecycle management, configuration, and provider-specific tasks of Kubernetes clusters created with SCS Cluster Stacks.
https://scs.community/
Apache License 2.0

Crash with ClusterStack Resource with empty `providerRef` and `noProvider` #156

Open mxmxchere opened 5 months ago

mxmxchere commented 5 months ago

The CSO crashes with the following ClusterStack spec:

apiVersion: clusterstack.x-k8s.io/v1alpha1
kind: ClusterStack
metadata:
  name: scs-cluster-stack
  namespace: default
spec:
  provider: openstack
  name: scs
  kubernetesVersion: "1.27"
  channel: stable
  autoSubscribe: false
  versions:
    - v4

I think the missing providerRef is actually detected and the crash happens when printing the error message; however, I have not investigated deeply. The crash log is:

{"level":"INFO","time":"2024-05-22T09:15:41.002Z","file":"controller/controller.go:115","message":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference","controller":"clusterstack","controllerGroup":"clusterstack.x-k8s.io","controllerKind":"ClusterStack","ClusterStack":{"name":"scs-cluster-stack","namespace":"default"},"namespace":"default","name":"scs-cluster-stack","reconcileID":"66ffaad8-291e-43a7-aa54-d5ba5f5ce8f3"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x19082d4]

goroutine 204 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
    /src/cluster-stack-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x1b76920?, 0x313ed60?})
    /usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/SovereignCloudStack/cluster-stack-operator/internal/controller.(*ClusterStackReconciler).createOrUpdateProviderClusterStackRelease(0xc00028fc00, {0x213be10, 0xc0006915c0}, {0xc0005a29f0, 0x15}, 0xc0003d2380)
    /src/cluster-stack-operator/internal/controller/clusterstack_controller.go:275 +0xf4
github.com/SovereignCloudStack/cluster-stack-operator/internal/controller.(*ClusterStackReconciler).Reconcile(0xc00028fc00, {0x213be10?, 0xc0006915c0}, {{{0xc0002e0796?, 0x5?}, {0xc0002440d8?, 0xc0004a1d48?}}})
    /src/cluster-stack-operator/internal/controller/clusterstack_controller.go:178 +0x1073
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x213f928?, {0x213be10?, 0xc0006915c0?}, {{{0xc0002e0796?, 0xb?}, {0xc0002440d8?, 0x0?}}})
    /src/cluster-stack-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0004b4c80, {0x213be48, 0xc0003a6f00}, {0x1c5ec60?, 0xc0002230c0?})
    /src/cluster-stack-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316 +0x3cc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0004b4c80, {0x213be48, 0xc0003a6f00})
    /src/cluster-stack-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x1c9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
    /src/cluster-stack-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 92
    /src/cluster-stack-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x565
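
The trace ends in createOrUpdateProviderClusterStackRelease with "invalid memory address or nil pointer dereference". One plausible mechanism, not confirmed against the operator source, is that an omitted providerRef is stored as a nil pointer and a field is read through it. The sketch below reproduces that kind of panic in isolation; ObjectRef, ClusterStackSpec, and readProviderName are hypothetical stand-ins, not the operator's real types or functions.

```go
package main

import "fmt"

// ObjectRef and ClusterStackSpec are hypothetical stand-ins for the
// operator's API types; the real definitions may differ.
type ObjectRef struct {
	Kind string
	Name string
}

type ClusterStackSpec struct {
	NoProvider  bool
	ProviderRef *ObjectRef // assumed to stay nil when providerRef is omitted
}

// readProviderName reads a field through ProviderRef without a nil check.
// The deferred recover turns the resulting runtime panic into an error so
// the program can print it instead of crashing.
func readProviderName(spec ClusterStackSpec) (name string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered panic: %v", r)
		}
	}()
	return spec.ProviderRef.Name, nil // panics when ProviderRef is nil
}

func main() {
	// Neither noProvider nor providerRef is set, as in the spec above.
	_, err := readProviderName(ClusterStackSpec{})
	fmt.Println(err)
}
```

Without the recover, the same dereference would end the process with a SIGSEGV-style panic like the one in the log.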

janiskemper commented 5 months ago

That shouldn't be the case. Thanks for reporting!

Just to make sure: if "noProvider" is set to "false", then you have to specify a providerRef, right?

I'm asking not because it is relevant for this bug report, but to avoid misunderstandings regarding the usage.

mxmxchere commented 5 months ago

> Just to make sure: if "noProvider" is set to "false", then you have to specify a providerRef, right?

It depends on what you mean by "have to". It is not enforced by a webhook or any other mechanism when applying the example resource. But if you mean that one is supposed to specify a providerRef when noProvider is false, because that is how the resource is designed, then we have the same understanding.

But regardless of whether noProvider is set to false or omitted, the operator crashes when the providerRef is missing. So the example below also triggers a crash:

apiVersion: clusterstack.x-k8s.io/v1alpha1
kind: ClusterStack
metadata:
  name: scs-cluster-stack
  namespace: default
spec:
  provider: openstack
  noProvider: false
  name: scs
  kubernetesVersion: "1.27"
  channel: stable
  autoSubscribe: false
  versions:
    - v4
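
That both manifests behave the same is what one would expect if noProvider is a plain Go bool: an omitted boolean decodes to false, so the two specs become indistinguishable once decoded. The sketch below illustrates this with encoding/json (Kubernetes manifests are converted from YAML to JSON before decoding). The struct, its JSON tags, and the decode helper are assumptions for illustration, not the operator's actual type definition.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// providerRef and clusterStackSpec are hypothetical; field names and tags
// are assumed from the manifests in this thread.
type providerRef struct {
	Name string `json:"name"`
}

type clusterStackSpec struct {
	Provider    string       `json:"provider"`
	NoProvider  bool         `json:"noProvider,omitempty"`
	ProviderRef *providerRef `json:"providerRef,omitempty"`
}

// decode unmarshals a JSON document into a clusterStackSpec.
func decode(raw string) (clusterStackSpec, error) {
	var s clusterStackSpec
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

func main() {
	omitted, err := decode(`{"provider":"openstack"}`)
	if err != nil {
		panic(err)
	}
	explicit, err := decode(`{"provider":"openstack","noProvider":false}`)
	if err != nil {
		panic(err)
	}
	// Both specs end up with NoProvider == false and a nil ProviderRef.
	fmt.Println(omitted.NoProvider == explicit.NoProvider,
		omitted.ProviderRef == nil, explicit.ProviderRef == nil)
}
```

Running this prints "true true true": under these assumptions the reconciler cannot tell the two manifests apart, and both leave the pointer nil.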

janiskemper commented 5 months ago

Yes, I understand that this is a bug. It apparently happens when the providerRef is misconfigured (you want a provider but you don't specify one).

We are going to add webhooks, which will catch this as well!

jschoone commented 5 months ago

Related to https://github.com/SovereignCloudStack/cluster-stack-operator/issues/13