emqx / emqx-operator

A Kubernetes Operator for EMQX
https://www.emqx.com
Apache License 2.0

Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference #1008

Closed ypereirareis closed 5 months ago

ypereirareis commented 5 months ago

Hi!

Describe the bug: After an operator upgrade, the controller fails, without any other changes.

{"level":"info","ts":"2024-01-24T07:58:35Z","msg":"Starting workers","controller":"emqxplugin","controllerGroup":"apps.emqx.io","controllerKind":"EmqxPlugin","worker count":1}
{"level":"info","ts":"2024-01-24T07:58:35Z","msg":"Starting workers","controller":"rebalance","controllerGroup":"apps.emqx.io","controllerKind":"Rebalance","worker count":1}
{"level":"info","ts":"2024-01-24T07:58:35Z","msg":"Starting workers","controller":"emqxenterprise","controllerGroup":"apps.emqx.io","controllerKind":"EmqxEnterprise","worker count":1}
{"level":"info","ts":"2024-01-24T07:58:36Z","msg":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference","controller":"emqx","controllerGroup":"apps.emqx.io","controllerKind":"EMQX","eMQX":{"name":"emqx-retail-api","namespace":"emqx"},"namespace":"emqx","name":"emqx-retail-api","reconcileID":"eb20c003-f22a-4acb-979f-82700cba4f6f"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x60 pc=0x164c3c3]

goroutine 642 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:118 +0x1f4
panic({0x180cc00, 0x28c5bb0})
    /usr/local/go/src/runtime/panic.go:884 +0x213
github.com/emqx/emqx-operator/controllers/apps/v2beta1.(*updateStatus).reconcile(0xc0004ba040, {0x1c8fc30, 0xc0008c8a20}, 0xc0002d0000, {0x1c90960, 0xc0036d6780})
    /workspace/controllers/apps/v2beta1/update_emqx_status.go:119 +0x5c3
github.com/emqx/emqx-operator/controllers/apps/v2beta1.(*EMQXReconciler).Reconcile(0xc000845800, {0x1c8fc30, 0xc0008c8a20}, {{{0xc0001475b0?, 0x0?}, {0xc0008ea978?, 0x40dec7?}}})
    /workspace/controllers/apps/v2beta1/emqx_controller.go:133 +0x6a3
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1c8fb88?, {0x1c8fc30?, 0xc0008c8a20?}, {{{0xc0001475b0?, 0x1961820?}, {0xc0008ea978?, 0xc0005be2d8?}}})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:121 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000376a00, {0x1c8fb88, 0xc000114d70}, {0x187f440?, 0xc000b30ac0?})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:320 +0x309
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000376a00, {0x1c8fb88, 0xc000114d70})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:273 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:234 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:230 +0x587
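
When reading a trace like the one above, the frames from controller-runtime and the Go runtime are mostly noise. The first frame that belongs to the operator's own module is usually where the nil dereference happened. The following is a minimal Python sketch of that lookup, not a tool from the operator project. The helper name `first_frame_in` is hypothetical, and the parsing assumes the plain two-line frame layout (function line, then an indented file:line line) seen in this log.

```python
import re

# Matches a Go stack-trace location line such as
# "    /workspace/controllers/apps/v2beta1/update_emqx_status.go:119 +0x5c3"
LOCATION = re.compile(r"^\s+(\S+\.go):(\d+)")


def first_frame_in(trace, module_prefix):
    """Return (function, file, line) for the first frame whose function
    name starts with module_prefix, or None if no such frame exists.

    Hypothetical helper for illustration only.
    """
    lines = trace.splitlines()
    # Each frame is a function line followed by an indented location line.
    for func_line, loc_line in zip(lines, lines[1:]):
        if not func_line.startswith(module_prefix):
            continue
        match = LOCATION.match(loc_line)
        if match:
            # Drop the argument list: everything from the last "(" onwards.
            function = func_line[: func_line.rfind("(")]
            return function, match.group(1), int(match.group(2))
    return None


# Excerpt of the trace from this issue.
TRACE = """\
goroutine 642 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:118 +0x1f4
panic({0x180cc00, 0x28c5bb0})
    /usr/local/go/src/runtime/panic.go:884 +0x213
github.com/emqx/emqx-operator/controllers/apps/v2beta1.(*updateStatus).reconcile(0xc0004ba040, {0x1c8fc30, 0xc0008c8a20}, 0xc0002d0000, {0x1c90960, 0xc0036d6780})
    /workspace/controllers/apps/v2beta1/update_emqx_status.go:119 +0x5c3
"""

frame = first_frame_in(TRACE, "github.com/emqx/emqx-operator/")
print(frame)
```

For this log the lookup points at `(*updateStatus).reconcile` in `update_emqx_status.go`, line 119, which is the frame listed directly after `panic(...)`.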

To Reproduce

Expected behavior: A successful upgrade :)

Anything else we need to know?:

Environment details:

Thanks a lot!

Rory-Z commented 5 months ago

Hi @ypereirareis, could you please try EMQX Operator 2.2.14? Also, I don't think you should upgrade EMQX and the EMQX Operator at the same time. Upgrade EMQX first and, once it is stable, upgrade the EMQX Operator.

If it still does not work, please share your EMQX YAML file.

ypereirareis commented 5 months ago

Hello @Rory-Z!

We finally found the problem on our side that led to this invalid memory address or nil pointer dereference error. When upgrading EMQX from version 5.0.9 to version 5.3.2, we forgot to update the apiVersion:

| EMQX Open Source Version | EMQX Operator Version | APIVersion | Kind |
| --- | --- | --- | --- |
| 5.1.1 or higher | 2.2.0 | apps.emqx.io/v2beta1 | EMQX |
- apiVersion: apps.emqx.io/v2alpha1
+ apiVersion: apps.emqx.io/v2beta1
kind: EMQX
metadata:
  name: [NAME]
spec:
  image: emqx/emqx:5.3.2
...
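
A mismatch like this is easy to miss when several manifests are upgraded at once. The following is a minimal Python sketch of a pre-upgrade check that flags EMQX manifests whose apiVersion is not apps.emqx.io/v2beta1. It is not part of the operator. The function name and the sample manifest names (`example`, `other`) are hypothetical, and the line-based parsing is an assumption that only handles unquoted, top-level `apiVersion:` and `kind:` keys. A real YAML parser would be more robust.

```python
import re

# Expected value taken from the compatibility row quoted in this comment.
EXPECTED_API_VERSION = "apps.emqx.io/v2beta1"

# Only matches unindented (top-level), unquoted keys.
TOP_LEVEL_KEY = re.compile(r"^(apiVersion|kind):\s*(\S+)\s*$")


def find_outdated_emqx_manifests(text, expected=EXPECTED_API_VERSION):
    """Return (document_index, api_version) for every YAML document of
    kind EMQX whose apiVersion differs from `expected`.

    Hypothetical helper for illustration only.
    """
    outdated = []
    # YAML documents in one file are separated by lines containing "---".
    documents = re.split(r"^---\s*$", text, flags=re.MULTILINE)
    for index, document in enumerate(documents):
        fields = {}
        for line in document.splitlines():
            match = TOP_LEVEL_KEY.match(line)
            if match:
                # Keep the first occurrence of each key.
                fields.setdefault(match.group(1), match.group(2))
        if fields.get("kind") == "EMQX" and fields.get("apiVersion") != expected:
            outdated.append((index, fields.get("apiVersion")))
    return outdated


# Hypothetical sample: the first document still uses the old apiVersion.
MANIFEST = """\
apiVersion: apps.emqx.io/v2alpha1
kind: EMQX
metadata:
  name: example
spec:
  image: emqx/emqx:5.3.2
---
apiVersion: apps.emqx.io/v2beta1
kind: EMQX
metadata:
  name: other
spec:
  image: emqx/emqx:5.3.2
"""

print(find_outdated_emqx_manifests(MANIFEST))
```

Running a check of this kind before applying the manifests would have reported the first document, which still declares apps.emqx.io/v2alpha1.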

IMPORTANT: To fix the error, we had to completely wipe our EMQX deployment, including its volumes (PVCs), and let the operator recreate it from scratch. The operator and our cluster are now both working properly.

Thanks.