emqx / emqx-operator

A Kubernetes Operator for EMQX
https://www.emqx.com
Apache License 2.0

Operator crashes when adding a new instance #900

Closed agronholm closed 1 year ago

agronholm commented 1 year ago

Describe the bug

Creating a new instance from a v2beta1 manifest fails:

$ kubectl create -f emqx.yaml 
Error from server (InternalError): error when creating "emqx.yaml": Internal error occurred: failed calling webhook "mutating.apps.emqx.io": failed to call webhook: Post "https://emqx-operator-webhook-service.emqx-operator.svc:443/mutate-apps-emqx-io-v2beta1-emqx?timeout=10s": EOF
Logs from the operator

```
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"apps.emqx.io/v1beta4, Kind=EmqxPlugin","path":"/mutate-apps-emqx-io-v1beta4-emqxplugin"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-apps-emqx-io-v1beta4-emqxplugin"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"apps.emqx.io/v1beta4, Kind=EmqxPlugin","path":"/validate-apps-emqx-io-v1beta4-emqxplugin"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-apps-emqx-io-v1beta4-emqxplugin"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/convert"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.builder","msg":"Conversion webhook enabled","GVK":"apps.emqx.io/v1beta4, Kind=EmqxPlugin"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"apps.emqx.io/v1beta4, Kind=EmqxBroker","path":"/mutate-apps-emqx-io-v1beta4-emqxbroker"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-apps-emqx-io-v1beta4-emqxbroker"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"apps.emqx.io/v1beta4, Kind=EmqxBroker","path":"/validate-apps-emqx-io-v1beta4-emqxbroker"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-apps-emqx-io-v1beta4-emqxbroker"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.builder","msg":"Conversion webhook enabled","GVK":"apps.emqx.io/v1beta4, Kind=EmqxBroker"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"apps.emqx.io/v1beta4, Kind=EmqxEnterprise","path":"/mutate-apps-emqx-io-v1beta4-emqxenterprise"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-apps-emqx-io-v1beta4-emqxenterprise"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"apps.emqx.io/v1beta4, Kind=EmqxEnterprise","path":"/validate-apps-emqx-io-v1beta4-emqxenterprise"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-apps-emqx-io-v1beta4-emqxenterprise"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.builder","msg":"Conversion webhook enabled","GVK":"apps.emqx.io/v1beta4, Kind=EmqxEnterprise"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"apps.emqx.io/v2beta1, Kind=EMQX","path":"/mutate-apps-emqx-io-v2beta1-emqx"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-apps-emqx-io-v2beta1-emqx"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"apps.emqx.io/v2beta1, Kind=EMQX","path":"/validate-apps-emqx-io-v2beta1-emqx"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-apps-emqx-io-v2beta1-emqx"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.builder","msg":"Conversion webhook enabled","GVK":"apps.emqx.io/v2beta1, Kind=EMQX"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.builder","msg":"skip registering a mutating webhook, object does not implement admission.Defaulter or WithDefaulter wasn't called","GVK":"apps.emqx.io/v2beta1, Kind=Rebalance"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"apps.emqx.io/v2beta1, Kind=Rebalance","path":"/validate-apps-emqx-io-v2beta1-rebalance"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-apps-emqx-io-v2beta1-rebalance"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.builder","msg":"Conversion webhook enabled","GVK":"apps.emqx.io/v2beta1, Kind=Rebalance"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"setup","msg":"starting manager"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.webhook.webhooks","msg":"Starting webhook server"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.webhook","msg":"Serving webhook server","host":"","port":9443}
{"level":"info","ts":"2023-08-01T14:13:45Z","logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"}
{"level":"info","ts":"2023-08-01T14:13:45Z","msg":"Starting server","kind":"health probe","addr":"[::]:8081"}
I0801 14:13:45.162419 1 leaderelection.go:248] attempting to acquire leader lease emqx-operator/19fd6fcc.emqx.io...
{"level":"info","ts":"2023-08-01T14:13:45Z","msg":"Starting server","path":"/metrics","kind":"metrics","addr":"[::]:8080"}
I0801 14:13:45.183287 1 leaderelection.go:258] successfully acquired lease emqx-operator/19fd6fcc.emqx.io
{"level":"info","ts":"2023-08-01T14:13:45Z","msg":"Starting EventSource","controller":"emqxbroker","controllerGroup":"apps.emqx.io","controllerKind":"EmqxBroker","source":"kind source: *v1beta4.EmqxBroker"}
{"level":"info","ts":"2023-08-01T14:13:45Z","msg":"Starting Controller","controller":"emqxbroker","controllerGroup":"apps.emqx.io","controllerKind":"EmqxBroker"}
{"level":"info","ts":"2023-08-01T14:13:45Z","msg":"Starting EventSource","controller":"emqxplugin","controllerGroup":"apps.emqx.io","controllerKind":"EmqxPlugin","source":"kind source: *v1beta4.EmqxPlugin"}
{"level":"info","ts":"2023-08-01T14:13:45Z","msg":"Starting Controller","controller":"emqxplugin","controllerGroup":"apps.emqx.io","controllerKind":"EmqxPlugin"}
{"level":"info","ts":"2023-08-01T14:13:45Z","msg":"Starting EventSource","controller":"emqxenterprise","controllerGroup":"apps.emqx.io","controllerKind":"EmqxEnterprise","source":"kind source: *v1beta4.EmqxEnterprise"}
{"level":"info","ts":"2023-08-01T14:13:45Z","msg":"Starting Controller","controller":"emqxenterprise","controllerGroup":"apps.emqx.io","controllerKind":"EmqxEnterprise"}
{"level":"info","ts":"2023-08-01T14:13:45Z","msg":"Starting EventSource","controller":"emqx","controllerGroup":"apps.emqx.io","controllerKind":"EMQX","source":"kind source: *v2beta1.EMQX"}
{"level":"info","ts":"2023-08-01T14:13:45Z","msg":"Starting Controller","controller":"emqx","controllerGroup":"apps.emqx.io","controllerKind":"EMQX"}
{"level":"info","ts":"2023-08-01T14:13:45Z","msg":"Starting EventSource","controller":"rebalance","controllerGroup":"apps.emqx.io","controllerKind":"Rebalance","source":"kind source: *v2beta1.Rebalance"}
{"level":"info","ts":"2023-08-01T14:13:45Z","msg":"Starting Controller","controller":"rebalance","controllerGroup":"apps.emqx.io","controllerKind":"Rebalance"}
{"level":"info","ts":"2023-08-01T14:13:46Z","msg":"Starting workers","controller":"emqxenterprise","controllerGroup":"apps.emqx.io","controllerKind":"EmqxEnterprise","worker count":1}
{"level":"info","ts":"2023-08-01T14:13:46Z","msg":"Starting workers","controller":"emqxplugin","controllerGroup":"apps.emqx.io","controllerKind":"EmqxPlugin","worker count":1}
{"level":"info","ts":"2023-08-01T14:13:46Z","msg":"Starting workers","controller":"rebalance","controllerGroup":"apps.emqx.io","controllerKind":"Rebalance","worker count":1}
{"level":"info","ts":"2023-08-01T14:13:46Z","msg":"Starting workers","controller":"emqxbroker","controllerGroup":"apps.emqx.io","controllerKind":"EmqxBroker","worker count":1}
{"level":"info","ts":"2023-08-01T14:13:46Z","msg":"Starting workers","controller":"emqx","controllerGroup":"apps.emqx.io","controllerKind":"EMQX","worker count":1}
{"level":"info","ts":"2023-08-02T10:30:34Z","logger":"emqx-resource","msg":"default","name":"emqx"}
2023/08/02 10:30:34 http: panic serving 10.240.0.154:46866: runtime error: invalid memory address or nil pointer dereference
goroutine 132210 [running]:
net/http.(*conn).serve.func1()
	/usr/local/go/src/net/http/server.go:1854 +0xbf
panic({0x1803640, 0x28b3a70})
	/usr/local/go/src/runtime/panic.go:890 +0x263
github.com/rory-z/go-hocon.(*Config).Get(0x0, {0x1a41184, 0x1d})
	/go/pkg/mod/github.com/rory-z/go-hocon@v1.2.15-1/config.go:252 +0x27
github.com/rory-z/go-hocon.(*Config).GetString(0xc000a64400?, {0x1a41184?, 0x0?})
	/go/pkg/mod/github.com/rory-z/go-hocon@v1.2.15-1/config.go:133 +0x1e
github.com/emqx/emqx-operator/apis/apps/v2beta1.(*EMQX).defaultConfiguration(0xc000647500)
	/workspace/apis/apps/v2beta1/emqx_webhook.go:200 +0x45
github.com/emqx/emqx-operator/apis/apps/v2beta1.(*EMQX).Default(0xc000647500)
	/workspace/apis/apps/v2beta1/emqx_webhook.go:55 +0xdc
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*mutatingHandler).Handle(_, {_, _}, {{{0xc000298870, 0x24}, {{0xc000527e60, 0xc}, {0xc000527e70, 0x7}, {0xc000527e78, ...}}, ...}})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/webhook/admission/defaulter.go:66 +0x1f9
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle(_, {_, _}, {{{0xc000298870, 0x24}, {{0xc000527e60, 0xc}, {0xc000527e70, 0x7}, {0xc000527e78, ...}}, ...}})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/webhook/admission/webhook.go:146 +0xa2
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP(0xc00003fcc0, {0x7f6fad4bfd60?, 0xc0002a2be0}, 0xc000a50f00)
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/webhook/admission/http.go:98 +0xeb5
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1({0x7f6fad4bfd60, 0xc0002a2be0}, 0x1c82300?)
	/go/pkg/mod/github.com/prometheus/client_golang@v1.13.0/prometheus/promhttp/instrument_server.go:56 +0xd4
net/http.HandlerFunc.ServeHTTP(0x1c82380?, {0x7f6fad4bfd60?, 0xc0002a2be0?}, 0x70fbe0?)
	/usr/local/go/src/net/http/server.go:2122 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1({0x1c82380?, 0xc00017a8c0?}, 0xc000a50f00)
	/go/pkg/mod/github.com/prometheus/client_golang@v1.13.0/prometheus/promhttp/instrument_server.go:142 +0xb8
net/http.HandlerFunc.ServeHTTP(0xc00017a8c0?, {0x1c82380?, 0xc00017a8c0?}, 0xc000a5ce80?)
	/usr/local/go/src/net/http/server.go:2122 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2({0x1c82380, 0xc00017a8c0}, 0xc000a50f00)
	/go/pkg/mod/github.com/prometheus/client_golang@v1.13.0/prometheus/promhttp/instrument_server.go:104 +0xbf
net/http.HandlerFunc.ServeHTTP(0xc00017a8c0?, {0x1c82380?, 0xc00017a8c0?}, 0x1a29067?)
	/usr/local/go/src/net/http/server.go:2122 +0x2f
net/http.(*ServeMux).ServeHTTP(0xc000a5ce67?, {0x1c82380, 0xc00017a8c0}, 0xc000a50f00)
	/usr/local/go/src/net/http/server.go:2500 +0x149
net/http.serverHandler.ServeHTTP({0x1c756d0?}, {0x1c82380, 0xc00017a8c0}, 0xc000a50f00)
	/usr/local/go/src/net/http/server.go:2936 +0x316
net/http.(*conn).serve(0xc00045afc0, {0x1c83210, 0xc000742240})
	/usr/local/go/src/net/http/server.go:1995 +0x612
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:3089 +0x5ed
{"level":"info","ts":"2023-08-02T10:36:10Z","logger":"emqx-resource","msg":"default","name":"emqx"}
2023/08/02 10:36:10 http: panic serving 10.240.0.41:47486: runtime error: invalid memory address or nil pointer dereference
goroutine 132810 [running]:
net/http.(*conn).serve.func1()
	/usr/local/go/src/net/http/server.go:1854 +0xbf
panic({0x1803640, 0x28b3a70})
	/usr/local/go/src/runtime/panic.go:890 +0x263
github.com/rory-z/go-hocon.(*Config).Get(0x0, {0x1a41184, 0x1d})
	/go/pkg/mod/github.com/rory-z/go-hocon@v1.2.15-1/config.go:252 +0x27
github.com/rory-z/go-hocon.(*Config).GetString(0xc00083bc00?, {0x1a41184?, 0x0?})
	/go/pkg/mod/github.com/rory-z/go-hocon@v1.2.15-1/config.go:133 +0x1e
github.com/emqx/emqx-operator/apis/apps/v2beta1.(*EMQX).defaultConfiguration(0xc000243500)
	/workspace/apis/apps/v2beta1/emqx_webhook.go:200 +0x45
github.com/emqx/emqx-operator/apis/apps/v2beta1.(*EMQX).Default(0xc000243500)
	/workspace/apis/apps/v2beta1/emqx_webhook.go:55 +0xdc
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*mutatingHandler).Handle(_, {_, _}, {{{0xc000983920, 0x24}, {{0xc000791680, 0xc}, {0xc000791658, 0x7}, {0xc000791690, ...}}, ...}})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/webhook/admission/defaulter.go:66 +0x1f9
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle(_, {_, _}, {{{0xc000983920, 0x24}, {{0xc000791680, 0xc}, {0xc000791658, 0x7}, {0xc000791690, ...}}, ...}})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/webhook/admission/webhook.go:146 +0xa2
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP(0xc00003fcc0, {0x7f6fad4bfd60?, 0xc0000bbcc0}, 0xc000844500)
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/webhook/admission/http.go:98 +0xeb5
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1({0x7f6fad4bfd60, 0xc0000bbcc0}, 0x1c82300?)
	/go/pkg/mod/github.com/prometheus/client_golang@v1.13.0/prometheus/promhttp/instrument_server.go:56 +0xd4
net/http.HandlerFunc.ServeHTTP(0x1c82380?, {0x7f6fad4bfd60?, 0xc0000bbcc0?}, 0x70fbe0?)
	/usr/local/go/src/net/http/server.go:2122 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1({0x1c82380?, 0xc00017a8c0?}, 0xc000844500)
	/go/pkg/mod/github.com/prometheus/client_golang@v1.13.0/prometheus/promhttp/instrument_server.go:142 +0xb8
net/http.HandlerFunc.ServeHTTP(0xc00017a8c0?, {0x1c82380?, 0xc00017a8c0?}, 0xc000847040?)
	/usr/local/go/src/net/http/server.go:2122 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2({0x1c82380, 0xc00017a8c0}, 0xc000844500)
	/go/pkg/mod/github.com/prometheus/client_golang@v1.13.0/prometheus/promhttp/instrument_server.go:104 +0xbf
net/http.HandlerFunc.ServeHTTP(0xc00017a8c0?, {0x1c82380?, 0xc00017a8c0?}, 0x1a2df63?)
	/usr/local/go/src/net/http/server.go:2122 +0x2f
net/http.(*ServeMux).ServeHTTP(0xc000847027?, {0x1c82380, 0xc00017a8c0}, 0xc000844500)
	/usr/local/go/src/net/http/server.go:2500 +0x149
net/http.serverHandler.ServeHTTP({0x1c756d0?}, {0x1c82380, 0xc00017a8c0}, 0xc000844500)
	/usr/local/go/src/net/http/server.go:2936 +0x316
net/http.(*conn).serve(0xc000694a20, {0x1c83210, 0xc000742240})
	/usr/local/go/src/net/http/server.go:1995 +0x612
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:3089 +0x5ed
```
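
The `EOF` that kubectl reports and the `http: panic serving` lines in the operator log are likely two views of the same event: Go's `net/http` server recovers a panic in a handler, logs it, and closes the connection without sending a response. The following is a minimal sketch of that behavior using only the standard library. The `config` type and `requestPanickingHandler` are hypothetical names made up for this illustration and are not taken from the operator's code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// requestPanickingHandler starts a throwaway HTTP server whose handler
// dereferences a nil pointer, sends one request to it, and returns the
// error seen by the client.
func requestPanickingHandler() error {
	// config is a hypothetical stand-in for a parsed configuration object.
	type config struct{ values map[string]string }

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var cfg *config                // never initialised, so it is nil
		fmt.Fprint(w, cfg.values["k"]) // panics: nil pointer dereference
	})
	srv := httptest.NewServer(handler)
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err == nil {
		resp.Body.Close()
	}
	return err
}

func main() {
	// The server logs "http: panic serving ..." to stderr and drops the
	// connection, so the client receives an error instead of a response.
	fmt.Println("client error:", requestPanickingHandler())
}
```

The server process itself keeps running after the panic, which matches the log above: the second request at 10:36:10 is served (and panics) by the same process.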

To Reproduce

Steps to reproduce the behavior:

Create a file with the following contents:

emqx.yaml

```yaml
apiVersion: apps.emqx.io/v2beta1
kind: EMQX
metadata:
  name: emqx
  namespace: emqx
spec:
  image: emqx:5.1.3
  coreTemplate:
    spec:
      replicas: 1
  config:
    data: |
      emqx_loaded_plugins = emqx_auth_http,emqx_dashboard
      emqx_cluster.discovery_strategy = manual
      emqx_listeners.ssl.default.ssl_options.versions = ["tlsv1.3", "tlsv1.2"]
      emqx_listeners.wss.default.ssl_options.versions = ["tlsv1.3", "tlsv1.2"]
      emqx_authentication.1 = {
        mechanism = "password_based",
        backend = "http",
        method = "post",
        enable = true
        url = "http://web.eb:8800/v1/integrations/mqtt/authenticate",
        body = "{\"username\": \"${username}\", \"cert_subject\": \"${cert_subject}\", \"password\": \"${password}\", \"peerhost\": \"${peerhost}\"}"
      }
      emqx_authorization.no_match = "deny"
      emqx_authorization.cache_max_size = 1024
      emqx_authorization.sources.1 = {
        type = "http",
        method = "post",
        enable = true,
        url = "http://web.eb:8800/v1/integrations/mqtt/authorize",
        body ="{\"username\": \"${username}\", \"cert_subject\": \"${cert_subject}\", \"topic\": \"${topic}\", \"action\": \"${action}\"}"
      }
```

Expected behavior

A new EMQX instance should be spun up

Anything else we need to know?:

Environment details:

Rory-Z commented 1 year ago

Please run kubectl wait --for=condition=Ready pods -l "control-plane=controller-manager" -n emqx-operator-system and then retry.

agronholm commented 1 year ago

That just waits until the operator pod is ready, right? The pod was ready long before I tried creating the instance. But yes, I tried what you suggest now, and it made no difference.

Rory-Z commented 1 year ago

That's weird. Could you please check the endpoints of the emqx-operator-webhook-service?

agronholm commented 1 year ago

What exactly would you have me do?

Rory-Z commented 1 year ago

Easy, please, could you check the endpoints of the emqx-operator-webhook-service? I want to know whether the pod address of emqx-operator-controller-manager is in the .subsets.addresses of the endpoints.
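
For readers unsure what this check involves: the Service's Endpoints object lists the pod IPs that traffic is routed to, and the question is whether the operator pod's IP appears there. Below is a rough sketch of that membership test in Go, operating on the JSON form of an Endpoints object. The sample document and IP address are made up for illustration, and `hasAddress` is a hypothetical helper, not part of any Kubernetes library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// endpoints mirrors only the fields of a core/v1 Endpoints object that
// this check needs.
type endpoints struct {
	Subsets []struct {
		Addresses []struct {
			IP string `json:"ip"`
		} `json:"addresses"`
	} `json:"subsets"`
}

// hasAddress reports whether podIP is listed under .subsets[].addresses[].ip.
func hasAddress(raw []byte, podIP string) (bool, error) {
	var ep endpoints
	if err := json.Unmarshal(raw, &ep); err != nil {
		return false, err
	}
	for _, s := range ep.Subsets {
		for _, a := range s.Addresses {
			if a.IP == podIP {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	// Made-up sample. Real input would be the output of something like
	// `kubectl get endpoints emqx-operator-webhook-service -n <namespace> -o json`.
	sample := []byte(`{"subsets":[{"addresses":[{"ip":"10.0.0.12"}]}]}`)
	ok, err := hasAddress(sample, "10.0.0.12")
	fmt.Println(ok, err)
}
```

If the pod IP is missing from the list, the webhook Service has no ready backend and admission requests cannot reach the operator.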

agronholm commented 1 year ago

Sorry for being obtuse, but what do you mean by "checking"? I have no clue what paths are exposed by https://emqx-operator-webhook-service.emqx-operator.svc. The root path (/) returned a 404.

agronholm commented 1 year ago

Sorry, I didn't properly read your previous message. So here it is:

[image]

And yes, that is the pod address.

agronholm commented 1 year ago

The operator seems to have a problem with the authentication/authorization section of the config. If I comment them out (and the replicas section), the instance is created successfully.

agronholm commented 1 year ago

I further narrowed this down to the authorization.sources.1 section. If I comment that out, the instance is created successfully.

agronholm commented 1 year ago

So I was able to make kubectl create pass with the following config:

emqx.yaml

```yaml
apiVersion: apps.emqx.io/v2beta1
kind: EMQX
metadata:
  name: emqx
  namespace: emqx
spec:
  image: emqx:5.1.3
  config:
    data: |
      loaded_plugins = ["auth_http", "dashboard"]
      cluster.discovery_strategy = manual
      listeners.ssl.default.ssl_options.versions = ["tlsv1.3", "tlsv1.2"]
      listeners.wss.default.ssl_options.versions = ["tlsv1.3", "tlsv1.2"]
      authentication.1 {
        mechanism = password_based
        backend = http
        method = post
        enable = true
        url = "http://web.eb.svc:8800/v1/integrations/mqtt/authenticate"
        body = {
          username = "${username}"
          cert_subject = "${cert_subject}"
          password = "${password}"
          peerhost = "${peerhost}"
        }
      }
      authorization {
        sources = [
          {
            type = http
            method = post
            enable = true
            url = "http://web.eb.svc:8800/v1/integrations/mqtt/authorize"
            body = {
              username = "${username}"
              cert_subject = "${cert_subject}"
              password = "${password}"
              peerhost = "${peerhost}"
              topic = "${topic}"
              action = "${action}"
            }
          }
        ]
        no_match = deny
        cache_max_size = 1024
      }
```

Now, however, the pods won't be created, and the operator pod contains a lot of the following errors:

{"level":"error","ts":"2023-08-02T14:39:48Z","msg":"Reconciler error","controller":"emqx","controllerGroup":"apps.emqx.io","controllerKind":"EMQX","eMQX":{"name":"emqx","namespace":"emqx"},"namespace":"emqx","name":"emqx","reconcileID":"d4f255c3-26de-4707-9b1e-59f2a258b545","error":"failed to create or update services: failed to create Service emqx-listeners: Service \"emqx-listeners\" is invalid: [spec.ports[0].port: Invalid value: 0: must be between 1 and 65535, inclusive, spec.ports[0].targetPort: Invalid value: 0: must be between 1 and 65535, inclusive]","errorVerbose":"Service \"emqx-listeners\" is invalid: [spec.ports[0].port: Invalid value: 0: must be between 1 and 65535, inclusive, spec.ports[0].targetPort: Invalid value: 0: must be between 1 and 65535, inclusive]\nfailed to create Service emqx-listeners\ngithub.com/emqx/emqx-operator/internal/handler.(*Handler).Create\n\t/workspace/internal/handler/handler.go:135\ngithub.com/emqx/emqx-operator/internal/handler.(*Handler).CreateOrUpdate\n\t/workspace/internal/handler/handler.go:74\ngithub.com/emqx/emqx-operator/internal/handler.(*Handler).CreateOrUpdateList\n\t/workspace/internal/handler/handler.go:60\ngithub.com/emqx/emqx-operator/controllers/apps/v2beta1.(*addSvc).reconcile\n\t/workspace/controllers/apps/v2beta1/add_svc.go:44\ngithub.com/emqx/emqx-operator/controllers/apps/v2beta1.(*EMQXReconciler).Reconcile\n\t/workspace/controllers/apps/v2beta1/emqx_controller.go:125\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:234\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\nfailed to create or update services\ngithub.com/emqx/emqx-operator/controllers/apps/v2beta1.(*addSvc).reconcile\n\t/workspace/controllers/apps/v2beta1/add_svc.go:45\ngithub.com/emqx/emqx-operator/controllers/apps/v2beta1.(*EMQXReconciler).Reconcile\n\t/workspace/controllers/apps/v2beta1/emqx_controller.go:125\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:234\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:234"}

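
The error above says that the generated Service has a port of 0, which is what an unset integer looks like in Go. One plausible reading is that the port is derived from each listener's bind address and that a listener without an explicit bind yields nothing to parse. The sketch below illustrates that idea only; `portFromBind` is a hypothetical helper, not the operator's actual code, and it assumes a bind value in `host:port` form.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// portFromBind extracts the port from a listener bind value such as
// "0.0.0.0:8883". An empty or malformed bind produces an error rather
// than a silent 0, which Kubernetes would reject as a Service port.
func portFromBind(bind string) (int, error) {
	_, portStr, err := net.SplitHostPort(bind)
	if err != nil {
		return 0, fmt.Errorf("bind %q: %w", bind, err)
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return 0, fmt.Errorf("bind %q: %w", bind, err)
	}
	if port < 1 || port > 65535 {
		return 0, fmt.Errorf("bind %q: port %d must be between 1 and 65535", bind, port)
	}
	return port, nil
}

func main() {
	for _, bind := range []string{"0.0.0.0:8883", ""} {
		port, err := portFromBind(bind)
		fmt.Printf("bind=%q port=%d err=%v\n", bind, port, err)
	}
}
```

Validating early like this would surface a missing bind as a clear message instead of a Service validation failure during reconciliation.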
agronholm commented 1 year ago

Ok, I figured it out. Turns out that the biggest problem was that I needed to explicitly specify the address to bind to for both listeners. Here's the config that actually worked:

emqx.yaml

```yaml
apiVersion: apps.emqx.io/v2beta1
kind: EMQX
metadata:
  name: emqx
  namespace: emqx
spec:
  image: emqx:5.1.3
  config:
    data: |
      loaded_plugins = ["auth_http", "dashboard"]
      cluster.discovery_strategy = manual
      listeners.ssl.default {
        bind = "0.0.0.0:8883"
        max_connections = 1024000
        ssl_options.versions = ["tlsv1.3", "tlsv1.2"]
      }
      listeners.wss.default {
        bind = "0.0.0.0:8084"
        max_connections = 1024000
        websocket.mqtt_path = "/mqtt"
        ssl_options.versions = ["tlsv1.3", "tlsv1.2"]
      }
      authentication = [
        {
          mechanism = password_based
          backend = http
          method = post
          enable = true
          url = "http://web.eb.svc:8800/v1/integrations/mqtt/authenticate"
          body = {
            username = "${username}"
            cert_subject = "${cert_subject}"
            password = "${password}"
            peerhost = "${peerhost}"
          }
        }
      ]
      authorization {
        sources = [
          {
            type = http
            method = post
            enable = true
            url = "http://web.eb.svc:8800/v1/integrations/mqtt/authorize"
            body = {
              username = "${username}"
              cert_subject = "${cert_subject}"
              password = "${password}"
              peerhost = "${peerhost}"
              topic = "${topic}"
              action = "${action}"
            }
          }
        ]
        no_match = deny
        cache {
          max_size = 1024
        }
      }
```

agronholm commented 1 year ago

To summarize, I think the problem was caused by me trying to directly configure the authorizer source at slot 1 where no authorizers were present in the default config. The operator then tried to access that index which caused a memory access violation. Would you agree with my assessment?
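
For context, the stack trace in the original report shows `(*Config).Get` being called with a receiver of `0x0`, that is, on a nil `*Config`. A common way to end up there in Go is to ignore a parse error and keep using the nil result. The sketch below shows that pattern and a guard against it. Every name in it (`Config`, `parse`, `lookup`, `panics`) is a hypothetical stand-in written for this example and does not reflect the go-hocon API or the operator's webhook code.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical stand-in for a parsed configuration tree.
type Config struct {
	values map[string]string
}

// parse is a hypothetical parser that returns a nil *Config on bad input.
func parse(data string) (*Config, error) {
	if data == "" {
		return nil, errors.New("empty configuration")
	}
	return &Config{values: map[string]string{"raw": data}}, nil
}

// GetString dereferences its receiver, so calling it on a nil *Config panics.
func (c *Config) GetString(key string) string {
	return c.values[key]
}

// lookup checks the parse error before touching the config, turning a
// would-be panic into an ordinary error the caller can report.
func lookup(data, key string) (string, error) {
	cfg, err := parse(data)
	if err != nil {
		return "", fmt.Errorf("parse config: %w", err)
	}
	return cfg.GetString(key), nil
}

// panics reports whether calling f triggers a runtime panic.
func panics(f func()) (did bool) {
	defer func() {
		if r := recover(); r != nil {
			did = true
		}
	}()
	f()
	return false
}

func main() {
	var missing *Config
	fmt.Println("nil receiver panics:", panics(func() { _ = missing.GetString("raw") }))
	_, err := lookup("", "raw")
	fmt.Println("guarded lookup error:", err)
}
```

With a guard of this kind in an admission webhook, an unparsable config could be rejected with a readable message rather than surfacing to the client as `EOF`.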

Rory-Z commented 1 year ago

I agree. I also noticed cluster.discovery_strategy = manual in your config. The EMQX Operator overwrites this setting and uses cluster.discovery_strategy = dns instead; you can check the environment variables in the EMQX Pod.
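
When looking through the Pod's environment for such an override, it helps to know what name to search for. The sketch below assumes EMQX 5's convention for environment-variable overrides (an `EMQX_` prefix, upper case, and each `.` replaced by `__`). `envOverrideName` is a hypothetical helper for illustration, and which variables the operator actually sets should be verified against the running Pod.

```go
package main

import (
	"fmt"
	"strings"
)

// envOverrideName converts a dotted EMQX 5 config key into the name of
// the environment variable that would override it, under the assumed
// convention: "EMQX_" prefix, upper case, "." replaced by "__".
func envOverrideName(key string) string {
	return "EMQX_" + strings.ToUpper(strings.ReplaceAll(key, ".", "__"))
}

func main() {
	fmt.Println(envOverrideName("cluster.discovery_strategy"))
}
```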

agronholm commented 1 year ago

Ah, that is a remnant of the configuration I used previously while deploying EMQX directly via its Helm chart.