EMQX cluster not working in IPv6-only network

axkng commented 2 years ago

Describe the bug

After following the getting-started page to set up the emqx-operator, I provisioned an EMQX cluster. The pods start and are running, but the status commands return errors:

kubectl exec -n emqx -it emqx-0 -c emqx -- emqx_ctl status
Node 'emqx@emqx-0.emqx-headless.emqx.svc.cluster.local' not responding to pings.
/opt/emqx/bin/emqx: line 46: die: command not found
command terminated with exit code 127

To Reproduce

Steps to reproduce the behavior:

1. Deploy the operator to an EKS cluster with Kubernetes 1.22.9.
2. Deploy a simple broker (it can be without persistence; I tested that).
3. Check the output of the status commands and observe the errors.

Expected behavior

No errors from the status commands after provisioning a simple broker with no config.

Anything else we need to know?:

Environment details:

Kubernetes version: 1.22.9
Cloud-provider/provisioner: AWS EKS
emqx-operator version: 1.2.4

Install method: Helm, with EMQX deployed as a custom resource. EMQX manifest:

---
apiVersion: apps.emqx.io/v1beta3
kind: EmqxBroker
metadata:
  name: emqx
  labels:
    app: emqx
    environment: dev
spec:
  persistent:
    accessModes:
      - ReadWriteOnce
    storageClassName: ebs-gp3
    resources:
      requests:
        storage: 1Gi
  emqxTemplate:
    image: emqx/emqx:4.4.6

Did I do something wrong here?

Rory-Z commented 2 years ago

Hi @Furragen, could you please show the emqx-operator logs and the EMQX custom resource status? Run the following commands:

kubectl get EmqxBroker emqx -o json | jq '.status'
kubectl logs -f -l "control-plane=controller-manager" -n emqx-operator-system -c manager --tail=100

Rory-Z commented 2 years ago

And the EMQX pod logs:

kubectl logs emqx-0 -c emqx

axkng commented 2 years ago

Hi @Rory-Z, thanks for your quick response.

kubectl get -n emqx EmqxBroker emqx -o json | jq '.status'
{
  "conditions": [
    {
      "lastTransitionTime": "2022-08-10T07:04:09Z",
      "lastUpdateTime": "2022-08-10T07:26:23Z",
      "message": "Some nodes are not ready",
      "reason": "ClusterNotReady",
      "status": "False",
      "type": "Running"
    },
    {
      "lastTransitionTime": "2022-08-10T07:03:26Z",
      "lastUpdateTime": "2022-08-10T07:03:26Z",
      "message": "All default plugins initialized",
      "reason": "PluginInitializeSuccessfully",
      "status": "True",
      "type": "PluginInitialized"
    }
  ],
  "emqxNodes": [
    {
      "node": "emqx@emqx-0.emqx-headless.emqx.svc.cluster.local",
      "node_status": "Running",
      "otp_release": "24.1.5/12.1.5",
      "version": "4.4.6"
    }
  ],
  "readyReplicas": 1,
  "replicas": 3
}
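
For anyone who wants to check the same fields programmatically, here is a minimal Python sketch that reads the `.status` JSON above and reports the ready versus desired replica counts and any condition that is not `True`. The helper name `summarize_status` and the trimmed sample are illustrative only and are not part of the operator.

```python
import json

# Trimmed copy of the `.status` output shown above.
STATUS_JSON = """
{
  "conditions": [
    {"type": "Running", "status": "False", "reason": "ClusterNotReady",
     "message": "Some nodes are not ready"},
    {"type": "PluginInitialized", "status": "True",
     "reason": "PluginInitializeSuccessfully",
     "message": "All default plugins initialized"}
  ],
  "emqxNodes": [
    {"node": "emqx@emqx-0.emqx-headless.emqx.svc.cluster.local",
     "node_status": "Running"}
  ],
  "readyReplicas": 1,
  "replicas": 3
}
"""


def summarize_status(status: dict) -> dict:
    """Collect replica counts, failing conditions, and the nodes the operator sees."""
    failing = [
        (cond["type"], cond["reason"])
        for cond in status.get("conditions", [])
        if cond.get("status") != "True"
    ]
    return {
        "ready": status.get("readyReplicas", 0),
        "desired": status.get("replicas", 0),
        "failing_conditions": failing,
        "nodes": [node["node"] for node in status.get("emqxNodes", [])],
    }


summary = summarize_status(json.loads(STATUS_JSON))
print(summary)
```

With the status above, only one of the three desired replicas is ready, and the operator lists only emqx-0 under `emqxNodes`.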

Logs of the operator (kubectl logs -f -l "control-plane=controller-manager" -n emqx -c manager --tail=100):

E0810 07:03:59.570352 1 portforward.go:234] lost connection to pod E0810 07:03:59.838997 1 portforward.go:406] an error occurred forwarding 38417 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:00.358323 1 portforward.go:406] an error occurred forwarding 43423 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:00.358653 1 portforward.go:234] lost connection to pod 1.660115040377564e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "d0295269-0001-4c19-ae4d-2be9e74a7321", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 E0810 07:04:00.653062 1 portforward.go:406] an error occurred forwarding 46363 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:00.656235 1 portforward.go:234] lost connection to pod 1.6601150413087435e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "aed5651b-c774-4496-8e04-41ec215aeb76", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 E0810 07:04:01.933473 1 portforward.go:406] an error occurred forwarding 36141 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused 
IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:01.933924 1 portforward.go:234] lost connection to pod 1.6601150419643033e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "7207ec58-c48f-4f47-bb51-d156051a2e78", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 E0810 07:04:02.226753 1 portforward.go:406] an error occurred forwarding 36151 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:02.227081 1 portforward.go:234] lost connection to pod E0810 07:04:02.639584 1 portforward.go:406] an error occurred forwarding 39885 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: 
connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:02.639792 1 portforward.go:234] lost connection to pod 1.6601150426838543e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "4ef8f472-14f7-4da7-849b-5220115b9dbc", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 E0810 07:04:02.948658 1 portforward.go:406] an error occurred forwarding 40737 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:02.948869 1 portforward.go:234] lost connection to pod E0810 07:04:03.425248 1 portforward.go:406] an error occurred forwarding 34645 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", 
IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:03.425551 1 portforward.go:234] lost connection to pod 1.660115043447316e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "7d86ef9a-e0f3-465b-b1e0-32123f7d2377", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 E0810 07:04:03.772788 1 portforward.go:406] an error occurred forwarding 37549 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:03.773081 1 portforward.go:234] lost connection to pod 1.6601150441139083e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "300bc970-0ebd-4d1a-a101-6b073ec449e0", "error": "failed to update StatefulSet emqx: Operation cannot be fulfilled on statefulsets.apps \"emqx\": the 
object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 E0810 07:04:04.424648 1 portforward.go:406] an error occurred forwarding 32813 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:04.425127 1 portforward.go:234] lost connection to pod E0810 07:04:04.909241 1 portforward.go:406] an error occurred forwarding 41449 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:04.909421 1 portforward.go:234] lost connection to pod 1.660115044925919e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "e6941001-14b7-4462-8b90-80e6ab8feac4", "error": "Operation cannot be fulfilled on 
emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 E0810 07:04:05.216967 1 portforward.go:406] an error occurred forwarding 34665 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:05.725675 1 portforward.go:406] an error occurred forwarding 41193 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:05.726157 1 portforward.go:234] lost connection to pod 1.6601150457562108e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "aaa3c815-7409-4e44-b752-09cff2b0531e", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the 
object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 E0810 07:04:06.034595 1 portforward.go:406] an error occurred forwarding 35435 -> 8081: error forwarding port 8081 to pod 862a1a59ff6fdc75b1c8a7520a2ed57d2720c341f7015556443b4063771ccdd4, uid : failed to execute portforward in network namespace "/var/run/netns/cni-4d298c69-db9b-1c7e-ebac-314710d61826": failed to connect to localhost:8081 inside namespace "862a1a59ff6fdc75b1c8a7520a2ed57d2720c341f7015556443b4063771ccdd4", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:06.034845 1 portforward.go:234] lost connection to pod E0810 07:04:06.440777 1 portforward.go:406] an error occurred forwarding 33389 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:06.441162 1 portforward.go:234] lost connection to pod E0810 07:04:06.735334 1 portforward.go:406] an error occurred forwarding 35219 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to 
localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:06.735726 1 portforward.go:234] lost connection to pod E0810 07:04:07.043941 1 portforward.go:406] an error occurred forwarding 34727 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:07.044309 1 portforward.go:234] lost connection to pod E0810 07:04:07.450824 1 portforward.go:406] an error occurred forwarding 34031 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:07.451169 1 portforward.go:234] lost connection to pod E0810 07:04:07.790990 1 portforward.go:406] an error occurred forwarding 43441 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 
[::1]:8081: connect: connection refused E0810 07:04:07.791210 1 portforward.go:234] lost connection to pod E0810 07:04:08.188870 1 portforward.go:406] an error occurred forwarding 32839 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:08.189418 1 portforward.go:234] lost connection to pod 1.6601150482063682e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "bbf92a96-d65e-4792-a130-bc0d0f594557", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 E0810 07:04:08.443142 1 portforward.go:406] an error occurred forwarding 35395 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: 
connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:08.907719 1 portforward.go:406] an error occurred forwarding 33055 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:08.907842 1 portforward.go:234] lost connection to pod E0810 07:04:09.192174 1 portforward.go:406] an error occurred forwarding 34689 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:09.192605 1 portforward.go:234] lost connection to pod E0810 07:04:09.616450 1 portforward.go:406] an error occurred forwarding 42301 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused 1.6601150500930722e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": 
{"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "371cf80b-1e05-4126-ab31-639b04c5d478", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 1.6601150507833595e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "2c813143-c2f5-4398-b6b7-bf3e92bd4350", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234
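
The operator output above repeats a small number of message types. As a rough aid for reading it, here is a Python sketch that counts them in a captured log. The sample string is a shortened excerpt assembled from the entries above, and the function name `summarize_operator_log` is hypothetical.

```python
import re

# Shortened excerpt built from the operator log above (not the full output).
SAMPLE = (
    'E0810 07:03:59.838997 1 portforward.go:406] an error occurred forwarding 38417 -> 8081: '
    'IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused '
    'IPv6 dial tcp6 [::1]:8081: connect: connection refused '
    'E0810 07:04:00.358653 1 portforward.go:234] lost connection to pod '
    '1.660115040377564e+09 ERROR Reconciler error {"controller": "emqxbroker", '
    '"error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io: '
    'the object has been modified; please apply your changes to the latest version and try again"}'
)


def summarize_operator_log(log: str) -> dict:
    """Count the recurring message types and collect the refused dial targets."""
    return {
        "port_forward_failed": len(re.findall(r"an error occurred forwarding \d+ -> \d+", log)),
        "lost_connection": log.count("lost connection to pod"),
        "update_conflict": log.count("the object has been modified"),
        "refused_targets": sorted(
            set(re.findall(r"dial tcp[46] (\S+): connect: connection refused", log))
        ),
    }


result = summarize_operator_log(SAMPLE)
print(result)
```

In the full log, every port-forward failure is a refused connection to port 8081 on both 127.0.0.1 and [::1] inside the pod, and every reconciler error is an update conflict on the EmqxBroker or StatefulSet object.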

Logs of the first node: kubectl -n emqx logs emqx-0 -c emqx

hostname: emqx-0: Host not found
Starting emqx on node emqx@emqx-0.emqx-headless.emqx.svc.cluster.local
Start mqtt:tcp:internal listener on 127.0.0.1:11883 successfully.
Start mqtt:tcp:external listener on 0.0.0.0:1883 successfully.
Start mqtt:ws:external listener on 0.0.0.0:8083 successfully.
Start mqtt:ssl:external listener on 0.0.0.0:8883 successfully.
Start mqtt:wss:external listener on 0.0.0.0:8084 successfully.
Start http:management listener on 8081 successfully.
2022-08-10T07:04:09.807451+00:00 [warning] [Dashboard] Using default password for dashboard 'admin' user. Please use './bin/emqx_ctl admins' command to change it. NOTE: the default password in config file is only used to initialise the database record, changing the config file after database is initialised has no effect.
Start http:dashboard listener on 18083 successfully.
EMQ X Broker 4.4.6 is running now!
2022-08-10T07:04:11.816562+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-1.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:11.816736+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:17.569069+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-1.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:17.569265+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:25.334693+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-1.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:25.334864+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:32.698077+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-1.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:32.698264+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:38.495689+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-1.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:38.495870+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms

The logs just stay the same after that.
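
A side note on the pod log above: the external listeners report binding to 0.0.0.0, which is an IPv4 wildcard address. Whether that is related to the failure has not been established in this thread, but on an IPv6-only network it may be worth checking. Below is a minimal Python sketch that flags such lines in a captured log; the helper name `ipv4_only_listeners` is hypothetical.

```python
import ipaddress
import re

# Listener lines copied from the emqx-0 log above.
LOG_LINES = [
    "Start mqtt:tcp:internal listener on 127.0.0.1:11883 successfully.",
    "Start mqtt:tcp:external listener on 0.0.0.0:1883 successfully.",
    "Start mqtt:ws:external listener on 0.0.0.0:8083 successfully.",
    "Start mqtt:ssl:external listener on 0.0.0.0:8883 successfully.",
    "Start mqtt:wss:external listener on 0.0.0.0:8084 successfully.",
    "Start http:management listener on 8081 successfully.",
    "Start http:dashboard listener on 18083 successfully.",
]

# Matches "Start <name> listener on [<host>:]<port> successfully."
LISTENER_RE = re.compile(r"^Start (\S+) listener on (?:(\S+):)?(\d+) successfully\.$")


def ipv4_only_listeners(lines):
    """Return (name, port) for listeners whose printed bind address is an IPv4 literal."""
    found = []
    for line in lines:
        match = LISTENER_RE.match(line)
        if not match:
            continue
        name, host, port = match.group(1), match.group(2), int(match.group(3))
        if host is None:
            # No address is printed, so the log alone does not reveal the address family.
            continue
        address = ipaddress.ip_address(host.strip("[]"))
        if address.version == 4:
            found.append((name, port))
    return found


flagged = ipv4_only_listeners(LOG_LINES)
for name, port in flagged:
    print(f"{name} on port {port} is bound to an IPv4 address")
```

The management and dashboard listeners print only a port, so this sketch skips them rather than guessing their address family.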

Rory-Z commented 2 years ago

Is this the first deployment? Have you deployed emqx before and deleted it?

axkng commented 2 years ago

This is the first deployment of that broker. But yes, I tried to deploy other ones before.

Rory-Z commented 2 years ago

Could you please show the logs for emqx-1 and emqx-2?

axkng commented 2 years ago

Sure thing.

kubectl -n exo-emqx logs emqx-1 -c emqx

hostname: emqx-1: Host not found
Starting emqx on node emqx@emqx-1.emqx-headless.emqx.svc.cluster.local
Start mqtt:tcp:internal listener on 127.0.0.1:11883 successfully.
Start mqtt:tcp:external listener on 0.0.0.0:1883 successfully.
Start mqtt:ws:external listener on 0.0.0.0:8083 successfully.
Start mqtt:ssl:external listener on 0.0.0.0:8883 successfully.
Start mqtt:wss:external listener on 0.0.0.0:8084 successfully.
Start http:management listener on 8081 successfully.
2022-08-10T07:04:11.429818+00:00 [warning] [Dashboard] Using default password for dashboard 'admin' user. Please use './bin/emqx_ctl admins' command to change it. NOTE: the default password in config file is only used to initialise the database record, changing the config file after database is initialised has no effect.
Start http:dashboard listener on 18083 successfully.
EMQ X Broker 4.4.6 is running now!
2022-08-10T07:04:12.467104+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:12.467272+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:17.639297+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:17.639472+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:24.940386+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:24.940561+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:30.877727+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:30.877912+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:38.386440+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']

kubectl -n emqx logs emqx-2 -c emqx

hostname: emqx-2: Host not found
Starting emqx on node emqx@emqx-2.emqx-headless.emqx.svc.cluster.local
Start mqtt:tcp:internal listener on 127.0.0.1:11883 successfully.
Start mqtt:tcp:external listener on 0.0.0.0:1883 successfully.
Start mqtt:ws:external listener on 0.0.0.0:8083 successfully.
Start mqtt:ssl:external listener on 0.0.0.0:8883 successfully.
Start mqtt:wss:external listener on 0.0.0.0:8084 successfully.
Start http:management listener on 8081 successfully.
2022-08-10T07:04:21.079133+00:00 [warning] [Dashboard] Using default password for dashboard 'admin' user. Please use './bin/emqx_ctl admins' command to change it. NOTE: the default password in config file is only used to initialise the database record, changing the config file after database is initialised has no effect.
Start http:dashboard listener on 18083 successfully.
EMQ X Broker 4.4.6 is running now!
2022-08-10T07:04:24.909316+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-1.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:24.909510+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:32.263225+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-1.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:32.263384+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:37.785043+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-1.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:37.785206+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:43.694825+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-1.emqx-headless.emqx.svc.cluster.local']

Again, the logs just stay the same.

Rory-Z commented 2 years ago

@qzhuyan Do you have any idea?

qzhuyan commented 2 years ago

After talking to @Rory-Z, we think it relates to the publishNotReadyAddresses flag in the k8s service.

@Rory-Z will release a fix for it.

@Furragen You could try to manually set publishNotReadyAddresses to true and delete all the pods to verify it, or wait for the new release of the EMQX Operator.

axkng commented 2 years ago

Hi @qzhuyan , I tested this, but sadly the error stays the same.

Rory-Z commented 2 years ago

Hi @Furragen, EMQX Operator 1.2.5 is released. Please try again and let me know if it works.

axkng commented 2 years ago

Hi @Rory-Z, thank you for the new release, but sadly the error was not fixed.

Rory-Z commented 2 years ago

@Furragen Sounds frustrating. Is the EMQX pod log still the same?

Rory-Z commented 2 years ago

Hi @Furragen, could you please check the pod network? Run the following command in an EMQX pod:

nslookup -type=srv $(headless service name).$(namespace).svc.cluster.local

You should get output like this:

emqx-headless.default.svc.cluster.local service = 0 33 8081 emqx-0.emqx-headless.default.svc.cluster.local
emqx-headless.default.svc.cluster.local service = 0 33 8081 emqx-1.emqx-headless.default.svc.cluster.local
emqx-headless.default.svc.cluster.local service = 0 33 8081 emqx-2.emqx-headless.default.svc.cluster.local

Then check network connectivity:

nc -zv emqx-2.emqx-headless.default.svc.cluster.local 8081

Output like this means the check succeeded:

emqx-2.emqx-headless.default.svc.cluster.local (172.17.0.8:8081) open
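
If `nc` is not available in the image, roughly the same reachability check can be sketched in Python. This is only an illustration, not part of EMQX or the operator: `probe` is a hypothetical helper that tries a TCP connect to every address the name resolves to and reports the address family, which helps tell an IPv4 result from an IPv6 one.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try a TCP connect to every address that `host` resolves to.

    Returns a list of (family, address, reachable) tuples.
    """
    results = []
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        name = "IPv6" if family == socket.AF_INET6 else "IPv4"
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(sockaddr)
            reachable = True
        except OSError:
            # Refused, timed out, or no route: treat all as unreachable.
            reachable = False
        results.append((name, sockaddr[0], reachable))
    return results
```

Calling `probe("emqx-2.emqx-headless.default.svc.cluster.local", 8081)` from inside a pod would then list each resolved address and whether port 8081 accepted the connection.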

axkng commented 2 years ago

So the lookup worked fine. My cluster uses IPv6 btw. Could that be a problem?

Network ping did not work.

Rory-Z commented 2 years ago

Network ping did not work.

I think that is the reason.

Could you please check whether pinging another EMQX pod by its IP from inside an EMQX pod works?

Rory-Z commented 2 years ago

In a StatefulSet, each pod should have a stable network ID: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#stable-network-id. EMQX nodes use this network ID to discover each other, so if the network doesn't work, the EMQX cluster will fail.

Because this is a k8s feature, you may need to check the AWS EKS side.
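
For reference, the stable network ID of a StatefulSet pod follows the pattern `$(pod name).$(headless service).$(namespace).svc.cluster.local`, which matches the node names in the logs earlier in this thread. A minimal Python sketch that lists the expected names so each one can be checked in turn; the helper name `pod_dns_names` and the default `cluster.local` domain are assumptions for illustration:

```python
def pod_dns_names(statefulset, service, namespace, replicas,
                  cluster_domain="cluster.local"):
    """Return the per-pod DNS names a StatefulSet gets behind a headless service."""
    return [
        f"{statefulset}-{i}.{service}.{namespace}.svc.{cluster_domain}"
        for i in range(replicas)
    ]


# Names for the three-replica broker from this issue (namespace "emqx").
for name in pod_dns_names("emqx", "emqx-headless", "emqx", 3):
    print(name)
```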

axkng commented 2 years ago

The direct way via the IP of the pod also did not work. And I think I know why: EMQX only listens on IPv4.

netstat -tulpen
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 127.0.0.1:11883         0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:8081            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:4370            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:8883            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:8083            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:8084            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:5369            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:1883            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:18083           0.0.0.0:*               LISTEN      1/emqx

This was from inside the emqx-0 pod. Like I said, the cluster uses IPv6, so this cannot work. Is there any way to make EMQX listen on IPv6?
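
To repeat this check across pods, the `netstat` output can be classified by address family. A rough Python sketch, assuming the column layout shown above (local address in the fourth column, state in the sixth); `classify_listeners` is a hypothetical helper, not something EMQX ships:

```python
def classify_listeners(netstat_output):
    """Map each listening TCP port to the address families it is bound on."""
    ports = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a listening TCP socket.
        if len(fields) < 6 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue
        # Split on the last colon so IPv6 wildcards such as ":::8081" work too.
        host, _, port = fields[3].rpartition(":")
        family = "IPv6" if ":" in host else "IPv4"
        ports.setdefault(int(port), set()).add(family)
    return ports


sample = """\
tcp        0      0 127.0.0.1:11883         0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:8081            0.0.0.0:*               LISTEN      1/emqx
"""
print(classify_listeners(sample))
```

A port that maps only to `IPv4`, as every port in the output above does, cannot accept connections from peers in an IPv6-only cluster.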

Rory-Z commented 2 years ago

@Furragen You can deploy EMQX like this:

apiVersion: apps.emqx.io/v1beta3
kind: EmqxBroker
metadata:
  name: emqx
spec:
  emqxTemplate:
    image: emqx/emqx:4.4.6
    config:
      listener.tcp.external: :::1883
      management.listener.http: :::8081
      dashboard.listener.http: :::18083

Sorry, I don't have an IPv6 cluster, so I need you to try this.

axkng commented 2 years ago

Absolutely no problem. I redeployed the broker and we got a little further. The logs and the error stay the same:

emqx_ctl cluster_status
Node 'emqx@emqx-0.emqx-headless.exo-emqx.svc.cluster.local' not responding to pings.
/opt/emqx/bin/emqx: line 46: die: command not found

But: doing the ping by hand with nc now succeeds. So the connection works, but something is still broken. Could there be more listeners that I need to switch to v6?

Rory-Z commented 2 years ago

Cooool, you can change all the listeners you care about to the IPv6 format, see https://www.emqx.io/docs/en/v4.4/configuration/configuration.html#listener-tcp-external

Could you please run the following command in an EMQX pod:

emqx eval "net_adm:ping('emqx@emqx-0.emqx-headless.default.svc.cluster.local')."

Here emqx@emqx-0.emqx-headless.default.svc.cluster.local is the name of another EMQX node.

axkng commented 2 years ago

So, I tried this and the command you mentioned did not succeed. The error is:

Node 'emqx@emqx-0.emqx-headless.emqx.svc.cluster.local' not responding to pings.
/usr/local/bin/emqx: line 46: die: command not found

This error always appears when running the emqx-command.

Also, I have experimented with setting the listeners to IPv6:

apiVersion: apps.emqx.io/v1beta3
kind: EmqxBroker
metadata:
  name: emqx
  labels:
    app: emqx
    environment: dev
spec:
  persistent:
    accessModes:
      - ReadWriteOnce
    storageClassName: ebs-gp3
    resources:
      requests:
        storage: 1Gi
  emqxTemplate:
    image: emqx/emqx:4.4.6
    config:
      listener.tcp.external: :::1883
      listener.ssl.external: :::8883
      management.listener.http: :::8081
      dashboard.listener.http: :::18083
      listener.tcp.internal: :::11883
      listener.ws.external: :::8083
      listener.wss.external: :::8084

The pods start, but the dashboard-plugin seems to be unhappy:

2022-08-11T09:45:39.371399+00:00 [alert] [Plugins] Plugin emqx_dashboard load failed with {function_clause,[{emqx_plugins,apply_configs,[{error,transform_datatypes,{errorlist,[{error,{transform_type,"dashboard.listener.http"}},{error,{conversion,{":::18083",integer}}}]}}],[{file,"emqx_plugins.erl"},{line,302}]},{emqx_plugins,load_plugin,2,[{file,"emqx_plugins.erl"},{line,325}]},{lists,foreach,2,[{file,"lists.erl"},{line,1342}]},{emqx_app,start,2,[{file,"emqx_app.erl"},{line,50}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,293}]}]}

Looks like it cannot convert the v6-notation.

On top of that I found three other settings that would need tuning I think. The first one is cluster.proto_dist. The docs mention that I could set it to inet6_tcp to use IPv6. But when I do that, the pods do not start anymore.

And then there are cluster.mcast.iface and rpc.tcp_server_ip. These two settings do not seem to support IPv6 according to the docs. Is that correct?

The listeners I just mentioned and the ones in my manifest seem to be the ones EMQX starts by default, so I did not look further.

Do you know of anyone using EMQX with IPv6?

Rory-Z commented 2 years ago

@qzhuyan @zmstone Need help

qzhuyan commented 2 years ago

Node 'emqx@emqx-0.emqx-headless.emqx.svc.cluster.local' not responding to pings.
/usr/local/bin/emqx: line 46: die: command not found

means the peer node that we are pinging is unreachable.

axkng commented 2 years ago

I ran the command from the emqx-0 pod, trying to query emqx-1. Does that not mean emqx-0 has a problem?

zmstone commented 2 years ago

It's likely that EMQX's distribution and RPC library do not support IPv6 that well. We'll investigate it.

axkng commented 2 years ago

Good to know, thank you.

zmstone commented 4 months ago

Sorry for the late update. Since no issue or PR is linked to this one, and it was quite some time ago, I cannot be very sure, but it seems the IPv6 issues are already resolved. Here is a fix for the RPC lib wrt IPv6: https://github.com/emqx/gen_rpc/pull/38 and https://github.com/emqx/emqx/pull/11734

emqx / emqx-operator

EMQX-Cluster not working in IPV6 only network #327