Closed: axkng closed this issue 4 months ago.
Hi @Furragen,
Could you please show the emqx-operator logs and the EMQX custom resource status? Run the following commands:
kubectl get EmqxBroker emqx -o json | jq '.status'
kubectl logs -f -l "control-plane=controller-manager" -n emqx-operator-system -c manager --tail=100
And the EMQX pod logs:
kubectl logs emqx-0 -c emqx
Hi @Rory-Z, thanks for your quick response.
kubectl get -n emqx EmqxBroker emqx -o json | jq '.status'
```json
{
  "conditions": [
    {
      "lastTransitionTime": "2022-08-10T07:04:09Z",
      "lastUpdateTime": "2022-08-10T07:26:23Z",
      "message": "Some nodes are not ready",
      "reason": "ClusterNotReady",
      "status": "False",
      "type": "Running"
    },
    {
      "lastTransitionTime": "2022-08-10T07:03:26Z",
      "lastUpdateTime": "2022-08-10T07:03:26Z",
      "message": "All default plugins initialized",
      "reason": "PluginInitializeSuccessfully",
      "status": "True",
      "type": "PluginInitialized"
    }
  ],
  "emqxNodes": [
    {
      "node": "emqx@emqx-0.emqx-headless.emqx.svc.cluster.local",
      "node_status": "Running",
      "otp_release": "24.1.5/12.1.5",
      "version": "4.4.6"
    }
  ],
  "readyReplicas": 1,
  "replicas": 3
}
```
Logs of the operator (kubectl logs -f -l "control-plane=controller-manager" -n emqx -c manager --tail=100):
Logs of the first node:
kubectl -n emqx logs emqx-0 -c emqx
hostname: emqx-0: Host not found
Starting emqx on node emqx@emqx-0.emqx-headless.emqx.svc.cluster.local
Start mqtt:tcp:internal listener on 127.0.0.1:11883 successfully.
Start mqtt:tcp:external listener on 0.0.0.0:1883 successfully.
Start mqtt:ws:external listener on 0.0.0.0:8083 successfully.
Start mqtt:ssl:external listener on 0.0.0.0:8883 successfully.
Start mqtt:wss:external listener on 0.0.0.0:8084 successfully.
Start http:management listener on 8081 successfully.
2022-08-10T07:04:09.807451+00:00 [warning] [Dashboard] Using default password for dashboard 'admin' user. Please use './bin/emqx_ctl admins' command to change it. NOTE: the default password in config file is only used to initialise the database record, changing the config file after database is initialised has no effect.
Start http:dashboard listener on 18083 successfully.
EMQ X Broker 4.4.6 is running now!
2022-08-10T07:04:11.816562+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-1.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:11.816736+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:17.569069+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-1.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:17.569265+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:25.334693+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-1.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:25.334864+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:32.698077+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-1.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:32.698264+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:38.495689+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-1.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:38.495870+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
The logs just stay the same after that.
Is this the first deployment? Have you deployed EMQX before and then deleted it?
This is the first deployment of this broker, but yes, I have tried to deploy other ones before.
Could you please show the logs for emqx-1 and emqx-2?
Sure thing.
kubectl -n exo-emqx logs emqx-1 -c emqx
hostname: emqx-1: Host not found
Starting emqx on node emqx@emqx-1.emqx-headless.emqx.svc.cluster.local
Start mqtt:tcp:internal listener on 127.0.0.1:11883 successfully.
Start mqtt:tcp:external listener on 0.0.0.0:1883 successfully.
Start mqtt:ws:external listener on 0.0.0.0:8083 successfully.
Start mqtt:ssl:external listener on 0.0.0.0:8883 successfully.
Start mqtt:wss:external listener on 0.0.0.0:8084 successfully.
Start http:management listener on 8081 successfully.
2022-08-10T07:04:11.429818+00:00 [warning] [Dashboard] Using default password for dashboard 'admin' user. Please use './bin/emqx_ctl admins' command to change it. NOTE: the default password in config file is only used to initialise the database record, changing the config file after database is initialised has no effect.
Start http:dashboard listener on 18083 successfully.
EMQ X Broker 4.4.6 is running now!
2022-08-10T07:04:12.467104+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:12.467272+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:17.639297+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:17.639472+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:24.940386+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:24.940561+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:30.877727+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:30.877912+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:38.386440+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
kubectl -n emqx logs emqx-2 -c emqx
hostname: emqx-2: Host not found
Starting emqx on node emqx@emqx-2.emqx-headless.emqx.svc.cluster.local
Start mqtt:tcp:internal listener on 127.0.0.1:11883 successfully.
Start mqtt:tcp:external listener on 0.0.0.0:1883 successfully.
Start mqtt:ws:external listener on 0.0.0.0:8083 successfully.
Start mqtt:ssl:external listener on 0.0.0.0:8883 successfully.
Start mqtt:wss:external listener on 0.0.0.0:8084 successfully.
Start http:management listener on 8081 successfully.
2022-08-10T07:04:21.079133+00:00 [warning] [Dashboard] Using default password for dashboard 'admin' user. Please use './bin/emqx_ctl admins' command to change it. NOTE: the default password in config file is only used to initialise the database record, changing the config file after database is initialised has no effect.
Start http:dashboard listener on 18083 successfully.
EMQ X Broker 4.4.6 is running now!
2022-08-10T07:04:24.909316+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-1.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:24.909510+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:32.263225+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-1.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:32.263384+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:37.785043+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-1.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:37.785206+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:43.694825+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-1.emqx-headless.emqx.svc.cluster.local']
Again, the logs just stay the same.
@qzhuyan Any idea?
After talking to @Rory-Z, we think it is related to the publishNotReadyAddresses flag in the k8s Service.
@Rory-Z will release a fix for it.
@Furragen, you could try manually setting publishNotReadyAddresses to true and deleting all the pods to verify it, or wait for the new release of the EMQX Operator.
Hi @qzhuyan, I tested this, but sadly the error is still the same.
Hi @Furragen, EMQX Operator 1.2.5 has been released. Please try again and let me know whether it works.
Hi @Rory-Z, thank you for the new release, but sadly the error is not fixed.
@Furragen That sounds frustrating. Are the EMQX pod logs still the same?
Hi @Furragen, could you please check the pod network? Run the following command in an EMQX pod:
nslookup -type=srv <headless-service-name>.<namespace>.svc.cluster.local
You should get output like this:
emqx-headless.default.svc.cluster.local service = 0 33 8081 emqx-0.emqx-headless.default.svc.cluster.local
emqx-headless.default.svc.cluster.local service = 0 33 8081 emqx-1.emqx-headless.default.svc.cluster.local
emqx-headless.default.svc.cluster.local service = 0 33 8081 emqx-2.emqx-headless.default.svc.cluster.local
Then check network connectivity to a peer (nc -zv tests whether a TCP port is open; it is not an ICMP ping):
nc -zv emqx-2.emqx-headless.default.svc.cluster.local 8081
Output like the following means the connection succeeded:
emqx-2.emqx-headless.default.svc.cluster.local (172.17.0.8:8081) open
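The two checks above can be combined into one script. The following is a minimal Python sketch, not a tool shipped with EMQX or the operator; the function names are hypothetical. It assumes the `nslookup -type=srv` answer lines have the `<name> service = <priority> <weight> <port> <target>` shape shown above, extracts each target and port, and then tries a TCP connection to every address the target resolves to. It uses `socket.getaddrinfo` with `AF_UNSPEC`, so both IPv4 and IPv6 addresses are tried, which matters on an IPv6-only cluster where a plain IPv4 check would miss the real addresses.

```python
import socket
from typing import List, Tuple


def parse_srv_lines(text: str) -> List[Tuple[str, int]]:
    """Extract (target, port) pairs from `nslookup -type=srv` output.

    Assumes answer lines of the form:
      <name> service = <priority> <weight> <port> <target>
    Any other line is ignored.
    """
    targets = []
    for line in text.splitlines():
        if "service =" not in line:
            continue
        fields = line.split("service =", 1)[1].split()
        if len(fields) < 4:
            continue
        # fields: priority, weight, port, target (trailing dot stripped)
        port, target = int(fields[2]), fields[3].rstrip(".")
        targets.append((target, port))
    return targets


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> List[Tuple[str, bool]]:
    """Try a TCP connect to every IPv4 and IPv6 address `host` resolves to.

    Returns (address, reachable) pairs, similar in spirit to `nc -zv`.
    An empty list means the name did not resolve at all.
    """
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    results = []
    for family, socktype, proto, _canonname, sockaddr in infos:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            try:
                sock.connect(sockaddr)
                results.append((sockaddr[0], True))
            except OSError:
                results.append((sockaddr[0], False))
    return results
```

Feeding it the nslookup output from inside an EMQX pod and calling `tcp_reachable(target, port)` for each pair reports, per resolved address, whether the port is open.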
The lookup worked fine. By the way, my cluster uses IPv6. Could that be a problem?
Network ping did not work.
I think that is the reason.
Could you please check, from inside an EMQX pod, whether you can reach another EMQX pod by its IP address?
In a StatefulSet, each pod should have a stable network ID: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#stable-network-id. EMQX uses this network ID to discover the other nodes; if the network does not work, EMQX clustering will fail.
Because this is a Kubernetes feature, you may need to check your AWS EKS setup.
Connecting directly via the pod IP did not work either, and I think I know why: EMQX only listens on IPv4.
netstat -tulpen
```
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address      Foreign Address  State   PID/Program name
tcp        0      0 127.0.0.1:11883    0.0.0.0:*        LISTEN  1/emqx
tcp        0      0 0.0.0.0:8081       0.0.0.0:*        LISTEN  1/emqx
tcp        0      0 0.0.0.0:4370       0.0.0.0:*        LISTEN  1/emqx
tcp        0      0 0.0.0.0:8883       0.0.0.0:*        LISTEN  1/emqx
tcp        0      0 0.0.0.0:8083       0.0.0.0:*        LISTEN  1/emqx
tcp        0      0 0.0.0.0:8084       0.0.0.0:*        LISTEN  1/emqx
tcp        0      0 0.0.0.0:5369       0.0.0.0:*        LISTEN  1/emqx
tcp        0      0 0.0.0.0:1883       0.0.0.0:*        LISTEN  1/emqx
tcp        0      0 0.0.0.0:18083      0.0.0.0:*        LISTEN  1/emqx
```
This was from inside the emqx-0 pod. As I said, the cluster uses IPv6, so this cannot work. Is there any way to make EMQX listen on IPv6?
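To see at a glance which listeners are IPv4-only, the netstat output above can be scanned per port. This is a hypothetical Python helper, not an EMQX tool; it assumes the `netstat -tulpen` layout shown above, with the protocol in the first column and the local address in the fourth. On Linux, netstat reports IPv6 sockets with the protocol `tcp6` (or `udp6`) and wildcard addresses such as `:::1883`, so the protocol column is enough to tell the address family apart.

```python
from typing import Dict


def classify_listeners(netstat_output: str) -> Dict[int, str]:
    """Map each listening port to 'ipv4', 'ipv6', or 'both'.

    Assumes `netstat -tulpen`-style lines whose first column is the protocol
    (tcp/tcp6/udp/udp6) and whose fourth column is the local address.
    """
    families: Dict[int, set] = {}
    for line in netstat_output.splitlines():
        cols = line.split()
        if len(cols) < 4 or cols[0] not in ("tcp", "tcp6", "udp", "udp6"):
            continue  # skip the banner, the header row, and unrelated lines
        proto, local = cols[0], cols[3]
        # rpartition splits on the last colon, so ':::1883' -> ('::', ':', '1883')
        _, _, port = local.rpartition(":")
        family = "ipv6" if proto.endswith("6") else "ipv4"
        families.setdefault(int(port), set()).add(family)
    return {
        port: ("both" if len(fams) == 2 else next(iter(fams)))
        for port, fams in sorted(families.items())
    }
```

Applied to the output above, every port is reported as `ipv4`, including 4370 and 5369, which are not among the MQTT, management, or dashboard listeners (in EMQX 4.x these are typically the Erlang distribution and cluster RPC ports).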
@Furragen You can deploy EMQX like this:
```yaml
apiVersion: apps.emqx.io/v1beta3
kind: EmqxBroker
metadata:
  name: emqx
spec:
  emqxTemplate:
    image: emqx/emqx:4.4.6
    config:
      listener.tcp.external: ":::1883"
      management.listener.http: ":::8081"
      dashboard.listener.http: ":::18083"
```
Sorry, I don't have an IPv6 cluster, so I need you to try this.
Absolutely no problem. I redeployed the broker and we got a little further. The logs and the error stay the same:
emqx_ctl cluster_status
Node 'emqx@emqx-0.emqx-headless.exo-emqx.svc.cluster.local' not responding to pings.
/opt/emqx/bin/emqx: line 46: die: command not found
But checking the connection by hand with nc now succeeds.
So the connection works, but something is still broken.
Could there be more listeners that I need to switch to IPv6?
Cool! You can change all the listeners you care about to the IPv6 format; see https://www.emqx.io/docs/en/v4.4/configuration/configuration.html#listener-tcp-external
Could you please run the following command in an EMQX pod:
emqx eval "net_adm:ping('emqx@emqx-0.emqx-headless.default.svc.cluster.local')."
Here, emqx@emqx-0.emqx-headless.default.svc.cluster.local is the name of another EMQX node.
I tried this, and the command you mentioned did not succeed. The error is:
Node 'emqx@emqx-0.emqx-headless.emqx.svc.cluster.local' not responding to pings.
/usr/local/bin/emqx: line 46: die: command not found
This error always appears when running the emqx command.
Also, I have experimented with setting the listeners to IPv6:
```yaml
apiVersion: apps.emqx.io/v1beta3
kind: EmqxBroker
metadata:
  name: emqx
  labels:
    app: emqx
    environment: dev
spec:
  persistent:
    accessModes:
      - ReadWriteOnce
    storageClassName: ebs-gp3
    resources:
      requests:
        storage: 1Gi
  emqxTemplate:
    image: emqx/emqx:4.4.6
    config:
      listener.tcp.external: ":::1883"
      listener.ssl.external: ":::8883"
      management.listener.http: ":::8081"
      dashboard.listener.http: ":::18083"
      listener.tcp.internal: ":::11883"
      listener.ws.external: ":::8083"
      listener.wss.external: ":::8084"
```
The pods start, but the dashboard plugin seems to be unhappy:
2022-08-11T09:45:39.371399+00:00 [alert] [Plugins] Plugin emqx_dashboard load failed with {function_clause,[{emqx_plugins,apply_configs,[{error,transform_datatypes,{errorlist,[{error,{transform_type,"dashboard.listener.http"}},{error,{conversion,{":::18083",integer}}}]}}],[{file,"emqx_plugins.erl"},{line,302}]},{emqx_plugins,load_plugin,2,[{file,"emqx_plugins.erl"},{line,325}]},{lists,foreach,2,[{file,"lists.erl"},{line,1342}]},{emqx_app,start,2,[{file,"emqx_app.erl"},{line,50}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,293}]}]}
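It looks like it cannot convert the IPv6 notation: the error shows that the value of dashboard.listener.http is converted to an integer, so ":::18083" is rejected.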
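Based on the error above, `dashboard.listener.http` appears to accept only a bare port, while the `listener.*` keys accepted the `:::<port>` form. A hypothetical Python helper for generating the `config` map of the EmqxBroker manifest could encode that distinction. Which keys take an address and which take only a port is an assumption drawn from this thread's logs, not from the EMQX documentation, so the port-only keys are passed in explicitly rather than hard-coded.

```python
from typing import Dict, Iterable, Union

IPV6_WILDCARD = "::"


def ipv6_listener_config(
    ports: Dict[str, int], port_only_keys: Iterable[str] = ()
) -> Dict[str, Union[str, int]]:
    """Build EmqxBroker `config` entries that bind listeners to the IPv6 wildcard.

    Keys listed in `port_only_keys` keep a bare integer port, because (per the
    dashboard error in this thread) some keys are converted to an integer and
    reject the ':::<port>' form.
    """
    skip = set(port_only_keys)
    return {
        # '::' + ':' + '1883' -> ':::1883'
        key: port if key in skip else f"{IPV6_WILDCARD}:{port}"
        for key, port in ports.items()
    }
```

With a bare port, the dashboard listener would presumably keep its default bind address; whether it has a separate IPv6 option is not covered in this thread.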
On top of that, I found three other settings that I think would need tuning.
The first one is cluster.proto_dist. The docs mention that I could set it to inet6_tcp to use IPv6, but when I do that, the pods no longer start.
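In Erlang/OTP, the `inet_tcp` distribution module uses IPv4 and `inet6_tcp` uses IPv6, so if the node host names resolve only to IPv6 addresses, IPv4-based distribution has nothing to connect to. A minimal Python sketch, with hypothetical function names, that checks which address families a peer's host name resolves to and suggests a `cluster.proto_dist` value from that (preferring IPv4 when both are available):

```python
import socket
from typing import Set


def resolved_families(host: str) -> Set[str]:
    """Return the address families ('ipv4', 'ipv6') that `host` resolves to."""
    families = set()
    for family, *_ in socket.getaddrinfo(host, None, socket.AF_UNSPEC, socket.SOCK_STREAM):
        if family == socket.AF_INET:
            families.add("ipv4")
        elif family == socket.AF_INET6:
            families.add("ipv6")
    return families


def suggest_proto_dist(families: Set[str]) -> str:
    """Suggest a cluster.proto_dist value; IPv4 is preferred when available."""
    if "ipv4" in families:
        return "inet_tcp"
    if "ipv6" in families:
        return "inet6_tcp"
    raise ValueError("host did not resolve to any IPv4 or IPv6 address")
```

Run inside a pod against a peer name such as emqx-1.emqx-headless.<namespace>.svc.cluster.local, this only narrows down name resolution; as reported above, switching to inet6_tcp alone stopped the pods from starting.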
Then there are cluster.mcast.iface and rpc.tcp_server_ip. According to the docs, these two settings do not seem to support IPv6. Is that correct?
The listeners I just mentioned and the ones in my manifest seem to be the ones EMQX starts by default, so I did not look further.
Do you know of anyone using EMQX with IPv6?
@qzhuyan @zmstone We need help here.
Node 'emqx@emqx-0.emqx-headless.emqx.svc.cluster.local' not responding to pings.
/usr/local/bin/emqx: line 46: die: command not found
means the peer node that we are pinging is unreachable.
I ran the command from the emqx-0 pod, trying to query emqx-1. Does that not mean emqx-0 has a problem?
It's likely that EMQX's distribution and RPC libraries do not support IPv6 that well. We'll investigate it.
Good to know, thank you.
Sorry for the late update. Since no issue or PR is linked to this one and it was quite some time ago, I cannot be completely sure, but it seems the IPv6 issues have already been resolved. Here is the fix for IPv6 in the RPC library: https://github.com/emqx/gen_rpc/pull/38 and https://github.com/emqx/emqx/pull/11734
Describe the bug
After following the getting-started page to set up the emqx-operator, I provisioned an EMQX cluster. The pods start and are running, but the status commands return errors:
To Reproduce
Steps to reproduce the behavior:
Expected behavior
No errors from the status commands after provisioning a simple broker with no config.
Anything else we need to know?
Environment details:
Did I do something wrong here?