emqx / emqx-rel

Release Project for EMQX Broker prior to 4.3. Newer releases are built here: https://github.com/emqx/emqx
https://www.emqx.com
Apache License 2.0

emqx k8s Cluster fail to add pods #596

Open and-dzh3 opened 4 years ago

and-dzh3 commented 4 years ago

Environment

Description

After a few Kubernetes nodes went away and new ones were added (we run on spot instances), the emqx pods go into restart loops. The logs contain many entries like this:
(emqx@emqx-0.emqx-headless.emqx.svc.cluster.local)1> 2020-09-28 16:04:01.021 [error] Mnesia('emqx@emqx-0.emqx-headless.emqx.svc.cluster.local'): ** ERROR ** Mnesia on 'emqx@emqx-0.emqx-headless.emqx.svc.cluster.local' could not connect to node(s) ['emqx@emqx-1.emqx-headless.emqx.svc.cluster.local']

The issue is resolved only after manually recreating the 1 or 2 affected pods.

terry-xiaoyu commented 4 years ago

@and-dzh3 The log shows that the node cannot connect to emqx@emqx-1.emqx-headless.emqx.svc.cluster.local, possibly because the k8s network is not ready or because the node names have changed.
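
One rough way to test the node-name possibility is to check, from inside another pod in the cluster, whether the host part of the node name still resolves. The sketch below is only an illustration: the helper names `node_host` and `can_resolve` are hypothetical, and it checks DNS resolution only, not whether Erlang distribution or Mnesia can actually connect.

```python
import socket


def node_host(node_name: str) -> str:
    """Return the host part of an Erlang-style node name such as 'emqx@host'."""
    _, sep, host = node_name.partition("@")
    if not sep or not host:
        raise ValueError(f"not a valid node name: {node_name!r}")
    return host


def can_resolve(host: str) -> bool:
    """Return True if the host name resolves to at least one address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Node name taken from the error message in this issue.
    node = "emqx@emqx-1.emqx-headless.emqx.svc.cluster.local"
    host = node_host(node)
    status = "resolves" if can_resolve(host) else "does NOT resolve"
    print(f"{host} {status}")
```

If the name does not resolve from a neighbouring pod, the problem is more likely on the DNS or headless-service side than in Mnesia itself.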

and-dzh3 commented 4 years ago

@terry-xiaoyu The network is ready and everything looks good on the Kubernetes side. I added EMQX_CLUSTER__AUTOHEAL: "on" to the config and it looks better, but why is auto-heal not on by default? And why are the generated error messages not informative?