anzerozman closed this issue 10 months ago
Hi @anzerozman, I'm sorry, I don't understand. You said "if you e.g. manually delete emqx pod or if you descale/scale Statefulset. Pod is ready up and running, but stateful set has pending status for that pod forever.", and you also said "The only way to fix that issue is destroying/recreating emqx pods." These sound contradictory.
I deployed the EMQX CR in my minikube cluster, manually deleted one pod, and waited for it to be recreated; it looks good.
PS: I don't recommend scaling the StatefulSet directly. The EMQX operator will revert the change. You should modify the EMQX CR instead.
Yes, after deletion the emqx pod is successfully recreated in both cases (in version 2.1.2 and also in e.g. version 2.2.5). Everything looks fine from the pods' perspective. I have an example with 2 replicas. In version 2.1.2 both pods are still bound to the load balancer endpoints correctly after I delete one of them. But if you try to reproduce it in a newer version of the operator (v2beta1, 2.2.x), after deleting one pod the pod is successfully recreated but the StatefulSet has only 1/2 pods ready. And if you look at the listeners (LoadBalancer) service, only the endpoint of the ready pod is associated (the recreated pod's endpoint is no longer exposed).
In EMQX pods, I set ReadinessGates, and the EMQX operator controller checks whether each pod has already joined the EMQX cluster; once a pod has joined, it becomes ready. So if a pod cannot become ready, I think there are two possibilities:
emqx ctl cluster status in this pod to check it.
I ran it from one of the pods:
emqx@test-emqx-core-57bcb74d8d-1:/opt/emqx$ emqx ctl cluster status
Cluster status: #{running_nodes => ['emqx@test-emqx-core-57bcb74d8d-0.test-emqx-headless.default.svc.cluster.local', 'emqx@test-emqx-core-57bcb74d8d-1.test-emqx-headless.default.svc.cluster.local'], stopped_nodes => []}
but "kubectl get statefulsets" says:
NAME READY AGE
test-emqx-core-57bcb74d8d 1/2 98m
And there is only one endpoint visible from:
kubectl describe service test-emqx-listeners
Name: test-emqx-listeners
Namespace: default
Labels: apps.emqx.io/instance=test-emqx
apps.emqx.io/managed-by=emqx-operator
Annotations: apps.emqx.io/last-applied:
UEsDBBQACAAIAAAAAAAAAAAAAAAAAAAAAAAIAAAAb3JpZ2luYWykVFtv8zYM/S98tjzLyb6ketsVGLBLtmXF0GYPskwHQmTJleSubeD/PtCXxMnSYfj6JpM0eXh4yCPIRt+jD9pZEP...
service.beta.kubernetes.io/azure-load-balancer-resource-group: anze-test
Selector: apps.emqx.io/db-role=core,apps.emqx.io/instance=test-emqx,apps.emqx.io/managed-by=emqx-operator
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.108.147.230
IPs: 10.108.147.230
IP: 127.0.0.1
LoadBalancer Ingress: 127.0.0.1
Port: tcp-mqtt 1883/TCP
TargetPort: 1883/TCP
NodePort: tcp-mqtt 32472/TCP
Endpoints: 10.244.0.79:1883
Session Affinity: None
External Traffic Policy: Cluster
Events:
Br, Anze.
Is the EMQX operator running? And could you please show the status of those two EMQX pods?
kubectl get pods
NAME READY STATUS RESTARTS AGE
emqx5-emqx-operator-controller-manager-94b888f65-ppxrr 1/1 Running 0 158m
test-emqx-core-57bcb74d8d-0 1/1 Running 0 155m
test-emqx-core-57bcb74d8d-1 1/1 Running 0 153m
POD0:
kubectl describe pod test-emqx-core-57bcb74d8d-0
Name: test-emqx-core-57bcb74d8d-0
Namespace: default
Priority: 0
Node: minikube/192.168.49.2
Start Time: Tue, 09 Jan 2024 12:42:12 +0100
Labels: apps.emqx.io/db-role=core
apps.emqx.io/instance=test-emqx
apps.emqx.io/managed-by=emqx-operator
apps.emqx.io/pod-template-hash=57bcb74d8d
controller-revision-hash=test-emqx-core-57bcb74d8d-86f944c4f
statefulset.kubernetes.io/pod-name=test-emqx-core-57bcb74d8d-0
Annotations: <none>
Status: Running
IP: 10.244.0.79
IPs:
IP: 10.244.0.79
Controlled By: StatefulSet/test-emqx-core-57bcb74d8d
Containers:
emqx:
Container ID: docker://fb861131f9c84be7670a8613ee05f6490d0f6fd9279276eb1757172c51191324
Image: emqx/emqx:5.3.2
Image ID: docker-pullable://emqx/emqx@sha256:858305e7b0b33b28abbc29bb0063b193e9b127224ffc284a36584f910cf699d0
Port: 18083/TCP
Host Port: 0/TCP
State: Running
Started: Tue, 09 Jan 2024 12:42:13 +0100
Ready: True
Restart Count: 0
Liveness: http-get http://:dashboard/status delay=60s timeout=1s period=30s #success=1 #failure=3
Readiness: http-get http://:dashboard/status delay=10s timeout=1s period=5s #success=1 #failure=12
Environment:
EMQX_DASHBOARD__LISTENERS__HTTP__BIND: 18083
POD_NAME: test-emqx-core-57bcb74d8d-0 (v1:metadata.name)
EMQX_CLUSTER__DISCOVERY_STRATEGY: dns
EMQX_CLUSTER__DNS__RECORD_TYPE: srv
EMQX_CLUSTER__DNS__NAME: test-emqx-headless.default.svc.cluster.local
EMQX_HOST: $(POD_NAME).$(EMQX_CLUSTER__DNS__NAME)
EMQX_NODE__DATA_DIR: data
EMQX_NODE__ROLE: core
EMQX_NODE__COOKIE: <set to the key 'node_cookie' in secret 'test-emqx-node-cookie'> Optional: false
EMQX_API_KEY__BOOTSTRAP_FILE: "/opt/emqx/data/bootstrap_api_key"
EMQX_DASHBOARD__DEFAULT_USERNAME: test
EMQX_DASHBOARD__DEFAULT_PASSWORD: test
EMQX_LISTENERS__WS__DEFAULT__ENABLE: false
EMQX_LISTENERS__WSS__DEFAULT__ENABLE: false
EMQX_AUTHENTICATION__1__MECHANISM: password_based
EMQX_AUTHENTICATION__1__BACKEND: built_in_database
EMQX_AUTHENTICATION__1__PASSWORD_HASH_ALGORITHM__NAME: bcrypt
EMQX_AUTHENTICATION__1__PASSWORD_HASH_ALGORITHM__SALT_ROUNDS: 12
EMQX_AUTHENTICATION__2__MECHANISM: jwt
EMQX_AUTHENTICATION__2__USE_JWKS: false
EMQX_AUTHENTICATION__2__ALGORITHM: hmac-based
EMQX_AUTHENTICATION__2__SECRET: test
EMQX_TELEMETRY__ENABLE: false
EMQX_AUTHENTICATION__2__VERIFY_CLAIMS: {edge_node_id: "${username}"}
EMQX_CLUSTER__DISCOVERY_STRATEGY: dns
EMQX_CLUSTER__DNS__RECORD_TYPE: srv
EMQX_SYSMON__VM__LONG_SCHEDULE: disabled
EMQX_LISTENERS__TCP__DEFAULT__ENABLE: false
EMQX_LISTENERS__SSL__DEFAULT__ENABLE: false
EMQX_LISTENERS__TCP__MQTT__BIND: "0.0.0.0:1883"
Mounts:
/opt/emqx/data from test-emqx-core-data (rw)
/opt/emqx/data/bootstrap_api_key from bootstrap-api-key (ro,path="bootstrap_api_key")
/opt/emqx/etc/emqx.conf from bootstrap-config (ro,path="emqx.conf")
/opt/emqx/log from test-emqx-core-log (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nbg86 (ro)
Readiness Gates:
Type Status
apps.emqx.io/on-serving True
Conditions:
Type Status
apps.emqx.io/on-serving True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
test-emqx-core-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: test-emqx-core-data-test-emqx-core-57bcb74d8d-0
ReadOnly: false
bootstrap-api-key:
Type: Secret (a volume populated by a Secret)
SecretName: test-emqx-bootstrap-api-key
Optional: false
bootstrap-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: test-emqx-configs
Optional: false
test-emqx-core-log:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-nbg86:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
POD1:
kubectl describe pod test-emqx-core-57bcb74d8d-1
Name: test-emqx-core-57bcb74d8d-1
Namespace: default
Priority: 0
Node: minikube/192.168.49.2
Start Time: Tue, 09 Jan 2024 12:44:01 +0100
Labels: apps.emqx.io/db-role=core
apps.emqx.io/instance=test-emqx
apps.emqx.io/managed-by=emqx-operator
apps.emqx.io/pod-template-hash=57bcb74d8d
controller-revision-hash=test-emqx-core-57bcb74d8d-86f944c4f
statefulset.kubernetes.io/pod-name=test-emqx-core-57bcb74d8d-1
Annotations: <none>
Status: Running
IP: 10.244.0.81
IPs:
IP: 10.244.0.81
Controlled By: StatefulSet/test-emqx-core-57bcb74d8d
Containers:
emqx:
Container ID: docker://eb10e152c397795ee95b4b4e8292b9ae3ee190561e6a5e2b166904790d125f34
Image: emqx/emqx:5.3.2
Image ID: docker-pullable://emqx/emqx@sha256:858305e7b0b33b28abbc29bb0063b193e9b127224ffc284a36584f910cf699d0
Port: 18083/TCP
Host Port: 0/TCP
State: Running
Started: Tue, 09 Jan 2024 12:44:02 +0100
Ready: True
Restart Count: 0
Liveness: http-get http://:dashboard/status delay=60s timeout=1s period=30s #success=1 #failure=3
Readiness: http-get http://:dashboard/status delay=10s timeout=1s period=5s #success=1 #failure=12
Environment:
EMQX_DASHBOARD__LISTENERS__HTTP__BIND: 18083
POD_NAME: test-emqx-core-57bcb74d8d-1 (v1:metadata.name)
EMQX_CLUSTER__DISCOVERY_STRATEGY: dns
EMQX_CLUSTER__DNS__RECORD_TYPE: srv
EMQX_CLUSTER__DNS__NAME: test-emqx-headless.default.svc.cluster.local
EMQX_HOST: $(POD_NAME).$(EMQX_CLUSTER__DNS__NAME)
EMQX_NODE__DATA_DIR: data
EMQX_NODE__ROLE: core
EMQX_NODE__COOKIE: <set to the key 'node_cookie' in secret 'test-emqx-node-cookie'> Optional: false
EMQX_API_KEY__BOOTSTRAP_FILE: "/opt/emqx/data/bootstrap_api_key"
EMQX_DASHBOARD__DEFAULT_USERNAME: test
EMQX_DASHBOARD__DEFAULT_PASSWORD: test
EMQX_LISTENERS__WS__DEFAULT__ENABLE: false
EMQX_LISTENERS__WSS__DEFAULT__ENABLE: false
EMQX_AUTHENTICATION__1__MECHANISM: password_based
EMQX_AUTHENTICATION__1__BACKEND: built_in_database
EMQX_AUTHENTICATION__1__PASSWORD_HASH_ALGORITHM__NAME: bcrypt
EMQX_AUTHENTICATION__1__PASSWORD_HASH_ALGORITHM__SALT_ROUNDS: 12
EMQX_AUTHENTICATION__2__MECHANISM: jwt
EMQX_AUTHENTICATION__2__USE_JWKS: false
EMQX_AUTHENTICATION__2__ALGORITHM: hmac-based
EMQX_AUTHENTICATION__2__SECRET: test
EMQX_TELEMETRY__ENABLE: false
EMQX_AUTHENTICATION__2__VERIFY_CLAIMS: {edge_node_id: "${username}"}
EMQX_CLUSTER__DISCOVERY_STRATEGY: dns
EMQX_CLUSTER__DNS__RECORD_TYPE: srv
EMQX_SYSMON__VM__LONG_SCHEDULE: disabled
EMQX_LISTENERS__TCP__DEFAULT__ENABLE: false
EMQX_LISTENERS__SSL__DEFAULT__ENABLE: false
EMQX_LISTENERS__TCP__MQTT__BIND: "0.0.0.0:1883"
Mounts:
/opt/emqx/data from test-emqx-core-data (rw)
/opt/emqx/data/bootstrap_api_key from bootstrap-api-key (ro,path="bootstrap_api_key")
/opt/emqx/etc/emqx.conf from bootstrap-config (ro,path="emqx.conf")
/opt/emqx/log from test-emqx-core-log (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hlgvr (ro)
Readiness Gates:
Type Status
apps.emqx.io/on-serving False
Conditions:
Type Status
apps.emqx.io/on-serving False
Initialized True
Ready False
ContainersReady True
PodScheduled True
Volumes:
test-emqx-core-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: test-emqx-core-data-test-emqx-core-57bcb74d8d-1
ReadOnly: false
bootstrap-api-key:
Type: Secret (a volume populated by a Secret)
SecretName: test-emqx-bootstrap-api-key
Optional: false
bootstrap-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: test-emqx-configs
Optional: false
test-emqx-core-log:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-hlgvr:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
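The two describe outputs above differ in one place: on pod 1 the readiness gate apps.emqx.io/on-serving is False, so Ready is False even though ContainersReady is True. A minimal sketch of how to spot that from the JSON form of a pod (kubectl get pod <name> -o json) follows. The function name and the sample data are hypothetical; the sample is hand-written to mirror the conditions shown for pod 1, not captured from the cluster.

```python
from __future__ import annotations


def unmet_readiness_gates(pod: dict) -> list[str]:
    """Return the readiness-gate condition types that are not "True".

    `pod` is the parsed JSON of a single Pod object.
    """
    gates = [g["conditionType"]
             for g in pod.get("spec", {}).get("readinessGates", [])]
    conditions = {c["type"]: c["status"]
                  for c in pod.get("status", {}).get("conditions", [])}
    # A gate without a matching condition is also unmet:
    # Kubernetes treats a missing condition as False.
    return [g for g in gates if conditions.get(g) != "True"]


# Hand-written sample mirroring the conditions reported for pod 1.
sample_pod = {
    "spec": {"readinessGates": [{"conditionType": "apps.emqx.io/on-serving"}]},
    "status": {"conditions": [
        {"type": "apps.emqx.io/on-serving", "status": "False"},
        {"type": "Initialized", "status": "True"},
        {"type": "Ready", "status": "False"},
        {"type": "ContainersReady", "status": "True"},
        {"type": "PodScheduled", "status": "True"},
    ]},
}

if __name__ == "__main__":
    print(unmet_readiness_gates(sample_pod))
```

A non-empty result means the containers may be healthy while the pod is still held out of the Service endpoints by the gate, which matches the single endpoint seen on test-emqx-listeners.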
OK, please share the EMQX operator log and the EMQX custom resource status. And you can use Markdown to format your content.
Hi,
please find the logs attached. It seems there was a restart during the night, so I can reproduce it once more and send the logs.
Thank you, Anže
emqx-1.log emqx-0.log emqx-operator-controller-manager.log
I found this message in the EMQX operator log: etcdserver: request timed out
Maybe that is why the EMQX operator is not working normally.
Hi,
I reproduced it. Please find the logs attached. The error you mentioned is probably not related (my PC went into sleep mode before that log...).
kubectl get statefulsets
NAME READY AGE
test-emqx-core-57bcb74d8d 1/2 140m
Could you please enable debug logging for the EMQX operator, retry, and share the operator's debug log?
You can set development = true in the Helm chart values to enable debug logging.
And when this issue happens, please show the EMQX custom resource; you can run kubectl get emqx $name -o json
Hi,
thanks for the response. Please find the logs attached (I do not see any new log entries in the operator after the pod was recreated).
kubectl get emqx $name -o json
{
"apiVersion": "v1",
"items": [
{
"apiVersion": "apps.emqx.io/v2beta1",
"kind": "EMQX",
"metadata": {
"annotations": {
"apps.emqx.io/last-emqx-configuration": ""
},
"creationTimestamp": "2024-01-10T14:29:29Z",
"generation": 2,
"name": "test-emqx",
"namespace": "default",
"resourceVersion": "905163",
"uid": "90bfdbd3-3450-48d9-b845-193e37e8ccfd"
},
"spec": {
"clusterDomain": "cluster.local",
"config": {
"mode": "Merge"
},
"coreTemplate": {
"metadata": {},
"spec": {
"containerSecurityContext": {
"runAsGroup": 1000,
"runAsNonRoot": true,
"runAsUser": 1000
},
"env": [
{
"name": "EMQX_DASHBOARD__DEFAULT_USERNAME",
"value": "test"
},
{
"name": "EMQX_DASHBOARD__DEFAULT_PASSWORD",
"value": "test"
},
{
"name": "EMQX_LISTENERS__WS__DEFAULT__ENABLE",
"value": "false"
},
{
"name": "EMQX_LISTENERS__WSS__DEFAULT__ENABLE",
"value": "false"
},
{
"name": "EMQX_AUTHENTICATION__1__MECHANISM",
"value": "password_based"
},
{
"name": "EMQX_AUTHENTICATION__1__BACKEND",
"value": "built_in_database"
},
{
"name": "EMQX_AUTHENTICATION__1__PASSWORD_HASH_ALGORITHM__NAME",
"value": "bcrypt"
},
{
"name": "EMQX_AUTHENTICATION__1__PASSWORD_HASH_ALGORITHM__SALT_ROUNDS",
"value": "12"
},
{
"name": "EMQX_AUTHENTICATION__2__MECHANISM",
"value": "jwt"
},
{
"name": "EMQX_AUTHENTICATION__2__USE_JWKS",
"value": "false"
},
{
"name": "EMQX_AUTHENTICATION__2__ALGORITHM",
"value": "hmac-based"
},
{
"name": "EMQX_AUTHENTICATION__2__SECRET",
"value": "test"
},
{
"name": "EMQX_TELEMETRY__ENABLE",
"value": "false"
},
{
"name": "EMQX_AUTHENTICATION__2__VERIFY_CLAIMS",
"value": "{edge_node_id: \"${username}\"}"
},
{
"name": "EMQX_CLUSTER__DISCOVERY_STRATEGY",
"value": "dns"
},
{
"name": "EMQX_CLUSTER__DNS__RECORD_TYPE",
"value": "srv"
},
{
"name": "EMQX_SYSMON__VM__LONG_SCHEDULE",
"value": "disabled"
},
{
"name": "EMQX_LISTENERS__TCP__DEFAULT__ENABLE",
"value": "false"
},
{
"name": "EMQX_LISTENERS__SSL__DEFAULT__ENABLE",
"value": "false"
},
{
"name": "EMQX_LISTENERS__TCP__MQTT__BIND",
"value": "\"0.0.0.0:1883\""
}
],
"livenessProbe": {
"failureThreshold": 3,
"httpGet": {
"path": "/status",
"port": "dashboard"
},
"initialDelaySeconds": 60,
"periodSeconds": 30
},
"podSecurityContext": {
"fsGroup": 1000,
"fsGroupChangePolicy": "Always",
"runAsGroup": 1000,
"runAsUser": 1000,
"supplementalGroups": [
1000
]
},
"readinessProbe": {
"failureThreshold": 12,
"httpGet": {
"path": "/status",
"port": "dashboard"
},
"initialDelaySeconds": 10,
"periodSeconds": 5
},
"replicas": 2,
"resources": {},
"volumeClaimTemplates": {
"accessModes": [
"ReadWriteOnce"
],
"resources": {
"requests": {
"storage": "20Mi"
}
}
}
}
},
"image": "emqx/emqx:5.3.2",
"listenersServiceTemplate": {
"enabled": true,
"metadata": {
"annotations": {
"service.beta.kubernetes.io/azure-load-balancer-resource-group": "anze-test"
}
},
"spec": {
"loadBalancerIP": "127.0.0.1",
"ports": [
{
"name": "tcp-mqtt",
"port": 1883,
"protocol": "TCP",
"targetPort": 1883
}
],
"type": "LoadBalancer"
}
},
"replicantTemplate": {
"metadata": {},
"spec": {
"containerSecurityContext": {
"runAsGroup": 1000,
"runAsNonRoot": true,
"runAsUser": 1000
},
"livenessProbe": {
"failureThreshold": 3,
"httpGet": {
"path": "/status",
"port": "dashboard"
},
"initialDelaySeconds": 60,
"periodSeconds": 30
},
"podSecurityContext": {
"fsGroup": 1000,
"fsGroupChangePolicy": "Always",
"runAsGroup": 1000,
"runAsUser": 1000,
"supplementalGroups": [
1000
]
},
"readinessProbe": {
"failureThreshold": 12,
"httpGet": {
"path": "/status",
"port": "dashboard"
},
"initialDelaySeconds": 10,
"periodSeconds": 5
},
"replicas": 0,
"resources": {}
}
},
"revisionHistoryLimit": 3,
"updateStrategy": {
"evacuationStrategy": {
"connEvictRate": 1000,
"sessEvictRate": 1000,
"waitTakeover": 10
},
"initialDelaySeconds": 10,
"type": "Recreate"
}
},
"status": {
"conditions": [
{
"lastTransitionTime": "2024-01-10T14:30:10Z",
"message": "Create new replicaSet",
"reason": "CreateNewReplicaSet",
"status": "True",
"type": "ReplicantNodesProgressing"
},
{
"lastTransitionTime": "2024-01-10T14:30:10Z",
"message": "Core nodes is ready",
"reason": "CoreNodesReady",
"status": "True",
"type": "CoreNodesReady"
},
{
"lastTransitionTime": "2024-01-10T14:29:31Z",
"message": "Create new statefulSet",
"reason": "CreateNewStatefulSet",
"status": "True",
"type": "CoreNodesProgressing"
}
],
"coreNodes": [
{
"controllerUID": "c9dc9bf5-f46b-4cee-b563-a2a1f2b48c93",
"edition": "Opensource",
"node": "emqx@test-emqx-core-57bcb74d8d-1.test-emqx-headless.default.svc.cluster.local",
"node_status": "running",
"otp_release": "25.3.2-2/13.2.2",
"podUID": "b4c79037-c5e7-4d0b-9110-6ad48d57e7ac",
"role": "core",
"uptime": 19904,
"version": "5.3.2"
},
{
"controllerUID": "c9dc9bf5-f46b-4cee-b563-a2a1f2b48c93",
"edition": "Opensource",
"node": "emqx@test-emqx-core-57bcb74d8d-0.test-emqx-headless.default.svc.cluster.local",
"node_status": "running",
"otp_release": "25.3.2-2/13.2.2",
"podUID": "d3e3fb8e-1dc0-4cd8-a3b8-194cb0aeb1aa",
"role": "core",
"uptime": 19906,
"version": "5.3.2"
}
],
"coreNodesStatus": {
"currentReplicas": 2,
"currentRevision": "57bcb74d8d",
"readyReplicas": 2,
"replicas": 2,
"updateReplicas": 2,
"updateRevision": "57bcb74d8d"
},
"replicantNodesStatus": {
"currentRevision": "5d7b4558d5",
"updateRevision": "5d7b4558d5"
}
}
}
],
"kind": "List",
"metadata": {
"resourceVersion": "",
"selfLink": ""
}
}
operator-controller-manager.log emqx1-recreated.log emqx0.log
Maybe this also helps:
kubectl get events -o custom-columns=TS:.firstTimestamp,Count:.count,From:.source.component,Type:.type,Reason:.reason,Message:.message --sort-by='.firstTimestamp'
TS Count From Type Reason Message
2024-01-10T14:29:31Z 1 statefulset-controller Normal SuccessfulCreate create Claim test-emqx-core-data-test-emqx-core-57bcb74d8d-0 Pod test-emqx-core-57bcb74d8d-0 in StatefulSet test-emqx-core-57bcb74d8d success
2024-01-10T14:29:31Z 1 persistentvolume-controller Normal ExternalProvisioning waiting for a volume to be created, either by external provisioner "k8s.io/minikube-hostpath" or manually created by system administrator
2024-01-10T14:29:31Z 1 k8s.io/minikube-hostpath_minikube_4072228a-9bbc-4220-a8cc-258ba512cb4d Normal ProvisioningSucceeded Successfully provisioned volume pvc-40fe2dda-287e-457d-82b1-99431d404065
2024-01-10T14:29:31Z 1 k8s.io/minikube-hostpath_minikube_4072228a-9bbc-4220-a8cc-258ba512cb4d Normal Provisioning External provisioner is provisioning volume for claim "default/test-emqx-core-data-test-emqx-core-57bcb74d8d-1"
2024-01-10T14:29:31Z 1 k8s.io/minikube-hostpath_minikube_4072228a-9bbc-4220-a8cc-258ba512cb4d Normal ProvisioningSucceeded Successfully provisioned volume pvc-dee67a1f-6536-4efa-9034-70bb1803fd04
2024-01-10T14:29:31Z 1 default-scheduler Warning FailedScheduling 0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..
2024-01-10T14:29:31Z 1 k8s.io/minikube-hostpath_minikube_4072228a-9bbc-4220-a8cc-258ba512cb4d Normal Provisioning External provisioner is provisioning volume for claim "default/test-emqx-core-data-test-emqx-core-57bcb74d8d-0"
2024-01-10T14:29:31Z 2 persistentvolume-controller Normal ExternalProvisioning waiting for a volume to be created, either by external provisioner "k8s.io/minikube-hostpath" or manually created by system administrator
2024-01-10T14:29:31Z 1 default-scheduler Warning FailedScheduling 0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..
2024-01-10T14:29:31Z 1 statefulset-controller Normal SuccessfulCreate create Claim test-emqx-core-data-test-emqx-core-57bcb74d8d-1 Pod test-emqx-core-57bcb74d8d-1 in StatefulSet test-emqx-core-57bcb74d8d success
2024-01-10T14:29:31Z 1 statefulset-controller Normal SuccessfulCreate create Pod test-emqx-core-57bcb74d8d-0 in StatefulSet test-emqx-core-57bcb74d8d successful
2024-01-10T14:29:31Z 2 statefulset-controller Normal SuccessfulCreate create Pod test-emqx-core-57bcb74d8d-1 in StatefulSet test-emqx-core-57bcb74d8d successful
2024-01-10T14:29:32Z 1 default-scheduler Normal Scheduled Successfully assigned default/test-emqx-core-57bcb74d8d-0 to minikube
2024-01-10T14:29:32Z 1 default-scheduler Normal Scheduled Successfully assigned default/test-emqx-core-57bcb74d8d-1 to minikube
2024-01-10T14:29:34Z 1 kubelet Normal Started Started container emqx
2024-01-10T14:29:34Z 1 kubelet Normal Pulled Container image "emqx/emqx:5.3.2" already present on machine
2024-01-10T14:29:34Z 1 kubelet Normal Created Created container emqx
2024-01-10T14:29:34Z 1 kubelet Normal Started Started container emqx
2024-01-10T14:29:34Z 1 kubelet Normal Pulled Container image "emqx/emqx:5.3.2" already present on machine
2024-01-10T14:29:34Z 1 kubelet Normal Created Created container emqx
2024-01-10T14:29:35Z 25 emqx-controller Warning FailedToGetNodeStatuses failed to get node statues by API: failed to get API http://10.244.0.92:18083/api/v5/nodes: failed to request API: Get "http://10.244.0.92:18083/api/v5/nodes": dial tcp 10.244.0.92:18083: connect: connection refused
2024-01-10T14:29:44Z 4 kubelet Warning Unhealthy Readiness probe failed: Get "http://10.244.0.92:18083/status": dial tcp 10.244.0.92:18083: connect: connection refused
2024-01-10T14:29:48Z 4 kubelet Warning Unhealthy Readiness probe failed: Get "http://10.244.0.91:18083/status": dial tcp 10.244.0.91:18083: connect: connection refused
2024-01-10T14:30:04Z 1 kubelet Warning Unhealthy Readiness probe failed: Get "http://10.244.0.91:18083/status": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2024-01-10T14:31:07Z 1 kubelet Normal Killing Stopping container emqx
2024-01-10T14:31:07Z 1 endpoint-controller Warning FailedToUpdateEndpoint Failed to update endpoint default/test-emqx-dashboard: Operation cannot be fulfilled on endpoints "test-emqx-dashboard": the object has been modified; please apply your changes to the latest version and try again
2024-01-10T14:31:07Z 1 endpoint-controller Warning FailedToUpdateEndpoint Failed to update endpoint default/test-emqx-listeners: Operation cannot be fulfilled on endpoints "test-emqx-listeners": the object has been modified; please apply your changes to the latest version and try again
2024-01-10T14:31:09Z 1 default-scheduler Normal Scheduled Successfully assigned default/test-emqx-core-57bcb74d8d-1 to minikube
2024-01-10T14:31:10Z 1 kubelet Normal Created Created container emqx
2024-01-10T14:31:10Z 1 kubelet Normal Pulled Container image "emqx/emqx:5.3.2" already present on machine
2024-01-10T14:31:10Z 1 kubelet Normal Started Started container emqx
OK, looking at the conditions and replicantTemplate.replicas = 0 in the EMQX CR, I think this issue is related to https://github.com/emqx/emqx-operator/issues/1002. Could you please deploy EMQX operator 2.2.12 and retry?
Hi,
thank you very much @Rory-Z, it works now without any problem in v2.2.12! I have only one more question... I noticed that from operator version > 2.2.5, besides the enabled port, all other default ports are visible/available on the listeners (load balancer) service, although they are disabled and not specified?
kubectl describe service test-emqx-listeners
Name: test-emqx-listeners
Namespace: default
Labels: apps.emqx.io/instance=test-emqx
apps.emqx.io/managed-by=emqx-operator
Annotations: apps.emqx.io/last-applied:
UEsDBBQACAAIAAAAAAAAAAAAAAAAAAAAAAAIAAAAb3JpZ2luYWykVduO3DYM/Rc+W64vs41Hb70CBXqZttNFkUwfZJleCCtLjiRvshn43wv6MvY4k26RvMkkfXhIHkpnEK26R+eVNc...
service.beta.kubernetes.io/azure-load-balancer-resource-group: anze-test
Selector: apps.emqx.io/db-role=core,apps.emqx.io/instance=test-emqx,apps.emqx.io/managed-by=emqx-operator
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.108.182.245
IPs: 10.108.182.245
IP: 127.0.0.1
LoadBalancer Ingress: 127.0.0.1
Port: tcp-mqtt 1883/TCP
TargetPort: 1883/TCP
NodePort: tcp-mqtt 32623/TCP
Endpoints: 10.244.0.102:1883,10.244.0.103:1883
Port: ssl-default 8883/TCP
TargetPort: 8883/TCP
NodePort: ssl-default 30891/TCP
Endpoints: 10.244.0.102:8883,10.244.0.103:8883
Port: ws-default 8083/TCP
TargetPort: 8083/TCP
NodePort: ws-default 31260/TCP
Endpoints: 10.244.0.102:8083,10.244.0.103:8083
Port: wss-default 8084/TCP
TargetPort: 8084/TCP
NodePort: wss-default 31657/TCP
Endpoints: 10.244.0.102:8084,10.244.0.103:8084
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
Thx, Anže.
"besides the enabled port, all other default ports are visible/available on the listeners (load balancer) service, although they are disabled and not specified"
Yes, because they are enabled by default in EMQX. If you want to disable them, you can set listeners.tcp.default.enable = false in .spec.config.data
Thx. I have it disabled in coreTemplate.spec.env (you can see it in one of the previous posts). Should I move this setting to .spec.config.data then?
Yes, .spec.config.data is better.
Ok thx. It was confusing because the log from emqx also says:
Listener ssl:default is NOT started due to: disabled.
Listener tcp:default is NOT started due to: disabled.
Listener tcp:mqtt on 0.0.0.0:1883 started.
Listener ws:default is NOT started due to: disabled.
Listener wss:default is NOT started due to: disabled.
Listener http:dashboard on :18083 started.
The EMQX operator loads the EMQX config from .spec.config.data and watches it. When the EMQX operator finds that some listeners are disabled, it removes those ports from the service, but the EMQX operator cannot read EMQX config from the pod env.
I recommend using .spec.config.data instead: if you want to update the EMQX config, just change .spec.config.data. If you use env in the pod, you can only update the EMQX config by restarting the pod.
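To move the listener settings from coreTemplate.spec.env into .spec.config.data, the variable names have to be turned back into config paths. The sketch below assumes EMQX's documented convention for override variables (an EMQX_ prefix, with a double underscore standing for a dot in the path); the helper names are hypothetical and the env entries are copied from the CR posted earlier in this thread. Values are passed through unchanged, so please check the result against the EMQX configuration docs before applying it.

```python
from __future__ import annotations


def env_to_config_key(env_name: str) -> str:
    """Map an EMQX override variable name to its dotted config path."""
    prefix = "EMQX_"
    if not env_name.startswith(prefix):
        raise ValueError(f"not an EMQX override variable: {env_name}")
    # Lowercase the rest and turn each double underscore into a dot;
    # single underscores are part of the key name and are kept.
    return env_name[len(prefix):].lower().replace("__", ".")


def env_to_config_data(env: list[dict]) -> str:
    """Render CR-style env entries ({"name": ..., "value": ...}) as config lines."""
    return "\n".join(
        f'{env_to_config_key(e["name"])} = {e["value"]}' for e in env
    )


# Listener-related entries taken from coreTemplate.spec.env in the CR above.
listener_env = [
    {"name": "EMQX_LISTENERS__TCP__DEFAULT__ENABLE", "value": "false"},
    {"name": "EMQX_LISTENERS__SSL__DEFAULT__ENABLE", "value": "false"},
    {"name": "EMQX_LISTENERS__WS__DEFAULT__ENABLE", "value": "false"},
    {"name": "EMQX_LISTENERS__WSS__DEFAULT__ENABLE", "value": "false"},
    {"name": "EMQX_LISTENERS__TCP__MQTT__BIND", "value": "\"0.0.0.0:1883\""},
]

if __name__ == "__main__":
    print(env_to_config_data(listener_env))
```

The printed lines are what would go into .spec.config.data, where the operator can read them and adjust the service ports; the corresponding entries can then be dropped from the env list.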
Description of the bug: After upgrading the emqx operator and emqx image to the newest version (from apiVersion apps.emqx.io/v2alpha1 to apps.emqx.io/v2beta1), we are facing an issue where pods are sometimes no longer bound to the LoadBalancer service. It happens if an emqx pod is restarted for some reason after deployment. The emqx pod is up and running after the restart, but the StatefulSet has pending status for that pod. The listeners (LoadBalancer) service no longer has an endpoint for that pod after the restart.
To Reproduce: This can easily be reproduced on minikube as well, with emqx-operator version 2.2.x (tested on 2.2.4 - 2.2.10) and emqx version 5.3.2 or later, if you e.g. manually delete emqx pod or if you descale/scale Statefulset. Pod is ready up and running, but stateful set has pending status for that pod forever.
This is not the case in previous versions (apiVersion: apps.emqx.io/v2alpha1): pods are bound to the load balancer without any issue if you e.g. manually delete one of the emqx pods or descale/scale the StatefulSet (operator version 2.1.2, emqx version 5.0.24). The listeners/LoadBalancer service is bound to the pod correctly.
The only way to fix that issue is destroying/recreating emqx pods.
Environment details:
Thank you for your response, Anže.