emqx / MQTTX

A Powerful and All-in-One MQTT 5.0 client toolbox for Desktop, CLI and WebSocket.
https://mqttx.app
Apache License 2.0

[Help] Connection Issues with EMQX in EKS Cluster #1726

Closed rakeshreddyrg09 closed 1 month ago

rakeshreddyrg09 commented 1 month ago

Hi Team, I have a cluster in EKS running emqx-v5.7.1 and Kong behind an NLB. I am using mqttx to publish a message with the command below:

mqttx pub \
  --hostname mqtt.iot-test.org.com \
  --port 1883 \
  --topic devices \
  --message "{\"deviceId\":16,\"value\":40.1,\"region\":\"EMEA\",\"timestamp\":1482236627236}" \
  --username 'admin' \
  --password 'public'

Here are the logs:

mqttjs connecting to an MQTT broker... +0ms
  mqttjs:client MqttClient :: options.protocol mqtt +0ms
  mqttjs:client MqttClient :: options.protocolVersion 5 +0ms
  mqttjs:client MqttClient :: options.username admin +0ms
  mqttjs:client MqttClient :: options.keepalive 30 +0ms
  mqttjs:client MqttClient :: options.reconnectPeriod 1000 +0ms
  mqttjs:client MqttClient :: options.rejectUnauthorized undefined +0ms
  mqttjs:client MqttClient :: options.topicAliasMaximum undefined +0ms
  mqttjs:client MqttClient :: clientId mqttx_2ea4955d +1ms
  mqttjs:client MqttClient :: setting up stream +0ms
  mqttjs:client _setupStream :: calling method to clear reconnect +0ms
  mqttjs:client _clearReconnect : clearing reconnect timer +0ms
  mqttjs:client _setupStream :: using streamBuilder provided to client to create stream +0ms
  mqttjs calling streambuilder for mqtt +2ms
  mqttjs:tcp port 1883 and host mqtt.iot-test.org.com +0ms
  mqttjs:client _setupStream :: pipe stream to writable stream +3ms
  mqttjs:client _setupStream: sending packet `connect` +0ms
  mqttjs:client sendPacket :: packet: { cmd: 'connect' } +1ms
  mqttjs:client sendPacket :: emitting `packetsend` +1ms
  mqttjs:client sendPacket :: writing to stream +0ms
  mqttjs:client sendPacket :: writeToStream result true +11ms
⠸ Connecting...  mqttjs:client !!connectTimeout hit!! Calling _cleanUp with force `true` +30s
  mqttjs:client _cleanUp :: forced? true +0ms
  mqttjs:client _cleanUp :: (mqttx_2ea4955d) :: destroying stream +0ms
  mqttjs:client _cleanUp :: client not disconnecting. Clearing and resetting reconnect. +1ms 
  mqttjs:client _clearReconnect : clearing reconnect timer +0ms
  mqttjs:client _setupReconnect :: emit `offline` state +0ms
  mqttjs:client _setupReconnect :: set `reconnecting` to `true` +0ms
  mqttjs:client _setupReconnect :: setting reconnectTimer for 1000 ms +0ms
  mqttjs:client (mqttx_2ea4955d)stream :: on close +0ms
  mqttjs:client flushVolatile :: deleting volatile messages from the queue and setting their callbacks as error function +0ms
  mqttjs:client stream: emit close to MqttClient +0ms
  mqttjs:client close :: connected set to `false` +0ms
  mqttjs:client close :: clearing connackTimer +0ms
  mqttjs:client close :: clearing ping timer +0ms
  mqttjs:client close :: calling _setupReconnect +0ms
  mqttjs:client _setupReconnect :: doing nothing... +0ms
⠴ Connecting...  mqttjs:client reconnectTimer :: reconnect triggered! +1s
  mqttjs:client _reconnect: emitting reconnect to client +0ms
  mqttjs:client _reconnect: calling _setupStream +0ms
  mqttjs:client _setupStream :: calling method to clear reconnect +0ms
  mqttjs:client _clearReconnect : clearing reconnect timer +0ms
  mqttjs:client _setupStream :: using streamBuilder provided to client to create stream +1ms
  mqttjs calling streambuilder for mqtt +31s
  mqttjs:tcp port 1883 and host mqtt.iot-test.org.com +31s
  mqttjs:client _setupStream :: pipe stream to writable stream +0ms
  mqttjs:client _setupStream: sending packet `connect` +0ms
  mqttjs:client sendPacket :: packet: { cmd: 'connect' } +0ms
  mqttjs:client sendPacket :: emitting `packetsend` +1ms
  mqttjs:client sendPacket :: writing to stream +0ms
  mqttjs:client sendPacket :: writeToStream result true +0ms
  ⠏ Reconnecting...[1/10]

I am getting the above error. I have created a TCPIngress as well:

apiVersion: configuration.konghq.com/v1beta1
kind: TCPIngress
metadata:
  name: mqttingress
  namespace: mqtt
  annotations:
    kubernetes.io/ingress.class: "kong-ingress"
spec:
  rules:
    - port: 1883 # MQTT port
      backend:
        serviceName: emqx
        servicePort: 1883
    - port: 8883 # MQTT SSL port
      backend:
        serviceName: emqx
        servicePort: 8883

I also added the config below to my Kong deployment:

      nlbConfig: true
      proxy:
        streamPorts:
          - protocol: TCP
            containerPort: 1883
            servicePort: 1883
          # MQTT SSL Port
          - protocol: TCP
            containerPort: 8883
            servicePort: 8883
        tls:
          containerPort: 8000

Could your team help me out here? https://emqx.slack.com/archives/C02D74THZPF/p1721628206450979

ieQu1 commented 1 month ago

We need to see logs from the broker side; the logs/emqx.log.* files are of particular interest. Could you also show metrics from the broker at broker_ip:18083/#/dashboard/metrics? What is the value of the client.connect metric?

id commented 1 month ago

@rakeshreddyrg09 it's too early to talk about MQTT or MQTTX issues. Please make sure you can establish a plain TCP connection to that endpoint and port 1883 first. It does not work for me.

telnet mqtt.iot-test.org.com 1883
Trying 34.206.39.153...
^C

rakeshreddyrg09 commented 1 month ago

> We need to see logs from the broker side. logs/emqx.log.* files are of particular interest. Could you also show metrics from the broker, broker_ip:18083/#/dashboard/metrics What is the value of client.connect metric?

image image

As shown above, the login pop-up keeps appearing even after I provide the credentials.

rakeshreddyrg09 commented 1 month ago

> @rakeshreddyrg09 it's too early to talk about MQTT or MQTTX issues. Please make sure you can establish a plain TCP connection to that endpoint and port 1883 first. It does not work for me.
>
> telnet mqtt.iot-test.org.com 1883
> Trying 34.206.39.153...
> ^C

Sorry, I can't share the full address here, but I am able to connect:

image

rakeshreddyrg09 commented 1 month ago

client.connect

image

id commented 1 month ago

I'm not familiar with Kong ingress, but so far I think the routing is not configured correctly between NLB and EMQX, since telnet probe works, but the traffic does not reach EMQX.

Could you confirm that you have followed this guide? https://docs.konghq.com/kubernetes-ingress-controller/latest/guides/services/tcp/

One way to troubleshoot this would be to set up an echo service on port 1883, make sure that one works, then switch back to EMQX.
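
A telnet probe only shows that something accepts the TCP connection. To check whether an MQTT broker actually answers behind the load balancer, one option is to send a hand-built CONNECT packet and wait for a CONNACK. The following is a minimal sketch using only Node's `net` module. The names `probeMqtt`, `buildConnectPacket`, and the result labels are made up for illustration, and it speaks MQTT 3.1.1 without credentials (MQTTX in this thread uses protocol version 5), so treat it as a reachability check rather than a full client.

```typescript
import * as net from 'node:net';

type ProbeResult = 'connack' | 'tcp-only' | 'unreachable';

// Build a minimal MQTT 3.1.1 CONNECT packet (clean session, no credentials).
function buildConnectPacket(clientId: string): Buffer {
  const id = Buffer.from(clientId, 'utf8');
  const variableHeader = Buffer.from([
    0x00, 0x04, 0x4d, 0x51, 0x54, 0x54, // protocol name "MQTT"
    0x04,                               // protocol level 4 (MQTT 3.1.1)
    0x02,                               // connect flags: clean session
    0x00, 0x3c,                         // keep alive: 60 seconds
  ]);
  const idLength = Buffer.from([id.length >> 8, id.length & 0xff]);
  const remaining = variableHeader.length + idLength.length + id.length;
  // A single-byte "remaining length" is only valid below 128 bytes.
  if (remaining > 127) throw new Error('client id too long for this sketch');
  return Buffer.concat([Buffer.from([0x10, remaining]), variableHeader, idLength, id]);
}

// Resolves to:
//   'connack'     - the peer answered with an MQTT CONNACK
//   'tcp-only'    - TCP connected, but no CONNACK arrived before the timeout
//   'unreachable' - the TCP connection itself failed or timed out
function probeMqtt(host: string, port: number, timeoutMs = 5000): Promise<ProbeResult> {
  return new Promise((resolve) => {
    let tcpConnected = false;
    let settled = false;
    const socket = net.connect({ host, port });
    const finish = (result: ProbeResult) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      socket.destroy();
      resolve(result);
    };
    const fallback = () => finish(tcpConnected ? 'tcp-only' : 'unreachable');
    const timer = setTimeout(fallback, timeoutMs);
    socket.once('connect', () => {
      tcpConnected = true;
      socket.write(buildConnectPacket('probe'));
    });
    socket.once('data', (chunk: Buffer) => {
      // CONNACK carries packet type 2 in the high nibble of the first byte.
      finish(chunk[0] >> 4 === 2 ? 'connack' : 'tcp-only');
    });
    socket.once('error', fallback);
    socket.once('close', fallback);
  });
}
```

Calling `probeMqtt('mqtt.iot-test.org.com', 1883)` and getting `'tcp-only'` would match the situation described above: the handshake succeeds, but nothing MQTT-aware replies.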

rakeshreddyrg09 commented 1 month ago

I have enabled an echo service on port 1883. Here is my config for the echo service:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: echo
  name: echo
spec:
  ports:
    - port: 1883
      name: tcp
      protocol: TCP
      targetPort: 1025
  selector:
    app: echo
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: echo
  name: echo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: echo
  strategy: {}
  template:
    metadata:
      labels:
        app: echo
    spec:
      containers:
        - image: kong/go-echo:latest
          name: echo
          ports:
            - containerPort: 1025
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          resources: {}
---
apiVersion: configuration.konghq.com/v1beta1
kind: TCPIngress
metadata:
  name: echo-plaintext
  annotations:
    kubernetes.io/ingress.class: "prod-kong-ingress"
spec:
  rules:
    - port: 1883
      backend:
        serviceName: echo
        servicePort: 1883

Below is the telnet output.

image

I am able to receive traffic on port 1883.

ieQu1 commented 1 month ago

The log directory is empty, which is not right. Is the broker running?

rakeshreddyrg09 commented 1 month ago

image

This is what I can see in the log directory.

ieQu1 commented 1 month ago

Do the client.connect, authentication.success and client.connack metrics increase when you try connecting with MQTTX? Can you also collect a tcpdump on the client side (where MQTTX is running)? P.S. I would appreciate it if you attached the full output of the metrics endpoint, as plain text rather than a screenshot.

rakeshreddyrg09 commented 1 month ago

The metrics do not increase when I run the MQTTX command in a terminal (WSL), but when I connect over the WebSocket port using the test.ts file below, the metrics do increase.

import * as mqtt from 'mqtt';

// MQTT server configuration
const brokerUrl = 'ws://mqtt.iot-test.xyz.org.cloud/mqtt';
const topic = 'test/topic';
const messages = [
  'Hello, MQTT! Message 1',
  'Hello, MQTT! Message 2',
  'Hello, MQTT! Message 3',
  'Hello, MQTT! Message 4',
  'Hello, MQTT! Message 5',
];

// Create MQTT client
const client = mqtt.connect(brokerUrl);

// Connection event
client.on('connect', () => {
  console.log('Connected to MQTT broker');

  // Publish messages
  let messageCount = 0;

  function publishNextMessage() {
    if (messageCount < messages.length) {
      client.publish(topic, messages[messageCount], (err) => {
        if (err) {
          console.error(`Error publishing message ${messageCount + 1}:`, err);
        } else {
          console.log(`Message ${messageCount + 1} published successfully`);
        }

        messageCount++;
        publishNextMessage();
      });
    } else {
      // All messages sent, close the connection
      client.end();
    }
  }

  publishNextMessage();
});

// Error event
client.on('error', (err) => {
  console.error('MQTT error:', err);
});

// Close event
client.on('close', () => {
  console.log('Disconnected from MQTT broker');
});

// Debug logs
client.on('packetsend', (packet) => {
  console.log('Packet sent:', packet);
});

client.on('packetreceive', (packet) => {
  console.log('Packet received:', packet);
});

Here is how I am using mqttx:

mqttx pub \
   --hostname mqtt.iot-test.xyz.org.cloud \
   --port 1883 \
   --topic devices \
   --message "{\"deviceId\":1,\"value\":40.1,\"region\":\"EMEA\",\"timestamp\":1482236627236}" \
   --username 'admin' \
   --password 'public'

Here is the output of my tcpdump:

sudo tcpdump -i any port 1883
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
17:28:36.079930 eth0  Out IP 172.0.0.1.56530 > ec2-60-29-85-95.us-west-1.compute.amazonaws.com.1883: Flags [S], seq 2518595916, win 64240, options [mss 1460,sackOK,TS val 681839938 ecr 0,nop,wscale 7], length 0
17:28:36.109663 eth0  In  IP ec2-60-29-85-95.us-west-1.compute.amazonaws.com.1883 > 172.0.0.1.56530: Flags [S.], seq 3496926793, ack 2518595917, win 65535, options [mss 1250,nop,wscale 5,sackOK,TS val 2254093310 ecr 681839938], length 0
17:28:36.109713 eth0  Out IP 172.0.0.1.56530 > ec2-60-29-85-95.us-west-1.compute.amazonaws.com.1883: Flags [.], ack 1, win 502, options [nop,nop,TS val 681839968 ecr 2254093310], length 0
17:28:36.110398 eth0  Out IP 172.0.0.1.56530 > ec2-60-29-85-95.us-west-1.compute.amazonaws.com.1883: Flags [P.], seq 1:47, ack 1, win 502, options [nop,nop,TS val 681839968 ecr 2254093310], length 46      
17:28:36.230422 eth0  In  IP ec2-60-29-85-95.us-west-1.compute.amazonaws.com.1883 > 172.0.0.1.56530: Flags [.], ack 47, win 2050, options [nop,nop,TS val 2254093324 ecr 681839968], length 0
17:28:36.384433 eth0  In  IP ec2-60-29-85-95.us-west-1.compute.amazonaws.com.1883 > 172.0.0.1.56530: Flags [R.], seq 1, ack 47, win 2050, options [nop,nop,TS val 2254093341 ecr 681839968], length 0
17:28:39.438326 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [S], seq 2215438111, win 64240, options [mss 1460,sackOK,TS val 583185780 ecr 0,nop,wscale 7], length 0
17:28:39.708537 eth0  In  IP ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883 > 172.0.0.1.51940: Flags [S.], seq 1090391805, ack 2215438112, win 26847, options [mss 1250,sackOK,TS val 2334531413 ecr 583185780,nop,wscale 12], length 0
17:28:39.708569 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [.], ack 1, win 502, options [nop,nop,TS val 583186050 ecr 2334531413], length 0
17:28:39.708939 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [P.], seq 1:47, ack 1, win 502, options [nop,nop,TS val 583186051 ecr 2334531413], length 46     
17:28:40.536162 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [P.], seq 1:47, ack 1, win 502, options [nop,nop,TS val 583186878 ecr 2334531413], length 46
17:28:41.416139 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [P.], seq 1:47, ack 1, win 502, options [nop,nop,TS val 583187758 ecr 2334531413], length 46
17:28:43.096154 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [P.], seq 1:47, ack 1, win 502, options [nop,nop,TS val 583189438 ecr 2334531413], length 46
17:28:46.456171 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [P.], seq 1:47, ack 1, win 502, options [nop,nop,TS val 583192798 ecr 2334531413], length 46
17:28:53.336159 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [P.], seq 1:47, ack 1, win 502, options [nop,nop,TS val 583199678 ecr 2334531413], length 46
17:29:06.776210 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [P.], seq 1:47, ack 1, win 502, options [nop,nop,TS val 583213118 ecr 2334531413], length 46
17:29:09.436807 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [F.], seq 47, ack 1, win 502, options [nop,nop,TS val 583215779 ecr 2334531413], length 0
17:29:10.442335 eth0  Out IP 172.0.0.1.56536 > ec2-60-29-85-95.us-west-1.compute.amazonaws.com.1883: Flags [S], seq 3824089189, win 64240, options [mss 1460,sackOK,TS val 681874300 ecr 0,nop,wscale 7], length 0
17:29:10.471384 eth0  In  IP ec2-60-29-85-95.us-west-1.compute.amazonaws.com.1883 > 172.0.0.1.56536: Flags [S.], seq 1503393144, ack 3824089190, win 65535, options [mss 1250,nop,wscale 5,sackOK,TS val 1694486591 ecr 681874300], length 0
17:29:10.471405 eth0  Out IP 172.0.0.1.56536 > ec2-60-29-85-95.us-west-1.compute.amazonaws.com.1883: Flags [.], ack 1, win 502, options [nop,nop,TS val 681874329 ecr 1694486591], length 0
17:29:10.471645 eth0  Out IP 172.0.0.1.56536 > ec2-60-29-85-95.us-west-1.compute.amazonaws.com.1883: Flags [P.], seq 1:47, ack 1, win 502, options [nop,nop,TS val 681874330 ecr 1694486591], length 46      
17:29:10.505425 eth0  In  IP ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883 > 172.0.0.1.51940: Flags [R], seq 1090391806, win 26847, length 0
17:29:10.580101 eth0  In  IP ec2-60-29-85-95.us-west-1.compute.amazonaws.com.1883 > 172.0.0.1.56536: Flags [.], ack 47, win 2050, options [nop,nop,TS val 1694486604 ecr 681874330], length 0
17:29:10.750527 eth0  In  IP ec2-60-29-85-95.us-west-1.compute.amazonaws.com.1883 > 172.0.0.1.56536: Flags [R.], seq 1, ack 47, win 2050, options [nop,nop,TS val 1694486622 ecr 681874330], length 0

ieQu1 commented 1 month ago

As far as I can tell, there are some network problems. Let's follow a connection with source port 51940:

17:28:39.438326 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [S], seq 2215438111, win 64240, options [mss 1460,sackOK,TS val 583185780 ecr 0,nop,wscale 7], length 0
17:28:39.708537 eth0  In  IP ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883 > 172.0.0.1.51940: Flags [S.], seq 1090391805, ack 2215438112, win 26847, options [mss 1250,sackOK,TS val 2334531413 ecr 583185780,nop,wscale 12], length 0
17:28:39.708569 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [.], ack 1, win 502, options [nop,nop,TS val 583186050 ecr 2334531413], length 0
17:28:39.708939 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [P.], seq 1:47, ack 1, win 502, options [nop,nop,TS val 583186051 ecr 2334531413], length 46     
17:28:40.536162 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [P.], seq 1:47, ack 1, win 502, options [nop,nop,TS val 583186878 ecr 2334531413], length 46
17:28:41.416139 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [P.], seq 1:47, ack 1, win 502, options [nop,nop,TS val 583187758 ecr 2334531413], length 46
17:28:43.096154 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [P.], seq 1:47, ack 1, win 502, options [nop,nop,TS val 583189438 ecr 2334531413], length 46
17:28:46.456171 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [P.], seq 1:47, ack 1, win 502, options [nop,nop,TS val 583192798 ecr 2334531413], length 46
17:28:53.336159 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [P.], seq 1:47, ack 1, win 502, options [nop,nop,TS val 583199678 ecr 2334531413], length 46
17:29:06.776210 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [P.], seq 1:47, ack 1, win 502, options [nop,nop,TS val 583213118 ecr 2334531413], length 46
17:29:09.436807 eth0  Out IP 172.0.0.1.51940 > ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883: Flags [F.], seq 47, ack 1, win 502, options [nop,nop,TS val 583215779 ecr 2334531413], length 0
...
17:29:10.505425 eth0  In  IP ec2-54-8-114-121.us-west-1.compute.amazonaws.com.1883 > 172.0.0.1.51940: Flags [R], seq 1090391806, win 26847, length 0

After a successful TCP handshake, the client sends what is presumably the MQTT CONNECT packet at 17:28:39.708939 (46 bytes, seq 1:47). The server never ACKs it, so the client retransmits the same TCP segment six times and finally gives up with a FIN at 17:29:09.436807. Then, at 17:29:10.505425, the server replies with an RST whose sequence number (1090391806) is one past the sequence number of its SYN-ACK. This tells me that all MQTT CONNECT packets got lost before reaching the server, for whatever reason.

This behavior indicates a problem with the network on the server side. Perhaps a firewall or a load balancer is improperly configured.

Neither EMQX nor MQTTX can be held responsible for such behavior, since both operate at the application layer and cannot possibly break the transport layer flow as described above.
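
The retransmission pattern read off the capture by hand can also be counted mechanically. Below is a small sketch, assuming tcpdump's default text output with relative sequence numbers as shown in this thread. `countRetransmissions` is a hypothetical helper name, and it only looks at data segments flagged `[P.]`, so it is a rough aid for captures like this one rather than a general TCP analyzer.

```typescript
// Count, per source port, how many times a data segment with an already seen
// sequence range shows up again in tcpdump text output (a retransmission).
function countRetransmissions(dump: string): Map<string, number> {
  // Matches e.g. "IP 172.0.0.1.51940 > host.1883: Flags [P.], seq 1:47"
  const pattern = /IP \S+\.(\d+) > \S+: Flags \[P\.\], seq (\d+:\d+)/;
  const copies = new Map<string, number>(); // "port|seq range" -> times seen

  for (const line of dump.split('\n')) {
    const match = pattern.exec(line);
    if (match === null) continue;
    const key = `${match[1]}|${match[2]}`;
    copies.set(key, (copies.get(key) ?? 0) + 1);
  }

  // Every copy after the first one is a retransmission.
  const perPort = new Map<string, number>();
  copies.forEach((count, key) => {
    const port = key.split('|')[0];
    perPort.set(port, (perPort.get(port) ?? 0) + count - 1);
  });
  return perPort;
}
```

Fed the capture above, this should report six retransmissions for source port 51940 and none for ports 56530 and 56536, which is the asymmetry the analysis relies on.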

ieQu1 commented 1 month ago

P.S. Your TypeScript example uses MQTT over WebSocket (ws://mqtt.iot-test.xyz.org.cloud/mqtt), while your MQTTX command uses MQTT over plain TCP (mqtt.iot-test.xyz.org.cloud, port 1883). Different protocol, different TCP port.
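
To make the difference concrete, here is a small sketch that reports which transport and port a broker URL implies. `describeTransport` and the `Transport` type are hypothetical names for illustration. The port table uses the conventional scheme defaults (1883/8883 for MQTT over TCP/TLS, 80/443 for WebSocket), which is an assumption about the deployment: a listener can be configured on any port.

```typescript
interface Transport {
  kind: 'tcp' | 'websocket';
  tls: boolean;
  host: string;
  port: number;
}

// Conventional default ports per URL scheme (assumed, not read from any config).
const DEFAULT_PORTS: Record<string, number> = {
  'mqtt:': 1883,
  'mqtts:': 8883,
  'ws:': 80,
  'wss:': 443,
};

function describeTransport(brokerUrl: string): Transport {
  const url = new URL(brokerUrl);
  const defaultPort = DEFAULT_PORTS[url.protocol];
  if (defaultPort === undefined) {
    throw new Error(`unsupported scheme: ${url.protocol}`);
  }
  return {
    kind: url.protocol.startsWith('ws') ? 'websocket' : 'tcp',
    tls: url.protocol === 'mqtts:' || url.protocol === 'wss:',
    host: url.hostname,
    // URL.port is empty when the URL omits the port or uses the scheme default.
    port: url.port ? Number(url.port) : defaultPort,
  };
}
```

With the URLs from this thread, `ws://mqtt.iot-test.xyz.org.cloud/mqtt` resolves to WebSocket on port 80, while `mqtt://mqtt.iot-test.xyz.org.cloud:1883` resolves to plain TCP on port 1883, so the two tests exercise different paths through the load balancer.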

rakeshreddyrg09 commented 1 month ago

We have identified the issue on our side: it was the Kong configuration in EKS. The SSL certificate was causing the problem, so we removed it from the annotations.