emqx / emqx-bridge-mqtt

Bridge of MQTT (deprecated since EMQX v5)
https://www.emqx.com
Apache License 2.0

Bridge connection loops messages back forever #63

Open mspoehr opened 4 years ago

mspoehr commented 4 years ago

I am using the emqx-bridge-mqtt plugin to bridge EMQX to an AWS IoT endpoint. Occasionally (seemingly at random, on emqx start) the connection starts sending the same messages over and over via the bridge connection until the service is restarted. This appears to happen on roughly 25% of emqx startups.

I am using emqx version 4.0.5, with this plugin configured to be loaded on startup (via /var/lib/emqx/loaded_plugins) on Ubuntu Linux 18.04.

Below is an excerpt from the log when this issue occurs.

2020-04-01 13:43:22.253 [warning] <<"someclientid">>@127.0.0.1:40844 [Session] Dropped msg due to mqueue is full: Message(Id=^@^E¢>7Û^ÝôB^@^@^F#Uñ, QoS=1, Topic=aws/some/topic/structure, From=bridge, Flags=[], Headers=)
...
2020-04-01 13:43:22.253 [error] [Bridge] Can't be found from the inflight:45091

Those messages can be seen repeatedly with different identifiers and topics.
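
One way to see which topics are being repeated is to count the dropped-message warnings per topic. The sketch below assumes the log line layout shown in the excerpt above; the function name `count_dropped_topics` and the `Id=...` placeholder in the sample are illustrative and not part of emqx.

```python
import re
from collections import Counter

# Matches the "[Session] Dropped msg" warning shown in the excerpt and
# captures the topic. Assumes the topic itself contains no comma.
DROP_RE = re.compile(r"Dropped msg due to mqueue is full: Message\(.*?Topic=([^,]+),")


def count_dropped_topics(lines):
    """Count dropped-message warnings per topic in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = DROP_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines modelled on the excerpt; the binary message Id is replaced
# by a placeholder.
sample = [
    '2020-04-01 13:43:22.253 [warning] <<"someclientid">>@127.0.0.1:40844 '
    "[Session] Dropped msg due to mqueue is full: Message(Id=..., QoS=1, "
    "Topic=aws/some/topic/structure, From=bridge, Flags=[], Headers=)",
    "2020-04-01 13:43:22.253 [error] [Bridge] Can't be found from the inflight:45091",
]

counts = count_dropped_topics(sample)
print(counts.most_common(5))
```

A topic whose count keeps climbing while nothing new is being published locally would be a sign that the same message is being redelivered.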

The following is the emqx_bridge_mqtt.conf being used:

bridge.mqtt.aws.address = xxxxxxxxxxxxxx-ats.iot.us-west-2.amazonaws.com:8883
bridge.mqtt.aws.proto_ver = mqttv4
bridge.mqtt.aws.start_type = auto
bridge.mqtt.aws.bridge_mode = true
bridge.mqtt.aws.clientid = someremoteclientid
bridge.mqtt.aws.clean_start = true
bridge.mqtt.aws.forwards = cloud/#
bridge.mqtt.aws.forward_mountpoint = some/topic/structure
bridge.mqtt.aws.subscription.1.topic = some/topic/structure/cloud/#
bridge.mqtt.aws.subscription.1.qos = 1
bridge.mqtt.aws.receive_mountpoint = aws/
bridge.mqtt.aws.ssl = on
bridge.mqtt.aws.cacertfile = /path/to/AmazonRootCA1.pem
bridge.mqtt.aws.certfile = /path/to/id_rsa.crt
bridge.mqtt.aws.keyfile = /path/to/id_rsa.key
bridge.mqtt.aws.ciphers = ECDHE-ECDSA-AES256-GCM-SHA384,ECDHE-RSA-AES256-GCM-SHA384
bridge.mqtt.aws.keepalive = 60s
bridge.mqtt.aws.tls_versions = tlsv1.2

You may notice I am bridging both to and from cloud/# on the bridge connection. I would expect a single loopback of all bridged messages if any clients subscribe locally, and this is what happens the 75% of the time that emqx is not spamming messages. Could this be causing the issue the other 25% of the time? Are there any config recommendations, or is this a bug in emqx?
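
The expected single loopback can be checked with a small model of the topic rewriting. This is only a sketch under stated assumptions: mountpoints are treated as plain string prefixes, the forward mountpoint is assumed to end in a slash (the config above omits it), and AWS IoT is assumed to deliver the bridge's own publishes back to its subscription. `topic_matches` and `round_trip` are hypothetical helpers, not emqx functions.

```python
def topic_matches(topic_filter, topic):
    """Standard MQTT filter matching with '+' and '#' (ignores '$' topics)."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard matches everything from here on
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


def round_trip(local_topic, cfg):
    """Return (remote_topic, returned_topic) for one locally published topic."""
    if not topic_matches(cfg["forwards"], local_topic):
        return None, None  # never forwarded over the bridge
    remote_topic = cfg["forward_mountpoint"] + local_topic
    if not topic_matches(cfg["subscription"], remote_topic):
        return remote_topic, None  # forwarded, but not received back
    return remote_topic, cfg["receive_mountpoint"] + remote_topic


first_config = {
    "forwards": "cloud/#",
    "forward_mountpoint": "some/topic/structure/",  # trailing slash is an assumption
    "subscription": "some/topic/structure/cloud/#",
    "receive_mountpoint": "aws/",
}

remote, returned = round_trip("cloud/status", first_config)
print(remote)    # some/topic/structure/cloud/status
print(returned)  # aws/some/topic/structure/cloud/status
# Would the returned message be forwarded a second time?
print(topic_matches(first_config["forwards"], returned))  # False
```

Under these assumptions the message comes back once under aws/ and no longer matches cloud/#, so the model predicts one loopback rather than an endless one.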

mspoehr commented 4 years ago

We've changed our IoT rules to also accept messages on a topic structure separate from the one being subscribed to. This is the resulting config:

bridge.mqtt.aws.address = xxxxxxxxxxxxxx-ats.iot.us-west-2.amazonaws.com:8883
bridge.mqtt.aws.proto_ver = mqttv4
bridge.mqtt.aws.start_type = auto
bridge.mqtt.aws.bridge_mode = true
bridge.mqtt.aws.clientid = someremoteclientid
bridge.mqtt.aws.clean_start = true
bridge.mqtt.aws.forwards = to-aws/#
bridge.mqtt.aws.forward_mountpoint = some/topic/structure
bridge.mqtt.aws.subscription.1.topic = some/topic/structure/cloud/#
bridge.mqtt.aws.subscription.1.qos = 1
bridge.mqtt.aws.receive_mountpoint = aws/
bridge.mqtt.aws.ssl = on
bridge.mqtt.aws.cacertfile = /path/to/AmazonRootCA1.pem
bridge.mqtt.aws.certfile = /path/to/id_rsa.crt
bridge.mqtt.aws.keyfile = /path/to/id_rsa.key
bridge.mqtt.aws.ciphers = ECDHE-ECDSA-AES256-GCM-SHA384,ECDHE-RSA-AES256-GCM-SHA384
bridge.mqtt.aws.keepalive = 60s
bridge.mqtt.aws.tls_versions = tlsv1.2

With the config in the previous comment I expected a single loopback 100% of the time, but instead got infinite loopback some percentage of the time. With this new config, I don't expect any loopback, ever. I'm still seeing the same issue with infinite loopback. This tells me that the issue does not have anything to do with attempting to send and receive from the same topic structure, as sending to some/topic/structure/to-aws and subscribing to some/topic/structure/cloud should be completely disjoint.

I was able to restart emqx a (seemingly) random number of times to get the issue to go away.

Any thoughts on other config options that could be causing this?

turtleDeng commented 4 years ago

There is a problem with your configuration that causes messages to be sent in a loop:

bridge.mqtt.aws.forwards = cloud/#
bridge.mqtt.aws.forward_mountpoint = some/topic/structure

bridge.mqtt.aws.subscription.1.topic = some/topic/structure/cloud/#
bridge.mqtt.aws.subscription.1.qos = 1

Messages bridged by emqx are sent to AWS IoT on topics under some/topic/structure/cloud/#.

You have also configured

bridge.mqtt.aws.subscription.1.topic = some/topic/structure/cloud/#
bridge.mqtt.aws.subscription.1.qos = 1

which subscribes to some/topic/structure/cloud/# in AWS IoT, so the messages will loop.

mspoehr commented 4 years ago

Thanks for the response. Changing the configuration so that there isn't a loop, I still see this exact same issue. Ideally I'd be able to send/receive to the bridge on the same topic structure, but it isn't a deal breaker if this isn't possible.

My config now contains:

bridge.mqtt.aws.forwards = to-aws/#
bridge.mqtt.aws.forward_mountpoint = some/topic/structure
bridge.mqtt.aws.subscription.1.topic = some/topic/structure/cloud/#

Messages should be sent to AWS on some/topic/structure/to-aws, and received from the subscription some/topic/structure/cloud. With this new config, I still see the same issue.

I was able to find some more information while debugging as well:

Thus, the issue is not so much looping as that sending too many messages too quickly to AWS IoT causes some sort of bad state.

qingchuwudi commented 4 years ago

Maybe it is retransmission.

          qos1 +-------+                 qos2 +-------+                 qos3
Publisher ---> | Node1 | --Bridge Forward---> | Node2 | --Bridge Forward---> Subscriber
               +-------+                      +-------+

mspoehr commented 4 years ago

I had initially thought the same. I'm not sure we know definitively that qos2 is '1'. Since my latest config has the publish/subscribe topics completely disjoint, the QoS of '1' on the subscribed topics should not affect the QoS at which published messages are sent out.

In my bash example above, mosquitto_pub defaults to sending messages with QoS 0. Therefore, I would expect both qos1 and qos2 to be '0'.

I'm not sure what qos3 was during my testing. I would like to say that I tested with both 0 and 1, but I'm not 100% sure about that.

turtleDeng commented 4 years ago

You can refer to https://docs.emqx.io/broker/latest/en/configuration/configuration.html#zoneexternalupgradeqos
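
As a rough illustration of why that option matters for the diagram above: in MQTT a message is normally delivered at the lower of its publish QoS and the subscription QoS. The sketch below assumes that enabling upgrade_qos instead delivers at the higher of the two; that reading of the option, and the function name `delivered_qos`, are assumptions made for illustration.

```python
def delivered_qos(publish_qos, subscription_qos, upgrade_qos=False):
    """QoS at which a broker delivers a message to one subscriber.

    upgrade_qos=False: standard MQTT behaviour, the lower of the two values.
    upgrade_qos=True:  assumed behaviour of the upgrade option, the higher one.
    """
    if upgrade_qos:
        return max(publish_qos, subscription_qos)
    return min(publish_qos, subscription_qos)


# A QoS 0 publish arriving on a QoS 1 subscription, as in the configs above.
print(delivered_qos(0, 1))                    # 0
print(delivered_qos(0, 1, upgrade_qos=True))  # 1
```

Only QoS 1 and 2 deliveries are retransmitted, so with the option off a QoS 0 publish would not be retried at any hop under this model.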

saumilsdk commented 4 years ago

I am having the same problem. I have AWS IoT as the broker, and the emqx bridge is there so that devices using the MQTT-SN protocol can send their data through it. The same data comes back on each publish.

I also need MQTT-based devices that send data directly to AWS IoT to reach the MQTT-SN-based devices running behind the emqx bridge.

saumilsdk commented 4 years ago

@mspoehr or @turtleDeng can you please help in resolving the looping when the bridge subscribes to the same topics it publishes to? I am connecting the bridge to an AWS IoT endpoint.

mspoehr commented 4 years ago

You can refer to https://docs.emqx.io/broker/latest/en/configuration/configuration.html#zoneexternalupgradeqos

I really don't think this is a QoS issue. This issue occurs when using any combination of QoS values, even with all 0's, which should never cause this.

Can you please help in resolving looping in case bridge is subscribing same topics as publishing?

@saumilsdk I am not sure that this is possible with emqx in its current state. This issue seems like a bug in emqx to me. In my case, I was able to configure my publishing and bridge subscriptions to be completely disjoint, and I still received the same messages looped back forever.

If you're not experiencing the messages being looped back forever, but instead just receiving the same message you publish one time, I would actually expect this behavior.

saumilsdk commented 4 years ago

@mspoehr Hi, I would agree with you if I got the same message twice, but here I am stuck with messages looping forever and end up restarting the server every time. I can find no way out of this issue. Any help will be appreciated. Here is my bridge config. I am using the EMQX-SN plugin to act as a gateway and EMQX-BRIDGE to bridge the gateway to the AWS IoT broker.

@qingchuwudi and @turtleDeng, could you also look into this?

bridge.mqtt.emqx2.start_type = auto
bridge.mqtt.emqx2.address = a3itfXXXX.iot.us-east-1.amazonaws.com:8883
bridge.mqtt.emqx2.proto_ver = mqttv4
bridge.mqtt.emqx2.clientid = bridge_emqx2
bridge.mqtt.emqx2.clean_start = true
bridge.mqtt.emqx2.ssl = on
bridge.mqtt.emqx2.cacertfile = /etc/mqtt/certs/rootCA.pem
bridge.mqtt.emqx2.certfile = /etc/mqtt/certs/cert.crt
bridge.mqtt.emqx2.keyfile = /etc/mqtt/certs/private.key
bridge.mqtt.emqx2.ciphers = ECDHE-ECDSA-AES256-GCM-SHA384,ECDHE-RSA-AES256-GCM-SHA384
PSK-AES128-CBC-SHA,PSK-AES256-CBC-SHA,PSK-3DES-EDE-CBC-SHA,PSK-RC4-SHA
bridge.mqtt.emqx2.keepalive = 60s
bridge.mqtt.emqx2.tls_versions = tlsv1.2,tlsv1.1,tlsv1
bridge.mqtt.emqx2.forwards = #
bridge.mqtt.emqx2.subscription.1.topic = #
bridge.mqtt.emqx2.subscription.1.qos = 1
bridge.mqtt.emqx2.reconnect_interval = 30s
bridge.mqtt.emqx2.retry_interval = 20s
bridge.mqtt.emqx2.max_inflight_size = 32

mspoehr commented 4 years ago

@saumilsdk I'm not sure if your use case will work with this, but you could try adding a receive_mountpoint just to see if it helps. In my case, I had:

bridge.mqtt.aws.receive_mountpoint = aws/

^ but this still didn't fix the issue for me. I could see in your case where emqx could loop back infinitely if you are bridging # in both directions with no prefixes on either side. Still, for a "bridge" plugin, it seems like this should be a supported use case. But it seems that it is not.

saumilsdk commented 4 years ago

@mspoehr I tried adding both mountpoints, but the looping still happens, and the topic prefix keeps getting added to the looped messages. As you know, I am not running the emqx broker, only emqx-sn and emqx-bridge. What options do we have to disable looping?

bridge.mqtt.emqx2.forward_mountpoint = tmp/forward/aws/
bridge.mqtt.emqx2.receive_mountpoint = tmp/receive/aws/
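
The growing prefix can be reproduced with a small model of this configuration. It is a sketch under assumptions: mountpoints are treated as plain string prefixes, and the remote broker is assumed to deliver the bridge's own publishes back to its `#` subscription. The topic `sensors/temp` and the helper `one_round_trip` are made up for illustration.

```python
FORWARD_MOUNTPOINT = "tmp/forward/aws/"
RECEIVE_MOUNTPOINT = "tmp/receive/aws/"


def one_round_trip(local_topic):
    """Model one trip out over the bridge and back in again."""
    # forwards = # matches every local topic, so the message always goes out.
    remote_topic = FORWARD_MOUNTPOINT + local_topic
    # subscription.1.topic = # matches every remote topic, so it always returns.
    return RECEIVE_MOUNTPOINT + remote_topic


topic = "sensors/temp"  # hypothetical locally published topic
history = []
for _ in range(3):
    topic = one_round_trip(topic)
    history.append(topic)

for seen in history:
    print(seen)
```

Because `#` matches the prefixed topic as well, each returned message is forwarded again in this model, so the mountpoints lengthen the topic on every pass instead of breaking the cycle.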

gbunel29 commented 4 years ago

Did you find a solution for this issue? I'm facing the same issue with the bridge.

saumilsdk commented 4 years ago

@gbunel29 I have moved from emqx to the Paho MQTT-SN gateway, which doesn't have the loopback issue. @mspoehr FYI

wwhai commented 4 years ago

@mspoehr i had tried adding both the mount points but seems looping still happens and topic prefix also keeps getting added on the messages looped. As you know i am not running emqx broker and only emqx-sn and emqx-bridge i am running, what options do we have for these to disable looping?

bridge.mqtt.emqx2.forward_mountpoint = tmp/forward/aws/
bridge.mqtt.emqx2.receive_mountpoint = tmp/receive/aws/

Messages will loop when the publish topic is the same as the subscribe topic. I suggest changing your topics, for example:

Adding a prefix or suffix may avoid this problem.

Trance-Paradox commented 2 years ago

This looping error is occurring again. Messages are looping forever when published on the same topic.