eclipse / mosquitto

Eclipse Mosquitto - An open source MQTT broker
https://mosquitto.org

When per_listener_settings is true, queued messages may get lost between bridged brokers #1891

Closed: dk-teknologisk-mnc closed this issue 3 years ago

dk-teknologisk-mnc commented 3 years ago

Problem

When per_listener_settings is set to true on a broker, messages published to and queued on a bridged broker are not delivered to the first broker's subscribing clients. Mosquitto version is 1.6.12.

How to reproduce

  1. Start two brokers - broker A and B.
  2. A is bridged to B with cleansession false and topic hello
  3. Connect a client to broker A that subscribes to topic hello with QoS 1 and clean: false
  4. Stop broker A
  5. Publish a message with topic hello and payload world on broker B with QoS 1
  6. Start broker A again
  7. When the client that subscribed on broker A reconnects, it should get the queued message, but it doesn't
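
Using the stock command-line clients, the steps above map to something like this (a sketch, assuming the hostnames and client id from the configuration and code below; -c keeps the session across reconnects, -i sets the fixed client id):

# Step 3: subscribe on broker A with QoS 1 and a persistent session
mosquitto_sub -h broker-a.mydomain.net -t hello -q 1 -c -i test-client-subscriber

# Steps 4-5: stop broker A, then publish to broker B while A is down
mosquitto_pub -h broker-b.mydomain.net -t hello -m world -q 1

# Steps 6-7: start broker A again and re-run the mosquitto_sub command
# with the same client id - the queued "world" message should arrive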

Detailed description

This is how my MQTT clients and brokers are connected:

SUBSCRIBER -> BROKER A -> BROKER B <- PUBLISHER

The subscriber (a Node.js client) connects with clean: false to broker A, while broker A bridges to broker B with cleansession false. So if broker A is temporarily stopped, messages published to broker B in the meantime are queued on broker B.

When broker A is started again, the queued messages on Broker B should be delivered to the subscriber via broker A when the subscriber reconnects.

This seems to work only if per_listener_settings is set false on broker A. When per_listener_settings is true, the messages queued on broker B are not delivered to the subscriber.

It's also worth mentioning that if the messages are published directly to broker A, queuing works fine. This can be verified with the following procedure (see the command sketch after the list):

  1. Start broker A
  2. Connect a client to broker A that subscribes to topic hello with QoS 1 and clean: false
  3. Disconnect the client
  4. Publish a message with topic hello and payload world on broker A with QoS 1
  5. Reconnect the client - and the message arrives
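
The same check with the stock clients (again assuming the fixed client id from the subscriber code):

# Step 2: subscribe, then disconnect (Ctrl+C)
mosquitto_sub -h broker-a.mydomain.net -t hello -q 1 -c -i test-client-subscriber

# Step 4: publish while the subscriber is offline
mosquitto_pub -h broker-a.mydomain.net -t hello -m world -q 1

# Step 5: re-run the mosquitto_sub command - the queued message arrives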

So as far as I can see, there seems to be an issue when messages need to go through two brokers that are bridged.

Messages are published using mosquitto_pub with QoS 1:

mosquitto_pub -h broker-b.mydomain.net -t hello -m world -q 1 

I have stripped my configurations down to make things as simple as possible - everything except the settings listed below is default.

Broker A configuration

persistence true
per_listener_settings true

# Bridge to broker B
connection broker-a-bridge-client
address broker-b.mydomain.net
cleansession false
topic hello in 1

(I know that having per_listener_settings set to true looks stupid when there are no other listeners. I removed them to simplify the example, but the problem is the same with and without the extra listeners.)
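
(Just to illustrate what that means: a hypothetical version of the configuration with the extra listeners left in could look like the following - the port numbers here are made up, not from my real setup.)

persistence true
per_listener_settings true

listener 1883
allow_anonymous true

listener 8883
allow_anonymous true

# Bridge to broker B
connection broker-a-bridge-client
address broker-b.mydomain.net
cleansession false
topic hello in 1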

Broker B configuration

persistence true

Nodejs subscriber

const mqtt = require("mqtt");

// Persistent session: clean: false plus a fixed clientId, so broker A
// queues QoS 1 messages for this client while it is offline.
const client = mqtt.connect("mqtt://broker-a.mydomain.net", {
  clientId: "test-client-subscriber",
  clean: false
});

client.on("connect", (connack) => {
  console.log("Connected")
  if (connack.sessionPresent) {
    console.log("Session present - no need to resubscribe")
  } else {
    // First connection (no stored session): subscribe with QoS 1
    client.subscribe("hello", {qos: 1});
  }
});

client.on("message", function(topic, message, packet) {
  console.log(topic, message.toString());
});

Software versions used

Mosquitto 1.6.12
Node.js v12.19.0
mqtt@4.2.4 (npm)

The brokers are running on Windows Server 2012 R2; the Node.js client is running on Windows 10.

ralight commented 3 years ago

Thank you for the detailed explanation.

What is happening here is that if per_listener_settings true is set, a client that does not have a listener associated with it is denied access (it has no listener, so we don't know which settings to apply). When you stop the broker, it saves the listener port number for each client, so that on restart it can associate the correct listener with each client. The bug was that it only did this if the client had set a username. So in your case there is no username, the client doesn't get associated with a listener when the broker restarts, and the broker receives the message before the client has reconnected, so those messages are denied. If you set a username, you should notice that the problem disappears. If you're not using authentication, setting a username won't have any other effect in the current mosquitto versions.
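
For example, with the Node.js subscriber above, the workaround is just to add a username to the connect options (the value is arbitrary if you're not using authentication):

const client = mqtt.connect("mqtt://broker-a.mydomain.net", {
  clientId: "test-client-subscriber",
  clean: false,
  username: "any-username"  // lets the broker re-associate the session with a listener after restart
});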

I've pushed fixes for this to the 1.6.x and develop branches.

dk-teknologisk-mnc commented 3 years ago

I've been scratching my head for a week, trying to figure out whether this was a bug, a feature or a problem in my code. I posted this issue three hours ago, left work, jumped on a train and rode the bike the last 5 km home from the station. When I walked in my front door 90 minutes later, I looked at my phone and saw that you, @ralight, had already pushed a fix. Thank you so much for looking into this.

ralight commented 3 years ago

You'll notice that there is a bit of variability in response time for different issues; you got lucky :)

dk-teknologisk-mnc commented 3 years ago

@ralight One more thing... I forgot to mention it - sorry about that. I'm not sure if this is related to the same issue, but I guess it is:

If I add an ACL file to the mix, the same problem is triggered, even with per_listener_settings set to false. So with the configuration below, the queued messages are not delivered either in the scenario described in my initial post:

Broker A configuration

persistence true
per_listener_settings false
acl_file C:\Program Files\mosquitto\acl.txt

# Bridge to broker B
connection broker-a-bridge-client
address broker-b.mydomain.net
cleansession false
topic hello in 1

acl.txt

# Allow every client to read and write all topics
topic readwrite #

dr-tns commented 2 years ago

@ralight I encountered this issue as well recently, but it started when I switched to version 2.0.14. When I found this post I changed back to 1.6.15 and it works again. Is it possible the issue still exists in the 2.0.x branch?