Investigate MQTT clustering

drasko commented 6 years ago

Investigate current Mainflux MQTT clustering, especially for QoS 1 and 2.

Example scenario:

Client 1 connects to MQTT broker 1 with QoS 1
Client has intermittent disconnection and tries to connect again
LB redirects it to MQTT broker 2
MQTT broker 2 should deliver retained messages to Client 1

Check whether MQTT broker 2 knows that Client 1 was connected with QoS 1 and that retained messages are delivered correctly - i.e. check whether the shared Redis instance holds all necessary data (session, client ID, ...) and whether our way of clustering via NATS can work.
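The scenario above can be sketched as follows. This is only an illustration under stated assumptions: `SharedStore` is an in-memory stand-in for the shared Redis instance, and the `Session`, `Broker` and topic names are hypothetical, not Mainflux or Aedes code. It shows what "broker 2 knows about Client 1" would mean if session state lived entirely in the shared store.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Per-client state a broker needs in order to resume delivery."""
    client_id: str
    subscriptions: dict = field(default_factory=dict)  # topic -> requested QoS
    pending: list = field(default_factory=list)        # undelivered QoS 1/2 payloads


class SharedStore:
    """In-memory stand-in for the shared Redis instance (assumption)."""

    def __init__(self):
        self._sessions = {}

    def load(self, client_id):
        return self._sessions.get(client_id)

    def save(self, session):
        self._sessions[session.client_id] = session


class Broker:
    """Hypothetical broker instance that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def connect(self, client_id):
        # Resume the stored session if one exists, otherwise start a new one.
        session = self.store.load(client_id) or Session(client_id)
        self.store.save(session)
        return session

    def subscribe(self, client_id, topic, qos):
        session = self.store.load(client_id)
        session.subscriptions[topic] = qos
        self.store.save(session)


store = SharedStore()
broker1 = Broker("broker-1", store)
broker2 = Broker("broker-2", store)

broker1.connect("client-1")
broker1.subscribe("client-1", "channels/1/messages", qos=1)

# After a reconnect the load balancer picks broker 2, which sees the same session.
resumed = broker2.connect("client-1")
print(resumed.subscriptions)
```

If any part of the session lived only in broker 1's memory, the last step would return an empty session, which is the failure this issue asks to rule out.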

Also, fix the PUBACK behavior toward the MQTT client - PUBACK should be sent not when the MQTT broker publishes on an empty topic, but when the message that comes back from NATS is published.
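One way to picture the requested PUBACK timing is the sketch below. It is an assumption-laden illustration, not the adapter's code: the correlation id used to recognize the adapter's own message coming back from NATS is hypothetical, as are the class and callback names.

```python
import itertools


class AckAfterRoundTrip:
    """Delays PUBACK until the adapter sees its own message come back from NATS."""

    def __init__(self, send_to_nats, send_puback):
        self._send_to_nats = send_to_nats
        self._send_puback = send_puback
        self._ids = itertools.count(1)
        self._pending = {}  # correlation id -> MQTT packet id awaiting PUBACK

    def on_client_publish(self, packet_id, payload):
        # Remember the packet id and forward the message; do not acknowledge yet.
        corr_id = next(self._ids)
        self._pending[corr_id] = packet_id
        self._send_to_nats(corr_id, payload)

    def on_nats_message(self, corr_id, payload):
        # Acknowledge only messages this instance is still waiting on.
        packet_id = self._pending.pop(corr_id, None)
        if packet_id is not None:
            self._send_puback(packet_id)


outbox, acks = [], []
adapter = AckAfterRoundTrip(lambda c, p: outbox.append((c, p)), acks.append)

adapter.on_client_publish(packet_id=7, payload=b"21.5")
acks_before = list(acks)             # nothing acknowledged yet
adapter.on_nats_message(*outbox[0])  # message returns from NATS
print(acks_before, acks)
```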

sprijk commented 6 years ago

Why not use https://github.com/erlio/vernemq? Here's the docker image: https://github.com/erlio/docker-vernemq

drasko commented 6 years ago

@sprijk Mainflux MQTT brokers operate like stateless microservices, and their clustering is obtained via the more generic NATS broker. This brings a few benefits - first, scaling is easy to obtain via simple load balancing, but more importantly we can broker between different protocols. This is what makes Mainflux unique - you can send a message via HTTP REST and receive it via MQTT SUB, for example. All 4 protocols are interchangeable.

It is probable that the same could be obtained via an MQTT cluster connected to NATS via some LB, and in that case VerneMQ (or Aedes) would be a very good candidate. Note that VerneMQ is still on our radar, even unclustered, as with an authorization plugin it can be connected to NATS and treated the same way as we treat Aedes today.

The goal of this issue (research) is to establish whether our current way of MQTT clustering adheres to the MQTT standard, and whether the classical way of MQTT clustering (without NATS, but with a VerneMQ cluster in front of NATS) would be more adequate.

windbender commented 6 years ago

MQTT brokers are definitely NOT stateless. A few things I can think of (there may be others, and I may not have the exact details quite right):

1) last will. Each client has a "last will" which is a message which will get published on a topic if the client is disconnected as opposed to gracefully disconnecting. This is per client state.

2) retain=true. This is an attribute associated with a message which causes this message to be "retained" as the last message on a topic, and published to any NEW subscribers to that topic. This is per topic state.

3) QOS=1,2 (at least once, exactly once). This is either per-publisher, per-message state or per-subscriber, per-message state, depending on where in the chain you are looking. Either way, QOS 1 and 2 cannot be implemented in a purely state-free fashion, since both imply message storage for clients that are subscribed but not connected.

So I would have to say that, unless there is more than meets the eye as to how NATS is working, you cannot meet the MQTT specification with this architecture. What you have may still be quite effective in many use cases, but I would highly suggest you call out which aspects of the MQTT specification are unsupported.
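The per-topic state from point 2 can be made concrete with a small sketch. It is a toy model, not any real broker's implementation: the `RetainingBroker` class and topic names are hypothetical, topics are matched exactly (no wildcards), and clearing a retained message is not modelled.

```python
from collections import defaultdict


class RetainingBroker:
    """Toy broker that remembers the last retained message per topic."""

    def __init__(self):
        self.retained = {}                    # topic -> last retained payload
        self.subscribers = defaultdict(list)  # topic -> list of subscriber inboxes

    def publish(self, topic, payload, retain=False):
        if retain:
            self.retained[topic] = payload    # this is the per-topic state
        for inbox in self.subscribers[topic]:
            inbox.append(payload)

    def subscribe(self, topic):
        inbox = []
        # A NEW subscriber immediately receives the retained message, if any.
        if topic in self.retained:
            inbox.append(self.retained[topic])
        self.subscribers[topic].append(inbox)
        return inbox


broker = RetainingBroker()
broker.publish("config/interval", "30s", retain=True)
broker.publish("status", "hello")  # not retained

late_config = broker.subscribe("config/interval")
late_status = broker.subscribe("status")
print(late_config, late_status)
```

The `retained` dictionary is exactly the piece that has to live somewhere shared if several broker instances are to behave like one.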

anovakovic01 commented 6 years ago

@windbender He is talking about Mainflux MQTT brokers, as he already wrote. The Mainflux MQTT broker is actually the Mainflux MQTT adapter. But yes, MQTT brokers are not stateless. We don't have support for retained messages. When a message is published, the Mainflux MQTT broker publishes it to NATS. Every other adapter, including the MQTT adapter, will receive this message through NATS. When the MQTT adapter receives a message from NATS, it publishes it under QoS 2. We use Redis to support QoS 1 and 2.

drasko commented 6 years ago

@windbender MQTT brokers are not stateless, but Mainflux uses different tricks to avoid firehose scaling problems (described here, here or here). Also, instances are not clustered (interconnected) among themselves; instead they forward messages to NATS, which forwards them to other broker instances and also to other protocol servers (for example the WS server).

Regarding MQTT clients - they can connect, subscribe and publish via QoS 0, 1 or 2 - this is all well supported by Aedes broker (which Mainflux uses as an MQTT adapter library). All QoS levels are already functional in Mainflux because:

Publisher will receive PUBACK and other handshake messages from broker
Although messages from NATS are internally always published with QoS 2, the overall QoS is always equal to the lower of the publish and subscribe QoS - so the Subscriber will get the message at the QoS it demanded (http://www.steves-internet-guide.com/understanding-mqtt-qos-2/).
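The QoS rule in the second point is small enough to write down directly. The function name `effective_qos` is made up for this illustration; the rule itself (delivered QoS is the lower of publish and subscription QoS) is the one stated above.

```python
def effective_qos(publish_qos: int, subscribe_qos: int) -> int:
    """Return the QoS at which a subscriber actually receives a message."""
    for qos in (publish_qos, subscribe_qos):
        if qos not in (0, 1, 2):
            raise ValueError(f"invalid QoS level: {qos}")
    return min(publish_qos, subscribe_qos)


# The adapter republishes everything coming from NATS at QoS 2,
# so the subscription QoS alone decides the delivered level.
for requested in (0, 1, 2):
    print(requested, effective_qos(2, requested))
```

Note that this only covers the hop from the adapter to the subscriber; it says nothing about what happens to a message while the subscriber is offline.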

Last will & testament message is provided on connection and so is handled by Aedes in Redis. The only thing we must check is if this last will message is forwarded to NATS, so that it can be propagated to other MQTT broker instances (and also sent to the bridge so that WS and/or CoAP observing clients would get it, but this is less important, as this is pretty MQTT specific). @anovakovic01 please test and confirm this.

The only thing that is missing here is message retention. The retention flag can be kept in the RawMessage struct when the message is forwarded to NATS. Another solution is to fix retention to 1 by design, and tell all subscribing clients to ignore the first message on initial connection if they do not like retention.

drasko commented 6 years ago

Note that all of this can also be solved by putting an MQTT broker cluster in front of NATS. This would function in the following manner:

MQTT instances in the MQTT cluster publish to NATS prior to publishing to the internal cluster (similar to how it is now). This publish to NATS is used to forward the messages to other protocols and to the DB, not to forward messages to other MQTT brokers (this is done through the MQTT cluster).
MQTT instances subscribe to NATS with the queue option. Also, they ignore all MQTT messages coming from NATS, as this MQTT message passing has already been handled via the cluster. They only re-publish NATS messages coming from other protocols, and they can do this with QoS 2 (subscribers will determine the overall QoS).
The retention flag for messages coming from an MQTT client will be seamlessly handled via the cluster, but the retention flag for messages coming from other protocols should be passed as a parameter (part of the RawMessage struct, as I explained).

This way NATS no longer plays any role in MQTT clustering (as it does now), but only in protocol bridging and DB storage.
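The filtering rule described above can be sketched like this. The thread names a RawMessage struct but not its fields, so the `channel`, `protocol`, `payload` and `retain` fields below are assumptions, as are the `handle_from_nats` function and the `mqtt_publish` callback.

```python
from dataclasses import dataclass


@dataclass
class RawMessage:
    """Hypothetical shape of a message travelling over NATS (fields assumed)."""
    channel: str
    protocol: str   # protocol of the original publisher, e.g. "mqtt" or "http"
    payload: bytes
    retain: bool = False


def handle_from_nats(msg, mqtt_publish):
    """Re-publish a NATS message into the MQTT cluster unless it came from MQTT."""
    if msg.protocol == "mqtt":
        # Already distributed by the MQTT cluster itself, so skip it.
        return False
    # Always use QoS 2; each subscription's QoS determines the delivered level.
    mqtt_publish(topic=msg.channel, payload=msg.payload, qos=2, retain=msg.retain)
    return True


delivered = []


def mqtt_publish(topic, payload, qos, retain):
    delivered.append((topic, payload, qos, retain))


handled_http = handle_from_nats(RawMessage("channels/1", "http", b"on", retain=True), mqtt_publish)
handled_mqtt = handle_from_nats(RawMessage("channels/1", "mqtt", b"off"), mqtt_publish)
print(handled_http, handled_mqtt, delivered)
```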

windbender commented 6 years ago

I think you have some interesting solutions to these challenges. I would highly encourage you to keep a careful list of where your solutions do not fully meet the MQTT specification. MQTT is a powerful protocol, and many of the features which go beyond a simple PUB/SUB system are there because IOT systems often require these features to operate efficiently.

In particular:

Retain=true is critical to broadcast messages (one publisher, multiple subscribers) to devices which are not connected currently and which have never been connected. Without this, the app level needs to maintain a per client state of who has received messages, and publish them individually once a client has connected. In the case of a single topic with multiple subscribers, this forms a weird situation in which you have to rebroadcast the last message to currently connected clients just to get that message to a newly connected client. Non device specific state on an IOT system is often broadcast like this.

Lastwill very much helps determine which devices are currently connected (by sending messages when they disconnect). Again, without this feature, system designers end up implementing their own PING/PINGACK-type protocol to determine when a client has dropped off (on top of the one in MQTT, which itself is on top of the one in TCP).
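The last-will behavior can be illustrated with a toy model. The `WillBroker` class and topic names are hypothetical, and delivery is reduced to appending to a list; the point is only that the will is per-client state that is published on an ungraceful disconnect and discarded on a graceful one.

```python
class WillBroker:
    """Toy broker that stores one last-will message per connected client."""

    def __init__(self):
        self.wills = {}      # client_id -> (topic, payload)
        self.published = []  # (topic, payload) pairs, stand-in for real delivery

    def connect(self, client_id, will=None):
        if will is not None:
            self.wills[client_id] = will

    def disconnect(self, client_id, graceful):
        will = self.wills.pop(client_id, None)
        # A graceful disconnect discards the will; a dropped connection publishes it.
        if will is not None and not graceful:
            self.published.append(will)


broker = WillBroker()
broker.connect("device-a", will=("devices/a/status", "offline"))
broker.connect("device-b", will=("devices/b/status", "offline"))

broker.disconnect("device-a", graceful=True)   # clean shutdown, no will sent
broker.disconnect("device-b", graceful=False)  # connection lost, will is sent
print(broker.published)
```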

I forgot another additional state. There is per client state kept in an MQTT broker for ALL subscriptions which are QOS 1 or 2. This state can be erased if the client connects with "Clean Session" flag set, but otherwise indicates that a client which reconnects should receive all those messages.
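A minimal sketch of that per-client subscription state follows. It is a simplification under stated assumptions: the `SessionBroker` class is hypothetical, live delivery to connected clients is not modelled, and the publish QoS is ignored so that only the subscription QoS decides whether a message is queued.

```python
class SessionBroker:
    """Toy broker that queues messages for offline clients with QoS 1/2 subscriptions."""

    def __init__(self):
        self.sessions = {}  # client_id -> {"subs": {topic: qos}, "queue": [payloads]}
        self.online = set()

    def connect(self, client_id, clean_session):
        # Clean Session erases any stored state; otherwise the session is resumed.
        if clean_session or client_id not in self.sessions:
            self.sessions[client_id] = {"subs": {}, "queue": []}
        self.online.add(client_id)
        session = self.sessions[client_id]
        backlog, session["queue"] = session["queue"], []
        return backlog  # messages missed while the client was offline

    def disconnect(self, client_id):
        self.online.discard(client_id)

    def subscribe(self, client_id, topic, qos):
        self.sessions[client_id]["subs"][topic] = qos

    def publish(self, topic, payload):
        for client_id, session in self.sessions.items():
            qos = session["subs"].get(topic)
            if qos is None or client_id in self.online:
                continue  # not subscribed, or online (live delivery not modelled)
            if qos > 0:
                session["queue"].append(payload)


broker = SessionBroker()
broker.connect("c1", clean_session=False)
broker.subscribe("c1", "alerts", qos=1)
broker.disconnect("c1")

broker.publish("alerts", "m1")
resumed_backlog = broker.connect("c1", clean_session=False)

broker.disconnect("c1")
broker.publish("alerts", "m2")
clean_backlog = broker.connect("c1", clean_session=True)
print(resumed_backlog, clean_backlog)
```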

drasko commented 6 years ago

I am currently finishing VerneMQ Mainflux auth plugin. We'll give it a shot, and try VerneMQ native clustering.

Alternatives are to examine Aedes clustering and also to work on mqtt2mqtt proxy that would do the authentication (instead of VerneMQ plugin).

drasko commented 5 years ago

First implementation of VerneMQ available here: https://github.com/drasko/mqtt-erl

absmach / magistrala

Investigate MQTT clustering #314