eclipse / paho.mqtt-sn.embedded-c

Paho C MQTT-SN gateway and libraries for embedded systems. Paho is an Eclipse IoT project.
https://eclipse.org/paho

Message loss for subscribed topics after gateway restart #134

Closed simon-ebner closed 4 years ago

simon-ebner commented 6 years ago

How the MQTT-SN gateway is currently implemented:

Screenshot

Screenshot of the gateway log, which cancels inflight messages for a topic the client is subscribed to

What has to be adapted?

In my view, this behaviour is critical in environments where messages must not get lost. The gateway has to forward the MQTT-SN PUBLISH messages that were held back for the client while it was absent. When no topic is found for such a PUBLISH message, the gateway could simply add it to the client's list of topics. If the client connects with a clean session, the broker would not send these messages anyway, since the previous session gets wiped on the broker side.

Before the gateway can forward the inflight PUBLISH message after mapping its long topic name to a short topic id, it has to send a REGISTER message to the client, so that the client knows the mapping from the long topic name to the topic id, which may be newly assigned on the restarted gateway. The gateway then has to wait for the client's REGACK response to that REGISTER message before it is allowed to publish the message.

Possible bug fix

Adapt "MQTTGWPublishHandler.cpp" and remove the following lines:

    if (topic == nullptr)
    {
        WRITELOG(" Invalid Topic. PUBLISH message is canceled.\n");
        if (pub.header.bits.qos == 1)
        {
            replyACK(client, &pub, PUBACK);
        }
        else if (pub.header.bits.qos == 2)
        {
            replyACK(client, &pub, PUBREC);
        }

        delete snPacket;
        return;
    }

If the topic is not found, it is then added to the client's topic list, and the logic continues as if the client had registered the topic before:

    topic = client->getTopics()->add(&topicId);
    id = topic->getTopicId();
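To make the "look up, otherwise add" idea concrete, here is a minimal sketch. It does not use the gateway's real `Topics` class: `TopicTable` and `getOrAdd` are hypothetical names invented for illustration, and the sketch keys the table by topic name only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a client's topic list; NOT the gateway's Topics class.
class TopicTable {
public:
    // Returns the id already mapped to `name`, or assigns the next free id.
    // `added` reports whether a new mapping was created, i.e. whether the
    // client has not been told about this id yet.
    std::uint16_t getOrAdd(const std::string& name, bool& added) {
        auto it = ids_.find(name);
        if (it != ids_.end()) {
            added = false;
            return it->second;
        }
        // 0x0000 and 0xFFFF are reserved topic ids in MQTT-SN 1.2;
        // this sketch simply stops when the id space is exhausted.
        assert(nextId_ != 0xFFFF && "topic id space exhausted");
        added = true;
        std::uint16_t id = nextId_++;
        ids_.emplace(name, id);
        return id;
    }

    std::size_t size() const { return ids_.size(); }

private:
    std::map<std::string, std::uint16_t> ids_;
    std::uint16_t nextId_ = 1;  // start above the reserved value 0x0000
};
```

A caller would check `added` to decide whether the client still needs to learn the mapping before the message is forwarded.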

The protocol specification describes sending a REGISTER message in this situation (http://mqtt.org/new/wp-content/uploads/2009/06/MQTT-SN_spec_v1.2.pdf):

i) 6.9 Client’s Topic Subscribe/Un-subscribe Procedure "[...] If the client subscribes to a topic name which contains a wildcard character, the returning SUBACK message will contain the topic id value 0x0000. The GW will the use the registration procedure to inform the client about the to-be-used topic id value when it has the first PUBLISH message with a matching topic name to be sent to the client, see also Section 6.10. [...]"

ii) 6.10 Gateway’s Publish Procedure "[...] Preceding the PUBLISH message the GW may send a REGISTER message to inform the client about the topic name and its assigned topic id value. This will happen for example when the client re-connects without clean session or has subscribed to topic names with wildcard characters. Upon receiving a REGISTER message the client replies with a REGACK message. The GW will wait for the REGACK message before it sends the PUBLISH message to the client. [...]"

I would like to point out the following sentence from paragraph 6.10 of the MQTT-SN 1.2 protocol specification: "This will happen for example when the client re-connects without clean session or has subscribed to topic names with wildcard characters."
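The "wait for REGACK before PUBLISH" rule from section 6.10 could be sketched roughly as follows. This is only an illustration under assumptions: `HeldPublish` and `RegisterGate` are hypothetical names, no such class exists in the gateway, and real code would also need timeouts and retransmission of the REGISTER message.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message record; the field names are illustrative only.
struct HeldPublish {
    std::uint16_t topicId;
    std::string payload;
};

// Holds PUBLISH messages until the client acknowledges the REGISTER
// message that announced their topic id.
class RegisterGate {
public:
    // Call after a REGISTER with message id `regMsgId` was sent to the client.
    void hold(std::uint16_t regMsgId, HeldPublish pub) {
        waiting_[regMsgId].push_back(std::move(pub));
    }

    // Call when the REGACK for `regMsgId` arrives. Returns the messages that
    // may now be published, in the order they were held. An unknown id
    // yields an empty result.
    std::vector<HeldPublish> onRegack(std::uint16_t regMsgId) {
        std::vector<HeldPublish> released;
        auto it = waiting_.find(regMsgId);
        if (it != waiting_.end()) {
            released = std::move(it->second);
            waiting_.erase(it);
        }
        return released;
    }

    std::size_t pendingRegistrations() const { return waiting_.size(); }

private:
    std::map<std::uint16_t, std::vector<HeldPublish>> waiting_;
};
```

Keying the held messages by the REGISTER message id keeps several outstanding registrations independent of each other.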

This improvement would be great.

Simon

ty4tw commented 6 years ago

Hi Simon,

Thank you for your information. 6.10 is implemented.

I’m waiting for your pull request. Thanks for your contribution.

ty4tw commented 6 years ago

#135

Thanks for your valuable discussion & contribution as well, Tomoaki!

Assume that you have an infrastructure with the following topology:

Client <- MQTT-SN -> MQTT-SN Gateway <- TCP -> MQTT-CLUSTER <- MQTT -> Backend Service

When we communicate with QoS 2 from the device to the service and vice versa, we don't want to lose information, and data must be delivered exactly once, no matter whether you're using regular topic registration or pre-defined topics. To also ensure in-order transmission, one has to send messages with a max inflight size of 1. Where ordered topics are available, we don't want to break that feature either.
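As a rough illustration of what a max inflight size of 1 means for ordering, the sketch below only hands out the next message once the previous one has completed its handshake. `SingleInflightSender` is a hypothetical helper written for this comment, not part of the Paho code.

```cpp
#include <deque>
#include <string>
#include <utility>

// Hypothetical sender that allows at most one unacknowledged message.
class SingleInflightSender {
public:
    // Queues `payload`. Returns true and fills `out` if it can be sent now.
    bool submit(std::string payload, std::string& out) {
        queue_.push_back(std::move(payload));
        return trySend(out);
    }

    // Call when the in-flight message is fully acknowledged
    // (for QoS 2 that would be after PUBCOMP).
    // Returns true and fills `out` if another message is ready to send.
    bool onComplete(std::string& out) {
        inflight_ = false;
        return trySend(out);
    }

private:
    bool trySend(std::string& out) {
        if (inflight_ || queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        inflight_ = true;
        return true;
    }

    std::deque<std::string> queue_;
    bool inflight_ = false;
};
```

Because later messages stay queued until the earlier one completes, a retransmission can never overtake a newer message.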

When there's an MQTT cluster that supports enterprise features like fail-over and message persistence, we also don't want to soften this requirement, because we lose data in the scenario where we simply restart the MQTT-SN gateway in front of the broker.

I think it's worth investigating such quality goals and making appropriate contributions.

Could you please briefly tell me what other logic may be missing on the client side?

For all my tests I directly used the samples from the UDP directory. I also tried to communicate directly via pre-defined topics in a test scenario, which didn't work as expected. There's no bare subscriber sample that directly subscribes to a pre-defined topic. There's just a sample that first registers a long topic name, gets a short topic id, publishes, and then receives that message via subscription.

I would appreciate it if one of us could make a proper contribution.

Please keep me updated.

Simon

It’s simple. Launch more than one GW. Clients CONNECT with clean session, then SUBSCRIBE to topics and wait for PUBLISH messages. The client sends PINGREQ periodically. If no PINGRESP is returned, the GW is dead, and the client sends SEARCHGW. After the client receives GWINFO, it sends CONNECT with the clean session flag and SUBSCRIBEs again. That’s all.
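The failover procedure described above can be sketched as a small client-side state machine. This is a minimal sketch, not code from the Paho client: all type and function names are hypothetical, and the CONNACK and SUBACK steps are filled in from the protocol rather than from the description.

```cpp
// Hypothetical client-side states, events and actions.
enum class ClientState { Connected, Searching, Connecting, Subscribing };
enum class Event { PingRespTimeout, GwInfo, ConnAck, SubAck };
enum class Action { None, SendSearchGw, SendConnectClean, SendSubscribe };

struct Step {
    ClientState next;
    Action action;
};

// Returns the next state and the message the client should send.
// Events that do not apply in the current state are ignored.
Step onEvent(ClientState s, Event e) {
    switch (s) {
    case ClientState::Connected:
        // No PINGRESP: the gateway is considered dead, look for another one.
        if (e == Event::PingRespTimeout) return {ClientState::Searching, Action::SendSearchGw};
        break;
    case ClientState::Searching:
        if (e == Event::GwInfo) return {ClientState::Connecting, Action::SendConnectClean};
        break;
    case ClientState::Connecting:
        if (e == Event::ConnAck) return {ClientState::Subscribing, Action::SendSubscribe};
        break;
    case ClientState::Subscribing:
        if (e == Event::SubAck) return {ClientState::Connected, Action::None};
        break;
    }
    return {s, Action::None};
}
```

Note that this flow always reconnects with a clean session, so anything the broker queued for the old session is not delivered after the switch.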

simon-ebner commented 6 years ago

It's okay that my pull request was closed, but I'm not satisfied with the current situation. Therefore I kindly ask you to keep this issue open.

The issue is that QoS 2 messages apparently get lost when the gateway loses its registration mapping due to a restart, or when the client moves to another gateway.

Cleaning the session is therefore not an option. It would mean that messages only get delivered while the gateway is online, and no messages would be held back. MQTT brokers behind the gateway that have features like session replication, message persistence and a working fail-over concept would then be totally useless.

How can we fix that?

ty4tw commented 6 years ago

About your PR: it didn't pass the validation check, so it will eventually be closed. I therefore copied your response to this issue and am keeping it open.

How can we fix that?

Right now I don't have any idea, but you can change your own GW's code in the same way as in your PR to adapt it to your circumstances, in which clients do not use pre-defined topic ids.

Tomoaki YAMAGUCHI
