eclipse / paho.mqtt.embedded-c

Paho MQTT C client library for embedded systems. Paho is an Eclipse IoT project (https://iot.eclipse.org/)
https://eclipse.org/paho

QoS 2 publishing always fails (split of issue #200) #201

Open pirlite2 opened 4 years ago

pirlite2 commented 4 years ago

Hi, I have been using the Paho C++ client (MQTTClient) within PlatformIO to build a NodeMCU-based system that communicates synchronously with a mosquitto broker on a Linux Mint PC.

I can publish from the PC to the NodeMCU at QoS2 no problem. However, I am unable to publish from the NodeMCU board to the PC (mosquitto_sub) at QoS2. Inspecting the mosquitto log, I can see that I am getting a PUBLISH message and the broker is responding with PUBREC. But the PUBREL and PUBCOMP parts of the QoS2 transaction are always absent. It seems like Paho is not producing the PUBREL response and so the transaction dies. Unsurprisingly, no message gets delivered! Publishing from the NodeMCU at QoS0 or QoS1 works – the messages do appear on the PC.

One thing I did notice in MQTTClient.h is that unless MQTTCLIENT_QOS2 is already defined (as 2?), it gets set to zero. In other words, QoS2 downgrades to QoS0, so it appears that the subsequent code that generates the (missing) PUBREL message is never executed?
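To make the point about the default concrete, here is a minimal sketch of the guard pattern as described above. It is a stand-in, not the real MQTTClient.h: only the macro name MQTTCLIENT_QOS2 comes from the header, and `qos2CompiledIn` is a hypothetical helper added purely for illustration.

```cpp
// Stand-in for the guard described above; this is NOT the real MQTTClient.h.
// A project wanting QoS 2 would put this before the include (or pass
// -DMQTTCLIENT_QOS2=1 to the compiler):
//   #define MQTTCLIENT_QOS2 1

#if !defined(MQTTCLIENT_QOS2)
#define MQTTCLIENT_QOS2 0  // default: QoS 2 handling is compiled out
#endif

// Hypothetical helper: reports whether the QoS 2 branch was compiled in.
constexpr bool qos2CompiledIn() {
#if MQTTCLIENT_QOS2
    return true;
#else
    return false;
#endif
}
```

With nothing defined up front, `qos2CompiledIn()` evaluates to false, which would be consistent with the PUBREL code never running.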

Does anybody have any comments or suggestions on this?

Peter

scaprile commented 4 years ago

OK, to clarify:

Since I'm not familiar with Mosquitto's inner workings, I don't know if it is not logging because it is not receiving messages or because it is not receiving proper messages. Wireshark would provide me (a low-level thinking old man) with more clues. Anyway, perhaps defining MQTT_DEBUG will do for now. Can you do that and get some trace out of your device? Otherwise you should put in some breakpoints and follow the QoS2 message through the functions to see where it gets lost or trashed.

Checking other issues, I remember #196, where I can read "it doesn't support publishing QoS2 messages"... perhaps the necessary callbacks are missing? QoS0 is send and forget, QoS1 is send and expect, but QoS2 is twice that, so someone needs to stay lingering to receive PUBREC and then send PUBREL and expect PUBCOMP.

You've already spotted MQTTCLIENT_QOS2. Unfortunately I can't help with C++, still have not incremented my C after several years of use.
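The "someone needs to stay lingering" part can be sketched as a tiny sender-side state machine. This is only an illustration of the QoS 2 handshake order, not code from the Paho library; the type and function names (`Qos2State`, `Packet`, `Step`, `onReceive`) are made up for the example, while the packet type numbers follow the MQTT 3.1.1 fixed header.

```cpp
#include <cstdint>

// Sender-side states of one QoS 2 publish (illustrative names).
enum class Qos2State { Idle, AwaitPubrec, AwaitPubcomp, Done };

// Control packet types, numbered as in the MQTT 3.1.1 fixed header.
enum class Packet : std::uint8_t { Publish = 3, Pubrec = 5, Pubrel = 6, Pubcomp = 7 };

struct Step {
    Qos2State next;   // state after handling the received packet
    bool sendPubrel;  // true when the client must now transmit PUBREL
};

// Advance the handshake when `received` arrives from the broker.
// Packets that are not expected in the current state leave it unchanged.
constexpr Step onReceive(Qos2State s, Packet received) {
    return (s == Qos2State::AwaitPubrec && received == Packet::Pubrec)
               ? Step{Qos2State::AwaitPubcomp, true}
           : (s == Qos2State::AwaitPubcomp && received == Packet::Pubcomp)
               ? Step{Qos2State::Done, false}
               : Step{s, false};
}
```

After sending PUBLISH the client would sit in `AwaitPubrec`. If nothing ever calls the equivalent of `onReceive` for the incoming PUBREC, no PUBREL goes out and the transaction stalls there.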

pirlite2 commented 4 years ago

I am posting some examples of the mosquitto log (actually written to syslog for convenience):

I have a Linux box that runs mosquitto and also runs a watchdog that 'pings' a message at QoS2 using the libmosquitto library, via localhost and the MQTT broker. The log for a typical watchdog transaction is as follows:

Jan 24 08:58:33 arundel mosquitto[1219]: New connection from 127.0.0.1 on port 1883.
Jan 24 08:58:33 arundel mosquitto[1219]: New client connected from 127.0.0.1 as mosq-RGJsQaaknuhBBgvxQa (p2, c1, k60, u'obms-user').
Jan 24 08:58:33 arundel mosquitto[1219]: No will message specified.
Jan 24 08:58:33 arundel mosquitto[1219]: Sending CONNACK to mosq-RGJsQaaknuhBBgvxQa (0, 0)
Jan 24 08:58:33 arundel mosquitto[1219]: Received SUBSCRIBE from mosq-RGJsQaaknuhBBgvxQa
Jan 24 08:58:33 arundel mosquitto[1219]: #011obms/watchdog-test/response (QoS 2)
Jan 24 08:58:33 arundel mosquitto[1219]: mosq-RGJsQaaknuhBBgvxQa 2 obms/watchdog-test/response
Jan 24 08:58:33 arundel mosquitto[1219]: Sending SUBACK to mosq-RGJsQaaknuhBBgvxQa
Jan 24 08:58:33 arundel mosquitto[1219]: Received PUBLISH from mosq-RGJsQaaknuhBBgvxQa (d0, q2, r0, m2, 'obms/watchdog-test/out', ... (112 bytes))
Jan 24 08:58:33 arundel mosquitto[1219]: Sending PUBREC to mosq-RGJsQaaknuhBBgvxQa (m2, rc0)
Jan 24 08:58:33 arundel mosquitto[1219]: Received PUBREL from mosq-RGJsQaaknuhBBgvxQa (Mid: 2)
Jan 24 08:58:33 arundel mosquitto[1219]: Sending PUBCOMP to mosq-RGJsQaaknuhBBgvxQa (m2)

Note that I get the full suite of PUBLISH-PUBREC-PUBREL-PUBCOMP. These transactions work fine.

A typical log for an interaction with the Paho client running on a NodeMCU and communicating over a wireless access point is:

Jan 24 13:32:46 arundel mosquitto[1219]: New connection from 192.168.0.100 on port 1883.
Jan 24 13:32:46 arundel mosquitto[1219]: New client connected from 192.168.0.100 as room1-trv (p1, c1, k60, u'obms-user').
Jan 24 13:32:46 arundel mosquitto[1219]: No will message specified.
Jan 24 13:32:46 arundel mosquitto[1219]: Sending CONNACK to room1-trv (0, 0)
Jan 24 13:32:46 arundel mosquitto[1219]: Received SUBSCRIBE from room1-trv
Jan 24 13:32:46 arundel mosquitto[1219]: #011obms/room1/trv (QoS 2)
Jan 24 13:32:46 arundel mosquitto[1219]: room1-trv 2 obms/room1/trv
Jan 24 13:32:46 arundel mosquitto[1219]: Sending SUBACK to room1-trv
Jan 24 13:32:46 arundel mosquitto[1219]: Received PUBLISH from room1-trv (d0, q2, r0, m2, 'obms/controller', ... (27 bytes))
Jan 24 13:32:46 arundel mosquitto[1219]: Sending PUBREC to room1-trv (m2, rc0)
Jan 24 13:32:46 arundel mosquitto[1219]: New connection from 127.0.0.1 on port 1883.
Jan 24 13:32:46 arundel mosquitto[1219]: New client connected from 127.0.0.1 as mosq-kXdNmZLvQyikgFFsJT (p2, c1, k60, u'obms-user').
Jan 24 13:32:46 arundel mosquitto[1219]: No will message specified.
Jan 24 13:32:46 arundel mosquitto[1219]: Sending CONNACK to mosq-kXdNmZLvQyikgFFsJT (0, 0)

Notice here that i) the transaction seems to be tagged with "q2" which I assume means mosquitto thinks the message is at QoS2, ii) the mosquitto broker sends PUBREC in response to the Paho client's PUBLISH, iii) the transaction never completes! There is no PUBREL received from the Paho client. (The next logged item in this case is a connection to my within-Linux watchdog system.)
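For checking longer logs without reading them by eye, a rough helper along these lines could flag incomplete transactions. It is only a sketch: `qos2Completed` is a hypothetical function, it assumes the log wording shown in the excerpts above, it matches by plain substring (so a client ID that is a prefix of another would confuse it), and it looks for a single transaction per client without tracking message IDs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns true when `lines` contain, in order, all four broker-side log
// entries of one QoS 2 publish from `client`:
// PUBLISH received, PUBREC sent, PUBREL received, PUBCOMP sent.
bool qos2Completed(const std::vector<std::string>& lines, const std::string& client) {
    assert(!client.empty());  // an empty ID would match every client
    const std::string steps[4] = {
        "Received PUBLISH from " + client,
        "Sending PUBREC to " + client,
        "Received PUBREL from " + client,
        "Sending PUBCOMP to " + client,
    };
    std::size_t next = 0;  // index of the step we are still waiting for
    for (const std::string& line : lines) {
        if (next < 4 && line.find(steps[next]) != std::string::npos) {
            ++next;
        }
    }
    return next == 4;
}
```

Fed the watchdog excerpt it should report a completed handshake, and fed the room1-trv excerpt it should report an incomplete one, since the sequence stops after PUBREC.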

Sergio - Does this give you enough information, or shall I try running with MQTT_DEBUG defined?

> Unfortunately I can't help with C++, still have not incremented my C after several years of use.

In fact, the C++ client is a thin wrapper around the C library.

Peter

scaprile commented 4 years ago

Seeing the log, I still don't know if the missing messages are lost or trashed. However, the second connection seems to indicate either a short timeout or an error in handling. AFAIK the remaining part of the QoS2 conversation should (must?) follow on the very same connection, so something is wrong here. Try enabling MQTT_DEBUG to see if that sheds some more light.

I beg to differ on your assessment of the client code. As far as I can see, there is C++ code in MQTTClient.h. In fact, I don't see an implementation for QoS2 in the C client, while the C++ client definitely has one. Enter grep. The packet handling code for PUBREC starts at line 640 of MQTTClient.h and, for that to be compiled, MQTTCLIENT_QOS2 must be defined. I don't see a specific value required; some checks assume a non-zero value and the Linux example hello.cpp defines it to 1, so:

#define MQTTCLIENT_QOS2 1

You should check there for the reason why the connection is restarted, or try to port one of the examples and run just that. If there is an example, I must assume the C++ client code is running fine.
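For reference when stepping through the PUBREC handling with a debugger or Wireshark, this is roughly what the bytes involved look like on the wire under MQTT 3.1.1. It is a standalone sketch, not the code from MQTTClient.h; `Ack`, `makePubrel` and `isPubrecFor` are illustrative names.

```cpp
#include <cstdint>

// A QoS 2 acknowledgement is 4 bytes in MQTT 3.1.1:
// fixed header byte, remaining length (2), packet ID MSB, packet ID LSB.
struct Ack {
    std::uint8_t bytes[4];
};

// Build the PUBREL the client should send after a PUBREC.
// 0x62 = packet type 6 (PUBREL) with the mandatory flag bits 0010.
constexpr Ack makePubrel(std::uint16_t packetId) {
    return Ack{{0x62, 0x02,
                static_cast<std::uint8_t>(packetId >> 8),
                static_cast<std::uint8_t>(packetId & 0xFF)}};
}

// Check that `p` is a PUBREC (0x50 = packet type 5, flags 0000)
// acknowledging the given packet ID.
constexpr bool isPubrecFor(const Ack& p, std::uint16_t packetId) {
    return p.bytes[0] == 0x50 && p.bytes[1] == 0x02 &&
           p.bytes[2] == (packetId >> 8) && p.bytes[3] == (packetId & 0xFF);
}
```

For the failing transaction in the log (message ID 2, shown as m2), the broker's PUBREC would be 50 02 00 02 and the expected reply 62 02 00 02. If a capture shows the first but never the second, the client side is where the handshake stops.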