eclipse / paho.mqtt.python

paho.mqtt.python
Other
2.17k stars 723 forks source link

Confusion about ReasonCode in MQTTv5 #659

Closed mindrunner closed 7 months ago

mindrunner commented 2 years ago

I moved to MQTTv5 and updated the callbacks on_connect and on_disconnect

It seem that on_connect now passes me a ReasonCode object instead of an int. However, on_disconnect still uses int as rc. Is that on purpose or am I missing something here. I skimmed through the client.py code and it looks pretty much intentional.

Reading https://github.com/eclipse/paho.mqtt.python/issues/392, it seems like rc in on_disconnect has no real meaning, but then why do I find DISCONNECT in PacketTypes as well as several ReasonCodes for disconnect?

Most importantly, how do I find out why the connection was disconnected? Am I supposed to use error_string(rc) for that?

fpagliughi commented 1 year ago

The "Reason Code" is a new feature added to the MQTT v5 spec to clarify the result of an operation. It makes error reporting a little more user-friendly, but can also indicate success. The spec says:

A Reason Code is a one byte unsigned value that indicates the result of an operation. Reason Codes less than 0x80 indicate successful completion of an operation. The normal Reason Code for success is 0. Reason Code values of 0x80 or greater indicate failure.

The CONNACK, PUBACK, PUBREC, PUBREL, PUBCOMP, DISCONNECT and AUTH Control Packets have a single Reason Code as part of the Variable Header. The SUBACK and UNSUBACK packets contain a list of one or more Reason Codes in the Payload.

Read more here: https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html#_Toc3901031

Unfortunately, the term "reason code" was in common use in many MQTT libraries (often shown as a variable rc) to indicate an integer success/failure value for local functions. So there is some confusion between the usage of these two different things.

mindrunner commented 1 year ago

Cool, thanks for the insight. It's been a while since I fell over this. Speaking one more time about the on_disconnect callback. the source docs state:

for MQTT v5.0:
            disconnect_callback(client, userdata, reasonCode, properties)

...

The rc parameter indicates the disconnection state. If
MQTT_ERR_SUCCESS (0), the callback was called in response to
a disconnect() call. If any other value the disconnection
was unexpected, such as might be caused by a network error.

Does that mean, on_disconnect does not make use of the "new" ReasonCode but rather uses the commonly used rc integer success/failure value for local functions?

Jibun-no-Kage commented 1 year ago

Can we get an answer to @mindrunner question above? I too as well don't understand why on_disconnect callback provides an integer RC code, not a string ReasonCode like say on_connect. It is a confusing inconsistency, can't the code be updated to add some reasonable consistency to the other call backs.... or STATE in the SOURCE, this is by design to allow this inconsistency.

jmmk commented 1 year ago

Here are all the inconsistencies I have found with the reason code when using MQTT v5:

  1. As mentioned in #687, on_subscribe and on_unsubscribe use different order for their arguments

                        on_subscribe(
                            self, self._userdata, mid, reasoncodes, properties)
    
                        on_unsubscribe(
                            self, self._userdata, mid, properties, reasoncodes)
  2. on_unsubscribe may pass a list[ReasonCodes] or a single ReasonCodes
            reasoncodes = []
            for c in packet[props_len:]:
                if sys.version_info[0] < 3:
                    c = ord(c)
                reasoncodes.append(ReasonCodes(UNSUBACK >> 4, identifier=c))
            if len(reasoncodes) == 1:
                reasoncodes = reasoncodes[0]
  3. on_connect can pass an int (if the result is 132) or a ReasonCodes
            if result == 1:
                # This is probably a failure from a broker that doesn't support
                # MQTT v5.
                reason = 132 # Unsupported protocol version
                properties = None
            else:
                reason = ReasonCodes(CONNACK >> 4, identifier=result)
                properties = Properties(CONNACK >> 4)
                properties.unpack(self._in_packet['packet'][2:])
  4. on_disconnect can pass an int or a ReasonCodes. In this case, the ints are usually used on success (which is the standard 0 code) or due to client-side errors like keepalive failure or connection lost.

            if self._state == mqtt_cs_disconnecting:
                rc = MQTT_ERR_SUCCESS
            else:
                rc = MQTT_ERR_KEEPALIVE
    
            self._do_on_disconnect(rc)
    # ...
                    self._do_on_disconnect(MQTT_ERR_CONN_LOST)
    # ...
    def _handle_disconnect(self):
        packet_type = DISCONNECT >> 4
        reasonCode = properties = None
        if self._in_packet['remaining_length'] > 2:
            reasonCode = ReasonCodes(packet_type)
    
        self._loop_rc_handle(reasonCode, properties)

For different callbacks, the rc you receive may be an int, a ReasonCodes, or a list[ReasonCodes]. If the given rc is a ReasonCodes, you can use rc.getName() to get the string reason. If it is an int, you can use error_string(rc) to get the string reason.

shailesh-bear commented 9 months ago

I am also facing this issue. Disconnect callback does not work for MQTT 5.0 as expected. It is not called with the correct reason code and properties from the packet. Disconnect callback is not even called in _handle_disconnect, which is when we receive the packet from the server. This method is instead called outside of regular MQTT communication when the client thread fails to receive keep-alive.

MattBrittan commented 8 months ago

Flagging this an an enhancement as it's going to need to be given some thought (resolving some of the inconsistencies will require breaking changes in terms of parameter order).

PierreF commented 7 months ago

687, #656 and #715 seems all related. There is indeed bad consistency in callbacks parameter/signature.

My suggestion (and I'll submit a PR for this) is:

The signature I think of are:

on_connect(Client, userdata, ConnectFlags, ReasonCode, Properties)  # This is called when connack is received
on_connect_fail(Client, userdata)  # This is called when connection failed before connack
on_pre_connect(Client, userdata)  # This is called just before opening the TCP socket
on_disconnect(Client, userdata, DisconnectFlags, ReasonCode, Properties)
on_log(Client, userdata, LogLevel, message)
on_message(Client, userdata, MQTTMessage)
on_publish(Client, userdata, mid, ReasonCode, Properties)
on_subscribe(Client, userdata, mid, list[ReasonCode], Properties)
on_unsubscribe(Client, userdata, mid, list[ReasonCode], Properties)

on_socket_*(Client, userdata, SocketLike)  # unchanged

class ConnectFlags(NamedTuple):
    session_present: bool

class DisconnectFlags(NamedTuple):
    is_disconnect_packet_from_server: bool

For on_disconnect, since there is no direct equivalent for "ERR_CONN_LOST", I'll use "Unspecified error" For on_unsubscribe in MQTTv3, since there is not payload - no reasoncode - we will use an empty list for reasoncodes.

I believe this will provide all information needed for callback (only on_pre_connect & on_connect_fail seems a bit short on information... but I'm not sure how they are used).

It'll also make MQTTv3 & MQTTv5 callbacks similar. Obviously with properties being always None in MQTTv3.

Edit: let's have properties always present (not None). We can always pass an empty properties object which seems better than passing None (user don't need to check for both being not None)