adafruit / Adafruit_CircuitPython_MiniMQTT

MQTT Client Library for CircuitPython
Other
72 stars 50 forks source link

honor recv_timeout in _sock_exact_recv() and ping() #190

Closed vladak closed 1 month ago

vladak commented 7 months ago

This logical change uses recv_timeout as receive timeout in _sock_exact_recv().

Casually tested with CPython using the test case supplied in the bug, more testing will follow.

vladak commented 5 months ago

Added a simple test case.

vladak commented 5 months ago

Tested on QtPy running Adafruit CircuitPython 8.2.6 on 2023-09-12; Adafruit QT Py ESP32-S3 no psram with ESP32S3 with the code from the issue adapted for CP:

#!/usr/bin/env python3

import ssl
import time
import wifi
import socketpool

import adafruit_minimqtt.adafruit_minimqtt as MQTT

import adafruit_logging as logging

from secrets import secrets

def main():
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.DEBUG)

    logger.info(f"Connecting to wifi {secrets['SSID']}")
    wifi.radio.connect(secrets["SSID"], secrets["password"], timeout=20)
    logger.info(f"Connected to {secrets['SSID']}")
    logger.debug(f"IP: {wifi.radio.ipv4_address}")

    pool = socketpool.SocketPool(wifi.radio)

    keep_alive_timeout = 16
    host = "172.40.0.3"
    port = 1883
    mqtt_client = MQTT.MQTT(
        broker=host,
        port=port,
    socket_pool=pool,
        ssl_context=ssl.create_default_context(),
        connect_retries=1,
        recv_timeout=5,
        keep_alive=keep_alive_timeout,
    )

    mqtt_client.logger = logger

    logger.debug(f"connecting with keep alive = {keep_alive_timeout}")
    mqtt_client.connect()

    delay = 2 * keep_alive_timeout
    logger.debug(f"sleeping for {delay} seconds")
    time.sleep(delay)
    logger.debug("pinging the server")
    try:
        mqtt_client.ping()
    except MQTT.MMQTTException as e:
        logger.error(f"ping failed: {e}")

if __name__ == "__main__":
    main()

the output was as follows:

1311.760: INFO - Connecting to wifi XXX
1311.761: INFO - Connected to XXX
1311.763: DEBUG - IP: 172.40.0.23
1311.766: DEBUG - connecting with keep alive = 16
1311.767: DEBUG - Attempting to connect to MQTT broker (attempt #0)
1311.768: DEBUG - Attempting to establish MQTT connection...
1311.770: INFO - Establishing an INSECURE connection to 172.40.0.3:1883
1311.775: DEBUG - Sending CONNECT to broker...
1311.777: DEBUG - Fixed Header: bytearray(b'\x10\x14')
1311.778: DEBUG - Variable Header: bytearray(b'\x00\x04MQTT\x04\x02\x00\x10')
1311.782: DEBUG - Receiving CONNACK packet from broker
1311.785: DEBUG - Got message type: 0x20 pkt: 0x20
1311.787: DEBUG - Resetting reconnect backoff
1311.789: DEBUG - sleeping for 32 seconds
1343.790: DEBUG - pinging the server
1343.791: DEBUG - Sending PINGREQ
1348.794: ERROR - ping failed: PINGRESP not returned from broker within 5 seconds.

On CP it works differently than with CPython - the timeout from _sock_exact_recv() is propagated via _wait_for_msg() as None into the higher level functions which then check the timeout themselves, in this case this is ping(). The important part is that the time difference between the PINREQ sent and the detection of PINGRESP not received matches the receive timeout, not keep alive timeout.

vladak commented 5 months ago

Checking the code after this change for the remaining keep_alive occurrences, they are only in places related to the CONNECT message marshalling and inside the loop() in order to ping the server periodically.

vladak commented 5 months ago

The change in ping() probably deserves some comment. The MQTT 3.1.1 spec in section 2.2 says:

If a client does not receive a PINGRESP message within a Keep Alive time period after sending a PINGREQ, it should close the TCP/IP socket connection.

The current code in ping() merely raises the MMQTTException exception without closing the connection, perhaps with the expectation that the caller will close it. Anyhow, I think the receive timeout should trump the keep alive timeout also in this case - the information about not being able to receive data should get to the caller as soon as possible (assuming receive timeout is usually much smaller than the keep alive timeout).

dhalbert commented 3 months ago

@vladak this now has merge conflicts.

vladak commented 3 months ago

@vladak this now has merge conflicts.

rebased