adafruit / Adafruit_CircuitPython_MiniMQTT

MQTT Client Library for CircuitPython
Other
72 stars 50 forks source link

maintain time.monotonic precision by using adafruit_ticks #210

Closed kevin-tritz closed 2 days ago

kevin-tritz commented 3 months ago

The code originally loses timing precision even using time.monotonic_ns because it stores timestamps after using floating point division. If the timestamps are kept as ns integers, and floating point division is only done after subtracting from the current monotonic_ns time, precision can be maintained.

kevin-tritz commented 3 months ago

An alternative to this is to use the adafruit_ticks library, which counts in msecs, for intervals up to about 6 days.

I originally made the change using the adafruit_ticks library, and it seemed to work just fine. But I believe Justin suggested not including an additional dependency since they would want this library to work with the esp32spi, which is frozen: https://discordapp.com/channels/327254708534116352/633326631460798464/1220549312803504299

dhalbert commented 3 months ago

I would not worry a lot about the new dependency: we can freeze adafruit_ticks if needed.

There are only a few boards that don't have monotonic_ns: basically boards that don't have longints: SAMD21 boards without external flash, and a few STM boards with low flash. I don't think these boards are used for MQTT with such low RAM anyway: I'm not sure even ESP32SPI would fit on them.

kevin-tritz commented 3 months ago

Test build environment needs adafruit_ticks?

kevin-tritz commented 3 months ago

build fails because the test is modifying a private variable using the otherwise unused get_monotonic_time function. Either the build needs to be modified appropriately, or get_monotonic_time needs to be modified with backwards-incompatible changes.

vladak commented 3 months ago

build fails because the test is modifying a private variable using the otherwise unused get_monotonic_time function. Either the build needs to be modified appropriately, or get_monotonic_time needs to be modified with backwards-incompatible changes.

I'd say the function needs to be removed and the test adapted per the changes in the module.

kevin-tritz commented 2 months ago

Ok, I made a couple of modifications to the test_loop file, one which is likely fine, one which I'm not sure if it's ok or not. The .loop function was returning a list of length 3 instead of 2, and I'm not sure if that's just some slight differences in timing, or it's really a failed check.

After those mods, the tests passed, but the build is complaining about adafruit_ticks:

File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/adafruit_ticks.py", line 31, in _TICKS_MAX = const(_TICKS_PERIOD - 1)


TypeError: unsupported operand type(s) for -: 'const' and 'int'
vladak commented 2 months ago

Ok, I made a couple of modifications to the test_loop file, one which is likely fine, one which I'm not sure if it's ok or not. The .loop function was returning a list of length 3 instead of 2, and I'm not sure if that's just some slight differences in timing, or it's really a failed check.

After those mods, the tests passed, but the build is complaining about adafruit_ticks:

File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/adafruit_ticks.py", line 31, in _TICKS_MAX = const(_TICKS_PERIOD - 1) ~~~~^ TypeError: unsupported operand type(s) for -: 'const' and 'int'

_TICKS_PERIOD is const (imported from micropython) and has not changed in 3 years (https://github.com/adafruit/Adafruit_CircuitPython_Ticks/blame/527d90e91fa70d08399291229bdcb2e9a6605a6a/adafruit_ticks.py#L30-L32) so something else must have changed, possibly in Python/CircuitPython itself or the micropython module. It works for me on Python3 shipped with Ubuntu 22.04.04 LTS:

$ which python3
/usr/bin/python3
$ python3
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> from micropython import const
>>> foo = const(42)
>>> bar = const(foo - 1)
>>> 

Anyhow, not related to your changes (modulo pulling the adafruit_ticks anew) as far as I can tell.

Also, I am not actually sure where does the micropython module comes from, it is not listed in pip list. Peeking around (with strace), I can see it is located in ~/.local/lib/python3.10/site-packages/micropython.py and comes from Adafruit. Therein the const is defined simply as:

def const(x):
    "Emulate making a constant"
    return x
vladak commented 2 months ago

The micropython.py comes from Adafruit_Blinka: https://github.com/adafruit/Adafruit_Blinka/blob/b3beba7399bc4d5faa203ef705691ef79fc4e87f/src/micropython.py#L12-L14 and is still the same as above so perhaps some more strict enforcement was introduced in CPython itself.

dhalbert commented 2 months ago

@vladak: @jepler discussed this with @kevin-tritz in Discord. The problem I think was that micropython was mocked in the documentation build, so a correct implementation of const() was not available. This was indeed confusing.

vladak commented 2 months ago

Ok, I made a couple of modifications to the test_loop file, one which is likely fine, one which I'm not sure if it's ok or not. The .loop function was returning a list of length 3 instead of 2, and I'm not sure if that's just some slight differences in timing, or it's really a failed check.

I will take a look.

vladak commented 1 month ago

Looks good.

vladak commented 1 month ago

Ideally, this should be tested on a microcontroller. I may give it a go on my QtPy.

vladak commented 1 month ago

I have tested this a bit on my new Adafruit CircuitPython 9.0.4 on 2024-04-16; Adafruit QT Py ESP32-S3 4MB Flash 2MB PSRAM with ESP32S3 with the following code:

#!/usr/bin/env python3

import adafruit_logging as logging
import random
import socketpool
import ssl
import sys
import time
import wifi

from secrets import secrets

import adafruit_minimqtt.adafruit_minimqtt as MQTT

def on_message(client, topic, msg):
    #logger = logging.getLogger(__name__)
    #logger.info(f"Got msg on '{topic}' ({len(msg)} bytes)")
    client.user_data[0] += 1
    client.user_data[1] += len(msg)

def main():
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.DEBUG)

    logger.info("Connecting to wifi")
    wifi.radio.connect(secrets["ssid"], secrets["password"], timeout=10)
    logger.debug(f"IP: {wifi.radio.ipv4_address}")

    pool = socketpool.SocketPool(wifi.radio)

    host = "test.mosquitto.org"
    port = 1883
    stats = [0, 0]
    mqtt_client = MQTT.MQTT(
        broker=host,
        port=port,
        socket_pool=pool,
        ssl_context=ssl.create_default_context(),
        connect_retries=1,
        recv_timeout=5,
        use_binary_mode=True,        # test.mosquitto.org has messages with UnicodeError
        user_data=stats,
    )

    # mqtt_client.logger = logger

    mqtt_client.on_message = on_message

    logger.debug(f"connecting to MQTT broker {host}")
    mqtt_client.connect()

    topic = "#"
    logger.debug(f"subscribing to {topic}")
    mqtt_client.subscribe(topic)

    while True:
        try:
            mqtt_client.loop(1)
        except MQTT.MMQTTException as e:
            logger.error(f"loop failed: {e}")
            return

        logger.info(f"Messages: {stats[0]}, bytes: {stats[1]}")

if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        sys.exit(0)

and it ended up with error after 2 minutes:

2585.077: INFO - Connecting to wifi
2590.368: DEBUG - IP: 172.40.0.30
2590.377: DEBUG - connecting to MQTT broker test.mosquitto.org
2593.231: DEBUG - subscribing to #
2594.559: INFO - Messages: 204, bytes: 12962
2595.565: INFO - Messages: 314, bytes: 52876
2596.569: INFO - Messages: 430, bytes: 93282
2597.591: INFO - Messages: 543, bytes: 134021
2598.614: INFO - Messages: 663, bytes: 174095
2599.620: INFO - Messages: 791, bytes: 208341
2600.625: INFO - Messages: 1043, bytes: 209800
2601.630: INFO - Messages: 1284, bytes: 213782
2602.633: INFO - Messages: 1521, bytes: 227964
2603.637: INFO - Messages: 1753, bytes: 248440
2604.668: INFO - Messages: 1952, bytes: 269726
2605.679: INFO - Messages: 2190, bytes: 287693
2606.686: INFO - Messages: 2398, bytes: 315363
2607.694: INFO - Messages: 2645, bytes: 330242
2608.704: INFO - Messages: 2865, bytes: 342094
2609.708: INFO - Messages: 3083, bytes: 356324
2610.714: INFO - Messages: 3299, bytes: 375794
2611.721: INFO - Messages: 3511, bytes: 401226
2612.725: INFO - Messages: 3725, bytes: 420099
2613.729: INFO - Messages: 3976, bytes: 428784
2614.770: INFO - Messages: 4212, bytes: 440797
2615.773: INFO - Messages: 4448, bytes: 454103
2616.783: INFO - Messages: 4667, bytes: 467514
2617.789: INFO - Messages: 4853, bytes: 507280
2618.798: INFO - Messages: 5081, bytes: 530357
2619.806: INFO - Messages: 5328, bytes: 541211
2620.811: INFO - Messages: 5554, bytes: 566529
2621.817: INFO - Messages: 5798, bytes: 579650
2622.829: INFO - Messages: 6038, bytes: 596174
2623.834: INFO - Messages: 6273, bytes: 610909
2624.903: INFO - Messages: 6514, bytes: 635399
2625.923: INFO - Messages: 6748, bytes: 653713
2626.926: INFO - Messages: 6972, bytes: 671921
2627.930: INFO - Messages: 7205, bytes: 690661
2628.939: INFO - Messages: 7431, bytes: 710925
2629.946: INFO - Messages: 7662, bytes: 730690
2630.951: INFO - Messages: 7917, bytes: 741416
2631.957: INFO - Messages: 8126, bytes: 769322
2632.961: INFO - Messages: 8360, bytes: 789694
2633.969: INFO - Messages: 8604, bytes: 809029
2634.974: INFO - Messages: 8823, bytes: 827086
2636.979: INFO - Messages: 8856, bytes: 969083
2637.983: INFO - Messages: 9101, bytes: 980437
2638.990: INFO - Messages: 9317, bytes: 1005100
2639.997: INFO - Messages: 9549, bytes: 1021483
2641.000: INFO - Messages: 9765, bytes: 1038507
2642.006: INFO - Messages: 10010, bytes: 1055067
2643.012: INFO - Messages: 10257, bytes: 1076529
2644.018: INFO - Messages: 10477, bytes: 1102363
2645.024: INFO - Messages: 10692, bytes: 1123337
2646.031: INFO - Messages: 10934, bytes: 1138542
2647.040: INFO - Messages: 11150, bytes: 1157016
2648.044: INFO - Messages: 11377, bytes: 1175636
2649.187: INFO - Messages: 11604, bytes: 1215308
2650.197: INFO - Messages: 11839, bytes: 1236870
2651.206: INFO - Messages: 12060, bytes: 1260244
2652.212: INFO - Messages: 12272, bytes: 1282616
2653.219: INFO - Messages: 12483, bytes: 1309927
2713.240: ERROR - loop failed: PINGRESP not returned from broker.

This is rather strange and should be investigated, however this also happens with vanilla 7.6.3, i.e. before the changes.

vladak commented 1 month ago

I reduced the subscription to use the devices/# topic and it works fine. This wildcard topic receives very little traffic compared to the # topic. What happens in that case is due to big number of PUBLISH messages the client never gets to actually receive the PINGRESP within the expected time frame. Not sure if the server prioritizes these over PUBLISH messages. Also, the QtPy is kind of slow to receive which is probably a contributing factor. I believe the PINGRESP will eventually come, however my testing with the MMQTTException in ping() replaced with logger stopped with ETIMEDOUT raised from _sock_exact_recv() after some time - maybe the broker has disconnected the client. Anyway, this is nothing that would block the change in this PR.

kevin-tritz commented 1 month ago

Added your suggested changes: (ticks_ms() / 1000 > ping_timeout). I assumed the MQTT Exception needed an f-string.

vladak commented 1 month ago

Added your suggested changes: (ticks_ms() / 1000 > ping_timeout). I assumed the MQTT Exception needed an f-string.

Cool, looks good. Indeed, f-string is needed.

dhalbert commented 1 month ago

@vladak We appreciate your testing. Do you think this is in good shape now?

@kevin-tritz Note there is now a merge conflict to resolve.

vladak commented 1 month ago

@vladak We appreciate your testing. Do you think this is in good shape now?

yep, looks good to me.

kevin-tritz commented 1 month ago

There is now a minor test failure. See https://github.com/adafruit/Adafruit_CircuitPython_MiniMQTT/actions/runs/9370198935/job/25796562379?pr=210#step:2:805, starting at line 769. The timeout test is off by a bit (the < 8 comparison).

Yeah, saw that. Might be due to the differences that showed up in the merge conflict. I'll take a look in a bit.

FoamyGuy commented 2 days ago

@dhalbert Yep, that sounds good to me. I've committed that change.