arduino / arduino-iot-cloud-py

Arduino IoT Cloud Python Client.
Mozilla Public License 2.0

Script hangs after 12 hours with 1-second refresh #72

Closed Bodobolero closed 11 months ago

Bodobolero commented 1 year ago

I am running the following script on Arduino RP2040 Connect with Arduino IoT Cloud. The script reads temperature and humidity from DHT20 over I2C using https://files.seeedstudio.com/wiki/Grove-Temperature-Humidity-Sensor/Pico-micropython-master.zip dht20.py.

The script runs fine for about 12 hours (about 40,000 loop iterations) and then gets stuck. I suspect this is due to heap fragmentation in the ArduinoIoTCloud Python module or another module it uses.

It would be helpful to add a watchdog like in the C implementation that reboots the board in case of becoming unresponsive.

Here is my script to repro the problem

from machine import I2C, Pin
from dht20 import DHT20
import time
import network
import logging
from arduino_iot_cloud import ArduinoCloudClient

from secrets import WIFI_SSID
from secrets import WIFI_PASSWORD
from secrets import DEVICE_ID
from secrets import CLOUD_PASSWORD

led = Pin("LED", Pin.OUT)  # Configure the desired LED pin as an output.
i2c = I2C(0, scl=Pin(13), sda=Pin(12))
dht20 = DHT20(i2c)

def on_switch_changed(client, value):
    # Toggles the hardware LED on or off.
    led.value(value)

    # Sets the value of the cloud variable "led" to the current state of the LED
    # and thus mirrors the hardware state in the cloud.
    client["led"] = value

def read_temperature(client):
    global dht20
    temperature = dht20.dht20_temperature()
    logging.info(f"DHT20: {temperature} C")
    return temperature

def read_humidity(client):
    global dht20
    humidity = dht20.dht20_humidity()
    logging.info(f"DHT20: {humidity} %")
    return humidity

def wifi_connect():
    if not WIFI_SSID or not WIFI_PASSWORD:
        raise (
            Exception("Network is not configured. Set SSID and passwords in secrets.py"))
    wlan = network.WLAN(network.STA_IF)
    wlan.active(True)
    wlan.connect(WIFI_SSID, WIFI_PASSWORD)
    while not wlan.isconnected():
        logging.info("Trying to connect. Note this may take a while...")
        time.sleep_ms(500)
    logging.info(f"WiFi Connected {wlan.ifconfig()}")

if __name__ == "__main__":

    # Configure the logger.
    # All message equal or higher to the logger level are printed.
    # To see more debugging messages, set level=logging.DEBUG.
    logging.basicConfig(
        datefmt="%H:%M:%S",
        format="%(asctime)s.%(msecs)03d %(message)s",
        level=logging.INFO,
    )

    logging.info(f"Connect to WiFi")

    # NOTE: Add networking code here or in boot.py
    wifi_connect()

    # Create a client object to connect to the Arduino IoT cloud.
    # For MicroPython, the key and cert files must be stored in DER format on the filesystem.
    # Alternatively, a username and password can be used to authenticate:
    client = ArduinoCloudClient(
        device_id=DEVICE_ID, username=DEVICE_ID, password=CLOUD_PASSWORD)

    # Register cloud objects.
    # Note: The following objects must be created first in the dashboard and linked to the device.
    # This cloud object is initialized with its last known value from the cloud. When this object is updated
    # from the dashboard, the on_switch_changed function is called with the client object and the new value.
    client.register("ledSwitch", value=None,
                    on_write=on_switch_changed, interval=0.250)

    # This cloud object is updated manually in the switch's on_write_change callback to update the LED state in the cloud.
    client.register("led", value=None)

    # read only variables temperature and humidity
    client.register("temperature", value=None, on_read=read_temperature, interval=1.0)
    client.register("humidity", value=None, on_read=read_humidity, interval=1.0)

    logging.info(f"starting IoT client loop")

    # Start the Arduino IoT cloud client.
    client.start()

Maybe the problem is in my script and not in the arduino-iot-cloud-py implementation. However, so far I have not succeeded in deploying an Arduino IoT cloud project in production in MicroPython, while I already have several Arduino IoT cloud "things" successfully deployed using the Arduino Cloud C/C++ libraries.

iabdalkader commented 1 year ago

This is hard to debug; it could be anything. If you suspect a memory issue, you can disable logging. Logging also supports writing to a file, so you can enable that and maybe the log will have useful information.
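As a minimal sketch of the "log to a file" idea, the snippet below attaches a file handler to the root logger. It is written against the CPython `logging` API; whether the `logging` module installed on the board provides `FileHandler` and `Formatter` with the same behaviour is an assumption to check. The helper name `add_file_log` and the file name `cloud.log` are made up for illustration.

```python
import logging


def add_file_log(path="cloud.log", level=logging.INFO):
    """Attach a file handler to the root logger and return it.

    The helper name and the default path are hypothetical.
    """
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(level)
    return handler

# Possible usage before client.start():
#   add_file_log("cloud.log")
#   logging.info("starting IoT client loop")
```

Note that writing a log line on every MQTT message also costs flash writes and heap, so this is probably best enabled only while chasing the hang.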

It would be helpful to add a watchdog like in the C implementation

You can just add one to your script, and call wdt.feed() from a user task that runs every n seconds:

from machine import WDT
from arduino_iot_cloud import Task

wdt = None

def user_task(client):
    # Feed the watchdog; if this task stops running, the board resets.
    wdt.feed()

if __name__ == "__main__":
    ...
    client.register(Task("user_task", on_run=user_task, interval=4.0))

    # Enable the WDT with a timeout of 5s (1s is the minimum).
    # No "global" statement is needed here: this code runs at module level.
    wdt = WDT(timeout=5000)

    # Start the Arduino IoT cloud client.
    client.start()

Bodobolero commented 1 year ago

Thanks for the suggestion with the machine.WDT watchdog. I will try to see if this manages to restart the board when it gets stuck. I have also removed the logging in the read_xxx callback functions to reduce heap allocations. I will let you know if this resolves my issues. Thanks for now

iabdalkader commented 1 year ago

I have also removed the logging in the read_xxx callback functions to reduce heap allocations.

You can also just raise the level to ERROR or CRITICAL, which should not produce any output.

Bodobolero commented 1 year ago

I have also removed the logging in the read_xxx callback functions to reduce heap allocations.

You can also just raise the level to ERROR or CRITICAL, which should not produce any output.

Wouldn't Python invoke the logging.info() function anyway to check whether INFO is >= ERROR, and thus construct the f-string, which would still allocate from the heap even though the output is then thrown away?

anglerfish27 commented 1 year ago

I have had essentially a very similar issue to the OP. I bought an RP2040NANO Connect with the latest firmware running on it. I am using MP. I bought this MCU specifically to try out Adafruit's IoT cloud solution. After a lot of stumbling, as the MP documentation was terrible (I see they are finally making changes to it), I was able to get it to connect. I kept things simple: just one I2C device reporting temperature, humidity and pressure (an SHT45 module, way better than DHTs!). It was working great.

I proceeded on my IoT journey and started messing with ESP32 chips because of the amount of space/heap, which has always killed me about the RP2040. 256 KB of RAM is not a lot, especially for things like sockets and images. I ended up purchasing the monthly plan for Arduino's IoT cloud; that's how much hope I had in it. I got a mix of ESP32s online with various sensors without issues after figuring out my own "procedure", if you will. They have all kinds of SPI and I2C sensors attached. I checked my dashboards quite often at first, then less as time went on. Now when I check, the ONLY device that is offline is the RP2040NANO Connect! I have to reconnect it, and most of the time it times out in the NTP section of the code, where I have increased the timeout and changed the server pool name to increase the chances of success. No luck; I may rewrite that whole part. Regardless of the NTP setting, even if it times out, it connects to the IoT cloud and begins to display my SHT45 sensor data every 10 seconds as I told it to.

I find it interesting that the RP2040 keeps on dropping. It must be the WiFi hardware and/or the IoT cloud. The ESP32s have been solid. So you're not alone, even if you are using Arduino for your language and I'm not.

Eventually I got tired of the very poor interface of the Arduino IoT cloud, the limited widgets and the simplistic display. Maybe it's different with Arduino code, but with MP it's pretty weak. I started to learn more about MQTT on my own.

I made the switch to an MQTT broker and data visualizer installed on my local network, on a Raspberry Pi 4B that was not doing much (any computer will work). I loaded EMQX as the MQTT broker and the MQTTX client (to create and test fake devices, VERY HANDY), and I used Prometheus and Grafana, also installed on the same machine, to build my own IoT cloud. Grafana has a million templates and widget styles you can import. You can get some very AWESOME graphics for your sensor data (just google images of Grafana). So I'm cancelling my Arduino subscription. I'll keep the "free" version for testing or something, I don't know yet.

I'm not some super advanced webserver setup guy. I literally just followed a few guides on the web (mostly from piupmylife.com) and did a little bit of reading in Grafana's documentation. I have full and total control. I can create any type of rule or alert (email, SMS, you name it) with just some clicking around. No coding. Nothing. If you are remotely tech savvy, which you must be if you're programming MCUs, I recommend this route for your IoT cloud. Should you decide to go outside your network, I believe Grafana has a paid service to give you a "domain name" and secure authentication to your dashboards over https. I think. If not, you can just port forward it on your router. It's a username/password protected site that can be set up for https. Block the server off from talking to critical parts of your network, or make a DMZ, and bam, you get your dashboards on the go! I'm not sure, but I think Grafana has a mobile app too.

Ditch the unstable IoT cloud from Arduino. Build your own. For free, forever, no restrictions.

iabdalkader commented 1 year ago

wouldn’t python invoke the logging.info() function anyway

Yes, you're right: changing the level will just produce no output. If that is the issue, though, removing those logs from your callback won't help much, as the client still logs a message on every MQTT message and in some other places. If it turns out to be the issue (unlikely), I can reduce or gate those log statements somehow.
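The point above can be illustrated with a small sketch using CPython's standard `logging` module. The `Reading` class and the logger name are made up for the demonstration; it only counts how often a value is converted to a string when the logger level is ERROR.

```python
import logging


class Reading:
    """Hypothetical sensor value that counts its string conversions."""

    def __init__(self, value):
        self.value = value
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return str(self.value)


logger = logging.getLogger("demo_eager")
logger.addHandler(logging.NullHandler())
logger.propagate = False
logger.setLevel(logging.ERROR)  # INFO messages are discarded

eager = Reading(21.5)
# The f-string is built before info() is even called.
logger.info(f"DHT20: {eager} C")

lazy = Reading(21.5)
# %-style arguments are only formatted if the record is actually emitted.
logger.info("DHT20: %s C", lazy)

print(eager.str_calls, lazy.str_calls)
```

With the f-string the conversion (and its allocation) happens even though nothing is printed; with the `%s` form the conversion is skipped, although the call itself and the argument tuple still cost something.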

Bodobolero commented 1 year ago

Thank you for your advice. However, setting up a secure tunnel to my home network is doable but too involved for many people. If you want to measure/control from your smartphone wherever you are, Arduino IoT is a good solution, AFAIK. I personally have deployed web servers at home using a Cloudflare tunnel to make them accessible from outside my network, but this is not for everyone, as you need a custom domain etc. I am looking for a simple solution that I can teach in my Arduino classes, and Arduino IoT cloud is good for that purpose.

Note, that I have successful projects combining Arduino RP2040 Connect with Arduino IoT cloud running for months without disconnect implemented in C. So I doubt that the board HW or the Arduino IoT Cloud implementation is the problem.

Bodobolero commented 1 year ago

@iabdalkader I have a suggestion for logging in the library:

See https://docs.python.org/3/howto/logging.html section "Optimization"

if logger.isEnabledFor(logging.DEBUG):
    logger.debug('Message with %s, %s', expensive_func1(),
                                        expensive_func2())

Formatting of message arguments is deferred until it cannot be avoided. However, computing the arguments passed to the logging method can also be expensive, and you may want to avoid doing it if the logger will just throw away your event.

This is not important for logging in the initialisation and setup functions, however for those functions which run in each cloud interaction it makes sense, for example here

https://github.com/arduino/arduino-iot-cloud-py/blob/37f7644798dade1dcd74f1adb0bd2ae2d0f1db59/src/arduino_iot_cloud/ucloud.py#L343
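A sketch of how such gating could look in a per-message code path, using CPython's `logging`. The helper `describe_record`, the function `push` and the logger name are hypothetical stand-ins, not the library's actual code.

```python
import logging

logger = logging.getLogger("demo_gate")
logger.addHandler(logging.NullHandler())
logger.propagate = False
logger.setLevel(logging.INFO)

calls = {"n": 0}


def describe_record(record):
    """Hypothetical, relatively expensive formatting helper."""
    calls["n"] += 1
    return ", ".join("{}={}".format(k, v) for k, v in sorted(record.items()))


def push(record):
    # Only build the debug string when DEBUG output is really enabled.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("Pushing records: %s", describe_record(record))


push({"temperature": 21.5, "humidity": 40})
calls_at_info = calls["n"]   # helper was skipped at INFO level

logger.setLevel(logging.DEBUG)
push({"temperature": 21.5, "humidity": 40})
calls_at_debug = calls["n"]  # helper ran once at DEBUG level
```

At INFO level the guard skips both the helper call and the string building, which is the saving that matters in code that runs on every cloud interaction.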

iabdalkader commented 1 year ago

Yes, you have a point, I will definitely improve that. However, I'm not sure it's the GC/logging issue; it could be the Nina WiFi module's driver/firmware leaking memory. It would be very helpful if you could try to capture a log file when the issue happens. I could try the same here; hopefully I can reproduce the issue without the attached sensors.

Bodobolero commented 1 year ago

B.t.w. I am using firmware version 1.5.0 of the Nina Wifi firmware (recently updated using the Arduino 2.x IDE firmware updater).

And I repeat, I have the Arduino IoT cloud successfully running using C code. And the C code also uses the same Nina Wifi firmware as far as I know.

Bodobolero commented 1 year ago

Yes you have a point, I will definitely improve that. However I'm not sure it's the GC/logging issue, it could be the Nina WiFi module's driver/firmware leaking memory. It would be very helpful if you could try to capture a log file when the issue happens, I could try the same here hopefully I can reproduce the issue without the attached sensors .

Can you give me advice on how to catch an exception/error in the log if the Python runtime just stops executing or is stuck due to an out-of-memory situation? I am still learning MicroPython and do not have much experience resolving issues with Python in an embedded environment.

Bodobolero commented 1 year ago

I just did a test where I stopped my guest Wifi (which is used for my microcontrollers) and restarted it after some time. The RP2040 re-connected to the IoT cloud by itself after the outage, so temporary outage on the Wifi does not seem to be a problem (the other, C++ based implementations re-connected successfully, too).

Bodobolero commented 1 year ago

The new version survived more than 12 hours. I will continue to run it over the weekend and will attach the new version here on Monday if it is still running and also close the issue if it is successful.

iabdalkader commented 1 year ago

And I repeat, I have the Arduino IoT cloud successfully running using C code. And the C code also uses the same Nina Wifi firmware as far as I know.

That's interesting, so with the Arduino C++ library it survives the 12 hours test ? If so, it might be the MicroPython driver side or memory as you mentioned.

Can you give me advice on how to catch an exception/error in the log if the python runtime just stops executing or is stuck due to out of memory situation? I

You can just use a try..except clause, but if it does run out of memory, the exception might be raised anywhere, not just by log statements. The exception-handling code itself might also run out of memory, but there's usually an emergency buffer for that; not sure though.
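A minimal sketch of that try/except approach, assuming the filesystem is writable and there is still enough memory left to open a file when the exception arrives. The names `log_exception`, `run_guarded` and `crash.log` are made up. On MicroPython the traceback is written with `sys.print_exception`; the `traceback` branch is only a fallback so the same code also runs on CPython.

```python
import sys


def log_exception(exc, path="crash.log"):
    """Append the exception and its traceback to a file (hypothetical path)."""
    with open(path, "a") as f:
        if hasattr(sys, "print_exception"):
            # MicroPython: write the traceback to the open file.
            sys.print_exception(exc, f)
        else:
            # CPython fallback, used when testing on a PC.
            import traceback
            traceback.print_exception(type(exc), exc, exc.__traceback__, file=f)


def run_guarded(main, path="crash.log"):
    """Run main(); record any exception (including MemoryError), then re-raise."""
    try:
        main()
    except Exception as exc:
        log_exception(exc, path)
        raise

# Possible usage: run_guarded(client.start)
```

This only helps if an exception is actually raised. If the runtime blocks inside a driver call, nothing is logged, and only the watchdog can recover the board.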

The RP2040 re-connected to the IoT cloud by itself after the outage

Yes it should have some tolerance for network reliability

The new version survived more than 12 hours. I will continue to run it over the weekend and will attach the new version here on Monday if it is still running and also close the issue if it is successful.

By the new version I assume after commenting out the log statements ? Either way I will add logger.isEnabledFor everywhere as you suggested above on Monday.

Bodobolero commented 1 year ago

By the new version I assume after commenting out the log statements ?

Commenting out log statements and adding the machine.WDT watchdog

With Arduino C++ library I have RP2040 Connect microcontroller running with Arduino IoT Cloud several months without getting lost/stuck.

If I hit the problem again with the new version I will add try/except and log to filesystem to troubleshoot the root cause. I will also periodically, every hour or so, print the memory utilisation to the file system using functions like https://docs.micropython.org/en/latest/library/micropython.html#micropython.mem_info

But let's first run the existing version over the long weekend to see if it still gets "stuck".
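The periodic memory report mentioned above could be sketched as follows. It uses `gc.mem_free()` and `gc.mem_alloc()`, which exist on MicroPython but not on CPython, so the helper falls back to `None` there. The function names and the file name `mem.log` are made up for illustration.

```python
import gc
import time


def memory_snapshot():
    """Return (free, allocated) heap bytes, or (None, None) when the
    MicroPython-only gc.mem_free()/gc.mem_alloc() are unavailable."""
    gc.collect()
    mem_free = getattr(gc, "mem_free", None)
    mem_alloc = getattr(gc, "mem_alloc", None)
    if mem_free is None or mem_alloc is None:
        return None, None
    return mem_free(), mem_alloc()


def append_memory_log(path="mem.log"):
    """Append one timestamped line with the current heap usage."""
    free, alloc = memory_snapshot()
    with open(path, "a") as f:
        f.write("{} free={} alloc={}\n".format(time.time(), free, alloc))

# Possible usage as an hourly user task (Task as used in the scripts above):
#   client.register(Task("mem_log",
#                        on_run=lambda client: append_memory_log(),
#                        interval=3600.0))
```

A steadily shrinking `free` value across entries would point towards a leak or fragmentation, while a flat value right up to the hang would point elsewhere, for example at the WiFi driver.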

Bodobolero commented 1 year ago

The current version stopped working after several hours.

This is the current version still having trouble, but without logging and with the watchdog activated:

from machine import I2C, Pin, WDT
from dht20 import DHT20
import time
import network
import logging
from arduino_iot_cloud import ArduinoCloudClient, Task

from secrets import WIFI_SSID
from secrets import WIFI_PASSWORD
from secrets import DEVICE_ID
from secrets import CLOUD_PASSWORD

led = Pin("LED", Pin.OUT)  # Configure the desired LED pin as an output.

# connect DHT20 sensor to I2C grove connector
i2c = I2C(0, scl=Pin(13), sda=Pin(12))
dht20 = DHT20(i2c)

# keep the watchdog alive every 4 seconds
def watchdog_keepalive(client):
    global wdt
    wdt.feed()
    logging.info(".")

def on_switch_changed(client, value):
    # Toggles the hardware LED on or off.
    led.value(value)

    # Sets the value of the cloud variable "led" to the current state of the LED
    # and thus mirrors the hardware state in the cloud.
    client["led"] = value

def read_temperature(client):
    global dht20
    temperature = dht20.dht20_temperature()
    # following line commented because String handling may cause heap fragmentation
    # logging.info(f"DHT20: {temperature} C")
    return temperature

def read_humidity(client):
    global dht20
    humidity = dht20.dht20_humidity()
    # following line commented because String handling may cause heap fragmentation
    # logging.info(f"DHT20: {humidity} %")
    return humidity

def wifi_connect():
    if not WIFI_SSID or not WIFI_PASSWORD:
        raise (
            Exception("Network is not configured. Set SSID and passwords in secrets.py"))
    wlan = network.WLAN(network.STA_IF)
    wlan.active(True)
    wlan.connect(WIFI_SSID, WIFI_PASSWORD)
    while not wlan.isconnected():
        logging.info("Trying to connect. Note this may take a while...")
        time.sleep_ms(500)
    logging.info(f"WiFi Connected {wlan.ifconfig()}")

# Configure the logger.
# All message equal or higher to the logger level are printed.
# To see more debugging messages, set level=logging.DEBUG.
logging.basicConfig(
    datefmt="%H:%M:%S",
    format="%(asctime)s.%(msecs)03d %(message)s",
    level=logging.INFO,
)

time.sleep_ms(100)

logging.info(f"Connect to WiFi")

# NOTE: Add networking code here or in boot.py
wifi_connect()

# Create a client object to connect to the Arduino IoT cloud.
# For MicroPython, the key and cert files must be stored in DER format on the filesystem.
# Alternatively, a username and password can be used to authenticate:
client = ArduinoCloudClient(
    device_id=DEVICE_ID, username=DEVICE_ID, password=CLOUD_PASSWORD)

# Register cloud objects.
# Note: The following objects must be created first in the dashboard and linked to the device.
# This cloud object is initialized with its last known value from the cloud. When this object is updated
# from the dashboard, the on_switch_changed function is called with the client object and the new value.
client.register("ledSwitch", value=None,
                on_write=on_switch_changed, interval=0.250)

# This cloud object is updated manually in the switch's on_write_change callback to update the LED state in the cloud.
client.register("led", value=None)

# read only variables temperature and humidity
client.register("temperature", value=None, on_read=read_temperature, interval=1.0)
client.register("humidity", value=None, on_read=read_humidity, interval=1.0)

# enable the WDT watchdog with a timeout of 5s (1s is the minimum)
wdt = WDT(timeout=5000)
# Register watchdog.feed() as a user task that is called every 4 seconds.
client.register(Task("user_task", on_run=watchdog_keepalive, interval=4.0))

logging.info(f"starting IoT client loop")

# Start the Arduino IoT cloud client.
client.start()

wdt.feed()

iabdalkader commented 1 year ago

Hi, I did some testing and it's very hard to reproduce this issue, but I think I was able to at least a couple of times, however I don't think this is a memory/GC issue, I will still optimize the logging memory usage, but I think it's an issue in the driver. Can you give the attached firmware a try and let me know if it fixes the problem ? If so I will push the fix upstream. If it still doesn't fix the problem, I will get a DHT20 sensor to run the exact same script and increase the chances of reproducing the issue.

firmware.uf2.zip

Bodobolero commented 1 year ago

Hi @iabdalkader

I have now deployed your firmware 'MicroPython v1.22.0-preview.51.g91a3f1839.dirty on 2023-10-23; Arduino Nano RP2040 Connect with RP2040' and I also use the ArduinoIoTCloud python module from https://github.com/arduino/arduino-iot-cloud-py/tree/logging_memory

and will re-run my tests with that combination. Keep our fingers crossed.

Thanks for your excellent support!

iabdalkader commented 1 year ago

and I also use the ArduinoIoTCloud python module from

Actually you should test this new firmware with the exact same script + library you used before, to isolate the issue.

Bodobolero commented 1 year ago

and I also use the ArduinoIoTCloud python module from

Actually you should test this new firmware with the exact same script + library you used before, to isolate the issue.

Since my ultimate goal is to get a working solution I will first test with both fixes combined. If it runs successfully I can then help bi-sect what was the root cause by again testing with "old" Arduino library version.

iabdalkader commented 1 year ago

Okay, my only concern is if it is/was a memory issue, gating the log statements might just mask it and make it undetectable for a longer time, but either way is fine by me I will push the fixes upstream because it's needed regardless.

iabdalkader commented 1 year ago

Note I've added WDT usage to the MicroPython example here: https://github.com/arduino/arduino-iot-cloud-py/pull/74

Bodobolero commented 1 year ago

2023/10/24 - 11:37 It is now running without getting stuck for > 24 hours - will continue the test for 2 more days before starting bi-secting.

@iabdalkader bad news: 2023/10/24 - 14:01 The RP2040 MicroPython device lost connection to the Arduino IoT Cloud again (no more updates in the dashboard since 14:01), while in the same room, connected to the same WiFi, another RP2040 using the C++ Arduino IoT cloud library continued transmitting values (and this one has been running for months).

I will now try to repro the problem just using the RP2040 Connect without DHT20 external sensor. Instead I will transmit accelerometer values to the cloud. If this fails, too I have a repro scenario that you can use to repro the problem.

Here is the "fake script" which transmits the first two accelerometer values from the on-board accelerometer as fake temperature and humidity values to get rid of the dht20.py library and DHT20 sensor as one possible source of failure - note that this version uses https://github.com/micropython/micropython-lib/blob/master/micropython/drivers/imu/lsm6dsox/lsm6dsox.py for reading the accelerometer values

from machine import I2C, Pin, WDT
from lsm6dsox import LSM6DSOX
import time
import network
import logging
from arduino_iot_cloud import ArduinoCloudClient, Task

from secrets import WIFI_SSID
from secrets import WIFI_PASSWORD
from secrets import DEVICE_ID
from secrets import CLOUD_PASSWORD

led = Pin("LED", Pin.OUT)  # Configure the desired LED pin as an output.

# read accelerometer and gyroscope
lsm = LSM6DSOX(I2C(0, scl=Pin(13), sda=Pin(12)))

# keep the watchdog alive every 4 seconds
def watchdog_keepalive(client):
    global wdt
    wdt.feed()
    logging.debug(".")

def on_switch_changed(client, value):
    # Toggles the hardware LED on or off.
    led.value(value)

    # Sets the value of the cloud variable "led" to the current state of the LED
    # and thus mirrors the hardware state in the cloud.
    client["led"] = value

def read_accel_as_fake_temp(client):
    global lsm
    values = lsm.accel()
    temperature = values[0]
    humidity = values[1]

    return temperature

def read_accel_as_fake_humidity(client):
    global lsm
    values = lsm.accel()
    temperature = values[0]
    humidity = values[1]

    return humidity

def wifi_connect():
    if not WIFI_SSID or not WIFI_PASSWORD:
        raise (
            Exception("Network is not configured. Set SSID and passwords in secrets.py"))
    wlan = network.WLAN(network.STA_IF)
    wlan.active(True)
    wlan.connect(WIFI_SSID, WIFI_PASSWORD)
    while not wlan.isconnected():
        logging.info("Trying to connect. Note this may take a while...")
        time.sleep_ms(500)
    logging.info(f"WiFi Connected {wlan.ifconfig()}")

# Configure the logger.
# All message equal or higher to the logger level are printed.
# To see more debugging messages, set level=logging.DEBUG.
logging.basicConfig(
    datefmt="%H:%M:%S",
    format="%(asctime)s.%(msecs)03d %(message)s",
    level=logging.INFO,
)

time.sleep_ms(1000)

logging.info(f"Connect to WiFi")

# NOTE: Add networking code here or in boot.py
wifi_connect()

# Create a client object to connect to the Arduino IoT cloud.
# For MicroPython, the key and cert files must be stored in DER format on the filesystem.
# Alternatively, a username and password can be used to authenticate:
client = ArduinoCloudClient(
    device_id=DEVICE_ID, username=DEVICE_ID, password=CLOUD_PASSWORD)

# Register cloud objects.
# Note: The following objects must be created first in the dashboard and linked to the device.
# This cloud object is initialized with its last known value from the cloud. When this object is updated
# from the dashboard, the on_switch_changed function is called with the client object and the new value.
client.register("ledSwitch", value=None,
                on_write=on_switch_changed, interval=0.250)

# This cloud object is updated manually in the switch's on_write_change callback to update the LED state in the cloud.
client.register("led", value=None)

# read only variables temperature and humidity
client.register("temperature", value=None, on_read=read_accel_as_fake_temp, interval=1.0)
client.register("humidity", value=None, on_read=read_accel_as_fake_humidity, interval=1.0)

# enable the WDT watchdog with a timeout of 7s (1s is the minimum)
wdt = WDT(timeout=7000)
# Register watchdog.feed() as a user task that is called every 4 seconds.
client.register(Task("user_task", on_run=watchdog_keepalive, interval=4.0))

logging.info(f"starting IoT client loop")

# Start the Arduino IoT cloud client.
client.start()

wdt.feed()

One question: I have an Arduino IoT Cloud Entry Plan. What error message or how would I get notified if I exceeded the quotas associated with the plan ?

2023/10/24 16:21 The board lost connection after 20 minutes this time.

iabdalkader commented 1 year ago

Okay good luck. In the meantime, I've sent the patch to MicroPython upstream.

iabdalkader commented 1 year ago

@Bodobolero

while in the same room, connected to the same Wifi another RP2040 using C++ Arduino IoT cloud continued

Note that the C++ library enables the WDT by default. If it expires, the board just resets and runs the same sketch again, so if you run the same test again you should disable the watchdog to see whether the board is resetting or not:

ArduinoCloud.begin(ArduinoIoTPreferredConnection);

Should be changed to

ArduinoCloud.begin(ArduinoIoTPreferredConnection, false);

Regarding the plans and limits, I'm not sure; you'd need to contact someone else, but they seem to be explained here: https://cloud.arduino.cc/plans

iabdalkader commented 1 year ago

I'm running your script and I do see resets. Note that the frequency at which the WDT task runs is cutting it really close; if you switch from debug to info you can see it's almost expiring. I think you just need to lower the WDT task interval, that's all.

07:50:41.000 .
07:50:45.000 .
07:50:49.000 .
07:50:53.000 .

The task interval is a best effort timeout, it's not hard realtime, if pushing records takes a little bit longer (just 1 more sec) WDT expires.
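The arithmetic behind this can be written down as a tiny helper. The function name `wdt_margin_s` is made up, and the 1 s worst-case delay is simply the "1 more sec" figure from the comment above, not a measured value.

```python
def wdt_margin_s(timeout_ms, feed_interval_s, worst_case_delay_s=0.0):
    """Seconds left before the watchdog expires if a feed arrives
    worst_case_delay_s later than scheduled (hypothetical helper)."""
    return timeout_ms / 1000.0 - feed_interval_s - worst_case_delay_s


# 5 s timeout, fed every 4 s, one push that takes 1 s longer: no margin left.
print(wdt_margin_s(5000, 4.0, 1.0))  # 0.0
# Same task interval with a 7 s timeout: 2 s of slack.
print(wdt_margin_s(7000, 4.0, 1.0))  # 2.0
```

A margin of zero or less means a reset is expected. Since the timeout cannot be raised indefinitely (the MicroPython documentation lists a maximum of 8388 ms for RP2040), shortening the feed interval is the more robust way to gain slack.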

Bodobolero commented 1 year ago

I realised this too yesterday and my latest experiments (still failing) have been run with

    # enable the WDT watchdog with a timeout of 7s (1s is the minimum)
    wdt = WDT(timeout=7000)
    # Register watchdog.feed() as a user task that is called every 4 seconds.
    client.register(Task("user_task", on_run=watchdog_keepalive, interval=4.0))

Did you try to repro the problem with my latest script AND increasing the WDT timeout?

Bodobolero commented 1 year ago

Note that the C++ library enables the WDT by default; if it expires, the board just resets and runs the same sketch again. So if you run the same test again, you should disable the WDT to see whether it's resetting or not:

I don't understand that reasoning. In C++ I indeed explicitly activate the watchdog to AVOID problems

ArduinoCloud.begin(ArduinoIoTPreferredConnection, true);

and I also set the watchdog in the Micropython implementation based on your advice.

My goal is not to repro the failure in the C++ version and break it, too, but to repair the Micropython solution and get it working, too.

I don't mind if there is a soft reboot by the watchdog every few hours if my project becomes robust by doing that. I have several other low-power LoRaWAN microcontroller implementations (battery-powered) where I actually go into deep sleep for extended periods and restart the script every few minutes anyway. So AFAIK a soft reset is not an issue, as long as it works reliably.

Bodobolero commented 1 year ago

Regarding the plans and limits, I'm not sure; you'd need to contact someone else, but they seem to be explained here: https://cloud.arduino.cc/plans

Yes, I found that page, too, but did not find any information on whether there is a limit on the number of MQTT messages my devices can send to the cloud. I was just brainstorming possible root causes of the instability and asked myself whether some errors could be caused by throttling or plan limits. Since there is no information on the plans web site and I didn't find anything related in the forums, let's assume for now that this is not the reason for my failures.

iabdalkader commented 1 year ago

I don't understand that reasoning. In C++ I indeed explicitly activate the watchdog to AVOID problems

Using the WDT just masks the issue. If the WDT was enabled in C++, we can't say with confidence that it's been running solid for days; it could simply have been resetting without you noticing (unless you can detect that somehow?). Note you can do the same in MicroPython: save the script as main.py and enable the WDT. This will reset your board if it ever gets stuck, but it won't help us narrow down the underlying issue and fix it.

Bodobolero commented 1 year ago

I don't understand that reasoning. In C++ I indeed explicitly activate the watchdog to AVOID problems

Using the WDT just masks the issue. If the WDT was enabled in C++, we can't say with confidence that it's been running solid for days; it could simply have been resetting without you noticing (unless you can detect that somehow?). Note you can do the same in MicroPython: save the script as main.py and enable the WDT. This will reset your board if it ever gets stuck, but it won't help us narrow down the underlying issue and fix it.

I had already tried running main.py with the WDT enabled. However, the WDT didn't seem to detect the problematic situation and did NOT reset the board. I just did not see any updates in the cloud anymore.

I think the watchdog in C++ is there and enabled by default for a reason - it seems it was introduced since other users had instability, too, not just in my Wifi ;-) Wifi can drop and other things can happen, so having the watchdog is a good thing and we should try to obtain a working solution that includes the watchdog.

I will be convinced if you can give me an example of a Micropython script that you yourself have been running for multiple weeks or months on an Arduino RP2040 Connect reliably connected to IoT cloud without an outage and without the watchdog.

In addition, I just checked my Heating system project. It uses an ESP32C3 connected to Arduino IoT cloud using C++. It has been up and running in the same WiFi (same SSID) for more than 15 days, without a reset by the watchdog. I know this for sure because there I count the uptime and transmit it to the cloud.

iabdalkader commented 1 year ago

we should try to obtain a working solution that includes the watchdog.

I can't give you a working solution unless I know what the issue is first and then fix it. I'm grateful for your patience with testing this, but if we continue in the same direction we may not get anywhere. The reason I need to know whether the C++ client has the same issue is that the only thing the two clients have in common is the Nina-FW. If it does, that points me in the right direction; if not, the issue is with the MicroPython Nina driver/Python client. I already found one issue in the driver and fixed it. I hope you have been using the firmware attached here; it's critical to use it.

However the WDT didn't seem to detect the problematic situation and did NOT reset the board.

If the connection code keeps failing to reconnect for some reason, it still allows other tasks to run, meaning the WDT will be refreshed in time. So it's possible for the client to get into the state above; enabling the warning log level can easily show you that.

I will be convinced if you can give me an example of a Micropython script that you yourself have been running for multiple weeks or months on an Arduino RP2040 Connect reliably connected to IoT cloud without an outage and without the watchdog.

I'm not trying to convince you of anything specific, I'm just trying to find the root cause of the issue and fix it. I can keep the board connected to a power-only source for a few days, maybe over the weekend, and let you know. (I've been running your script above for about 2 hours with no resets, with a minor change of the WDT interval to 1 second; before changing the WDT interval it reset after 20 minutes.) However, would that be helpful to you if your board still resets or gets stuck?

In addition I just checked my Heating system project. It uses ESP32C3

ESP32 uses a different WiFi driver/firmware, this comparison is not very helpful, but if you run the Python client on ESP32 and it also resets/disconnects that would be interesting to know.

Bodobolero commented 1 year ago

@iabdalkader 1) I have now modified my RP2040 Connect C++ project to use

 ArduinoCloud.begin(ArduinoIoTPreferredConnection, false);

I will let you know if it stops transmitting data to the IoT cloud after some hours, too.

2) I have started my Micropython version with logging.DEBUG level and have adjusted the WDT task to 1 second. I have also added try: except: to each callback to log any exceptions, though I doubt my own code throws any; the exceptions are most likely in your Arduino IoT co-routines calling my callbacks.

I will update this comment once the cloud connections are lost in either experiment.

iabdalkader commented 1 year ago

Thank you! I saved your script as main.py, and enabled verbose debugging (both at the script and the driver levels), connected it to a power source, and I have UART REPL enabled, this way if it gets into a state where it's not updating the cloud but still not resetting with WDT, I can connect an FTDI USB->Serial and see what's going on.

iabdalkader commented 1 year ago

@Bodobolero quick update, it's been running for 24 hours so far and still going, I will keep it running for at least one more day. Note please don't post updates by editing your comments, I don't get a notification when you do that and I may miss them.

Bodobolero commented 1 year ago

@iabdalkader

P.S. I had tried to log to file using

logging.basicConfig(
        datefmt="%H:%M:%S",
        format="%(asctime)s.%(msecs)03d %(message)s",
        level=logging.DEBUG,
        filename="log.txt"
    )

however this corrupted my complete flash filesystem (it became unusable) and I had to reformat the flash filesystem - that is why I have been running from my Mac to see the logs in the serial monitor. It would help me to get a working example how to log to file to narrow down the problem.

iabdalkader commented 1 year ago

however this corrupted my complete flash filesystem (it became unusable) and I had to reformat the flash filesystem - that is why I have been running from my Mac to see the logs in the serial monitor. It would help me to get a working example how to log to file to narrow down the problem.

Yes I saw that issue, it's because the file doesn't get flushed and closed properly, I would avoid using the file log feature for now.
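If file logging is still wanted for debugging, one possible workaround is a handler that opens, writes and closes the file for every record, so nothing is left unflushed when the board resets. This is an untested sketch, not a confirmed fix for the corruption: the class name `CloseAfterWriteHandler` is hypothetical, it assumes the logging module on the board provides `logging.Handler` and `logging.Formatter` like standard Python does, and writing to flash on every record is slow and adds wear.

```python
import logging


class CloseAfterWriteHandler(logging.Handler):
    """Appends each log record to a file and closes the file immediately."""

    def __init__(self, path, fmt="%(asctime)s %(message)s"):
        super().__init__()
        self.path = path
        # Set a formatter explicitly rather than relying on a default one.
        self.setFormatter(logging.Formatter(fmt))

    def emit(self, record):
        line = self.format(record)
        # The with-block closes (and therefore flushes) the file after every record.
        with open(self.path, "a") as f:
            f.write(line + "\n")
```

It would be attached with `logging.getLogger().addHandler(CloseAfterWriteHandler("log.txt"))` instead of passing `filename=` to `logging.basicConfig`.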

Bodobolero commented 1 year ago

2023-10-25, 10:00: C++ version started
2023-10-26, 11:00: Micropython version started
2023-10-27: This morning, both the C++ and the Micropython scripts are still running.
2023-10-27, 19:03: both are still running
2023-10-28, 8:11: both are still running

iabdalkader commented 1 year ago

Hi, mine has been running for 48 hours now, still working. Note I'm running your script above, with the firmware attached here in this issue that includes the driver fix. Do you want me to keep it connected for one more day or so?

Bodobolero commented 1 year ago

Hi, mine has been running for 48 hours now, still working. Note I'm running your script above, with the firmware attached here in this issue that includes the driver fix. Do you want me to keep it connected for one more day or so?

yes, please

Bodobolero commented 1 year ago

@iabdalkader
2023-10-28, 9:43: The Micropython version stopped working after approximately 2 days; the last data transmitted was at 9:43 local time (7:43:10 UTC in the log).
2023-10-28, 17:17: The C++ version (in the same room, connected to the same SSID) continues to transmit data to the cloud.
2023-10-28, 22:07:12: The C++ version stopped transmitting data to the cloud, too (after about 3.5 days); however, the ESP32S3 (Heating system) in the room next to it (same SSID) continues to transmit data. I will now reactivate the watchdog in the C++ version because it resolves the problem in the C++ version and because I need that system to keep running.

I am looking at the 24 hour history of the Micropython version in the dashboard. I notice that between 1:25 AM and 7:58 AM on Oct 28 the Micropython version did not transmit data. It restarted sending data in the time interval from 7:58 AM- 9:43 AM - and at this point (9:43AM) it stopped completely (as I said all while the C++ version was continuously sending data). Note that my dashboard shows my local time (9:43) while the log below seems to show UTC time (7:43)
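As a cross-check of the two clocks, the `ts:` values in the log below are Unix timestamps, so they can be converted on a PC. This is a small sketch in desktop Python (not MicroPython); the function name `log_ts_to_times` and the fixed +2 hour local offset are assumptions inferred from the 9:43 versus 7:43 difference mentioned above.

```python
from datetime import datetime, timedelta, timezone


def log_ts_to_times(ts, local_offset_hours=2):
    """Return (UTC time, local time) strings for a Unix timestamp.

    local_offset_hours is an assumed fixed offset, not a real time zone lookup.
    """
    utc = datetime.fromtimestamp(ts, tz=timezone.utc)
    local = utc.astimezone(timezone(timedelta(hours=local_offset_hours)))
    return utc.strftime("%H:%M:%S"), local.strftime("%H:%M:%S")


# Timestamp of the last record pushed before the reconnect attempts start.
print(log_ts_to_times(1698478990))  # ('07:43:10', '09:43:10')
```

So the last pushed record at 07:43:10 in the log and the last dashboard point at 9:43 are the same moment.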

I attach the last rows of the debug log before the transmitting stopped. It seems it got stuck at "07:45:00.000 Connecting to Arduino IoT cloud...": it was still pushing data to the cloud at 7:43:10, and then it tries several reconnects until it gets stuck. The relevant parts of the log follow:

07:43:02.000 Update: humidity value: 0.3880615 ts: 1698478982
07:43:02.000 Update: temperature value: 0.1700439 ts: 1698478982
07:43:02.000 Pushing records to Arduino IoT cloud:
07:43:02.000   ==> record: temperature value: 0.1700439...
07:43:02.000   ==> record: humidity value: 0.3880615...
07:43:04.000 WDT.feed()
07:43:04.000 Update: humidity value: 0.3884277 ts: 1698478984
07:43:04.000 Update: temperature value: 0.1708984 ts: 1698478984
07:43:04.000 Pushing records to Arduino IoT cloud:
07:43:04.000   ==> record: temperature value: 0.1708984...
07:43:04.000   ==> record: humidity value: 0.3884277...
07:43:08.000 WDT.feed()
07:43:08.000 Update: humidity value: 0.3890381 ts: 1698478988
07:43:08.000 Update: temperature value: 0.1712646 ts: 1698478988
07:43:08.000 Pushing records to Arduino IoT cloud:
07:43:08.000   ==> record: temperature value: 0.1712646...
07:43:08.000   ==> record: humidity value: 0.3890381...
07:43:10.000 WDT.feed()
07:43:10.000 Update: humidity value: 0.3890381 ts: 1698478990
07:43:10.000 Update: temperature value: 0.1708984 ts: 1698478990
07:43:10.000 Pushing records to Arduino IoT cloud:
07:43:10.000   ==> record: temperature value: 0.1708984...
07:43:10.000   ==> record: humidity value: 0.3890381...
00:00:03.000 Connect to WiFi
00:00:07.000 WiFi Connected ('192.168.179.9', '255.255.255.0', '192.168.179.1', '192.168.179.1')
07:43:22.000 RTC time set from NTP.
07:43:22.000 Init: r:m value: getLastValues ts: 1698479002
07:43:22.000 Init: temperature value: 0.170166 ts: 1698479002
07:43:22.000 Init: humidity value: 0.3891602 ts: 1698479002
07:43:22.000 starting IoT client loop
07:43:22.000 task: humidity created.
07:43:22.000 task: temperature created.
07:43:22.000 task: ledSwitch created.
07:43:22.000 task: user_task created.
07:43:22.000 task: conn_task created.
07:43:22.000 Update: humidity value: 0.3891602 ts: 1698479002
07:43:22.000 Update: temperature value: 0.1702881 ts: 1698479002
07:43:22.000 WDT.feed()
07:43:22.000 Connecting to Arduino IoT cloud...
00:00:03.000 Connect to WiFi
00:00:06.000 WiFi Connected ('192.168.179.9', '255.255.255.0', '192.168.179.1', '192.168.179.1')
07:43:33.000 RTC time set from NTP.
07:43:33.000 Init: r:m value: getLastValues ts: 1698479013
07:43:33.000 Init: temperature value: 0.1710205 ts: 1698479013
07:43:33.000 Init: humidity value: 0.388916 ts: 1698479013
07:43:33.000 starting IoT client loop
07:43:33.000 task: humidity created.
07:43:33.000 task: temperature created.
07:43:33.000 task: ledSwitch created.
07:43:33.000 task: user_task created.
07:43:33.000 task: conn_task created.
07:43:33.000 Update: humidity value: 0.388916 ts: 1698479013
07:43:33.000 Update: temperature value: 0.1707764 ts: 1698479013
07:43:33.000 WDT.feed()
07:43:33.000 Connecting to Arduino IoT cloud...
00:00:03.000 Connect to WiFi
00:00:06.000 WiFi Connected ('192.168.179.9', '255.255.255.0', '192.168.179.1', '192.168.179.1')
07:43:44.000 RTC time set from NTP.
07:43:44.000 Init: r:m value: getLastValues ts: 1698479024
07:43:44.000 Init: temperature value: 0.170166 ts: 1698479024
07:43:44.000 Init: humidity value: 0.3895264 ts: 1698479024
07:43:44.000 starting IoT client loop
07:43:44.000 task: humidity created.
07:43:44.000 task: temperature created.
07:43:44.000 task: ledSwitch created.
07:43:44.000 task: user_task created.
07:43:44.000 task: conn_task created.
07:43:44.000 Update: humidity value: 0.3895264 ts: 1698479024
07:43:44.000 Update: temperature value: 0.1706543 ts: 1698479024
07:43:44.000 WDT.feed()
07:43:44.000 Connecting to Arduino IoT cloud...
07:43:48.000 task: discovery created.
07:43:48.000 task: mqtt_task created.
07:43:48.000 Update: humidity value: 0.3884277 ts: 1698479028
07:43:48.000 Update: temperature value: 0.1706543 ts: 1698479028
07:43:48.000 WDT.feed()
07:43:48.000 Subscribe: b'/a/d/fb1be083-6c7b-4a39-9ca2-ad286fd8a2e4/e/i'.
07:43:52.000 task: conn_task complete.
07:43:52.000 Update: humidity value: 0.3886719 ts: 1698479032
07:43:52.000 Update: temperature value: 0.1708984 ts: 1698479032
07:43:52.000 WDT.feed()
07:43:53.000 Update: humidity value: 0.3896484 ts: 1698479033
07:43:53.000 Update: temperature value: 0.1699219 ts: 1698479033
07:43:53.000 WDT.feed()
07:43:54.000 Update: humidity value: 0.3892822 ts: 1698479034
07:43:54.000 Update: temperature value: 0.1708984 ts: 1698479034
07:43:54.000 WDT.feed()
07:43:55.000 mqtt topic: b'a2e4/e/i'... message: b'\x81\xa3"\xfbA\xd9O.'...
07:43:55.000 Init: thing_id value: 18fc6f6b-6c1c-42be-81d8-8a88024daa15 ts: 1698479035
07:43:55.000 Update: humidity value: 0.3881836 ts: 1698479035
07:43:56.000 Subscribe: b'/a/t/18fc6f6b-6c1c-42be-81d8-8a88024daa15/e/i'.
07:43:58.000 Subscribe: b'/a/t/18fc6f6b-6c1c-42be-81d8-8a88024daa15/shadow/i'.
00:00:03.000 Connect to WiFi
00:00:06.000 WiFi Connected ('192.168.179.9', '255.255.255.0', '192.168.179.1', '192.168.179.1')
07:44:06.000 RTC time set from NTP.
07:44:06.000 Init: r:m value: getLastValues ts: 1698479046
07:44:06.000 Init: temperature value: 0.170166 ts: 1698479046
07:44:06.000 Init: humidity value: 0.3892822 ts: 1698479046
07:44:06.000 starting IoT client loop
07:44:06.000 task: humidity created.
07:44:06.000 task: temperature created.
07:44:06.000 task: ledSwitch created.
07:44:06.000 task: user_task created.
07:44:06.000 task: conn_task created.
07:44:06.000 Update: humidity value: 0.3892822 ts: 1698479046
07:44:06.000 Update: temperature value: 0.1702881 ts: 1698479046
07:44:06.000 WDT.feed()
07:44:06.000 Connecting to Arduino IoT cloud...
00:00:03.000 Connect to WiFi
00:00:06.000 WiFi Connected ('192.168.179.9', '255.255.255.0', '192.168.179.1', '192.168.179.1')
07:44:17.000 RTC time set from NTP.
07:44:17.000 Init: r:m value: getLastValues ts: 1698479057
07:44:17.000 Init: temperature value: 0.1711426 ts: 1698479057
07:44:17.000 Init: humidity value: 0.3891602 ts: 1698479057
07:44:17.000 starting IoT client loop
07:44:17.000 task: humidity created.
07:44:17.000 task: temperature created.
07:44:17.000 task: ledSwitch created.
07:44:17.000 task: user_task created.
07:44:17.000 task: conn_task created.
07:44:17.000 Update: humidity value: 0.3891602 ts: 1698479057
07:44:17.000 Update: temperature value: 0.1704102 ts: 1698479057
07:44:17.000 WDT.feed()
07:44:17.000 Connecting to Arduino IoT cloud...
00:00:03.000 Connect to WiFi
00:00:06.000 WiFi Connected ('192.168.179.9', '255.255.255.0', '192.168.179.1', '192.168.179.1')
07:44:28.000 RTC time set from NTP.
07:44:28.000 Init: r:m value: getLastValues ts: 1698479068
07:44:28.000 Init: temperature value: 0.1710205 ts: 1698479068
07:44:28.000 Init: humidity value: 0.3895264 ts: 1698479068
07:44:28.000 starting IoT client loop
07:44:28.000 task: humidity created.
07:44:28.000 task: temperature created.
07:44:28.000 task: ledSwitch created.
07:44:28.000 task: user_task created.
07:44:28.000 task: conn_task created.
07:44:28.000 Update: humidity value: 0.3895264 ts: 1698479068
07:44:28.000 Update: temperature value: 0.170166 ts: 1698479068
07:44:28.000 WDT.feed()
07:44:28.000 Connecting to Arduino IoT cloud...
00:00:03.000 Connect to WiFi
00:00:06.000 WiFi Connected ('192.168.179.9', '255.255.255.0', '192.168.179.1',
07:44:38.000 RTC time set from NTP.
07:44:38.000 Init: r:m value: getLastValues ts: 1698479078
07:44:38.000 Init: temperature value: 0.1705322 ts: 1698479078
07:44:38.000 Init: humidity value: 0.388916 ts: 1698479078
07:44:38.000 starting IoT client loop
07:44:38.000 task: humidity created.
07:44:38.000 task: temperature created.
07:44:38.000 task: ledSwitch created.
07:44:38.000 task: user_task created.
07:44:38.000 task: conn_task created.
07:44:38.000 Update: humidity value: 0.388916 ts: 1698479078
07:44:38.000 Update: temperature value: 0.1707764 ts: 1698479078
07:44:38.000 WDT.feed()
07:44:38.000 Connecting to Arduino IoT cloud...
00:00:03.000 Connect to WiFi
00:00:06.000 WiFi Connected ('192.168.179.9', '255.255.255.0', '192.168.179.1', '192.168.179.1')
07:44:49.000 RTC time set from NTP.
07:44:49.000 Init: r:m value: getLastValues ts: 1698479089
07:44:49.000 Init: temperature value: 0.1704102 ts: 1698479089
07:44:49.000 Init: humidity value: 0.388916 ts: 1698479089
07:44:49.000 starting IoT client loop
07:44:49.000 task: humidity created.
07:44:49.000 task: temperature created.
07:44:49.000 task: ledSwitch created.
07:44:49.000 task: user_task created.
07:44:49.000 task: conn_task created.
07:44:49.000 Update: humidity value: 0.388916 ts: 1698479089
07:44:49.000 Update: temperature value: 0.1699219 ts: 1698479089
07:44:49.000 WDT.feed()
07:44:49.000 Connecting to Arduino IoT cloud...
00:00:03.000 Connect to WiFi
00:00:06.000 WiFi Connected ('192.168.179.9', '255.255.255.0', '192.168.179.1', '192.168.179.1')
07:45:00.000 RTC time set from NTP.
07:45:00.000 Init: r:m value: getLastValues ts: 1698479100
07:45:00.000 Init: temperature value: 0.1706543 ts: 1698479100
07:45:00.000 Init: humidity value: 0.3887939 ts: 1698479100
07:45:00.000 starting IoT client loop
07:45:00.000 task: humidity created.
07:45:00.000 task: temperature created.
07:45:00.000 task: ledSwitch created.
07:45:00.000 task: user_task created.
07:45:00.000 task: conn_task created.
07:45:00.000 Update: humidity value: 0.3887939 ts: 1698479100
07:45:00.000 Update: temperature value: 0.1696777 ts: 1698479100
07:45:00.000 WDT.feed()
07:45:00.000 Connecting to Arduino IoT cloud...

(minicom dialog: Cannot open /dev/tty.usbmodem141101! Status line: Minicom 2.8, 115200 8N1, Offline, tty.usbmodem141101)

iabdalkader commented 1 year ago

2023-10-28, 9:43: The Micropython version stopped working after approximately 2 days, 2023-10-28, 22:07:12: The C++ version stops to transmit data to the cloud, too, (after about 3.5 days),

Hi my board has been running for 4 days straight now with no issues, but after looking at your log it seemed like it might be a WiFi connection issue, so I restarted my router a few times and eventually the Nano got stuck in the WiFi connection code. Note that WiFi connection code runs first thing before even enabling the WDT, so if it gets stuck for any reason, the WDT couldn't reset it. I can look into this reconnect issue tomorrow, but one quick fix for you is to enable the WDT right before the loop in wifi_connect, this way even if it gets stuck the board will be restarted.
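One way to keep the wait loop itself from running forever is to bound it, as in the sketch below. It is written against injected objects so it can be tried on a PC: `wait_for_wifi`, its parameters and the defaults are hypothetical names and values for illustration, and `wlan` only needs an `isconnected()` method like the `network.WLAN` object used in the script. It does not help if `connect()` itself blocks, which is why enabling the WDT before connecting is still the main suggestion.

```python
import time


def wait_for_wifi(wlan, timeout_s=5.0, poll_s=0.5, sleep=time.sleep):
    """Poll wlan.isconnected() until it is True or timeout_s has elapsed.

    Returns True if connected, False on timeout. The watchdog is never fed
    here, so a WDT enabled before this call can still reset the board.
    """
    waited = 0.0
    while not wlan.isconnected():
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
    return True


# On the board this might be used roughly as follows (assumption, untested):
#   wdt = WDT(timeout=7000)                  # enable the watchdog first
#   wlan.connect(WIFI_SSID, WIFI_PASSWORD)
#   if not wait_for_wifi(wlan):              # give up before the WDT expires
#       raise Exception("WiFi connect timed out")
```

Keeping `timeout_s` below the watchdog timeout means the function reports the failure itself; if it is larger, the watchdog reset comes first.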

Further info, what I saw here was ESP not being able to connect to WiFi with "reason auth expired".

Bodobolero commented 1 year ago

Note that WiFi connection code runs first thing before even enabling the WDT, so if it gets stuck for any reason, the WDT couldn't reset it. I can look into this reconnect issue tomorrow, but one quick fix for you is to enable the WDT right before the loop in wifi_connect, this way even if it gets stuck the board will be restarted.

@iabdalkader Now that you said it this is so obvious I can't explain how I overlooked it! Great find, will try it immediately. This explains why watchdog wasn't helping.

I did some other changes too:
1) checking wlan.isconnected() before feeding the watchdog
2) avoiding that the temperature and humidity callbacks both access the DHT sensor (according to the specification there should be a pause between successive sensor reads)
3) changing the interval to 10 seconds (to match the C++ implementation)
4) going back to using the real DHT sensor instead of the accelerometer
5) other small changes

This is the new version I am now testing:

from machine import I2C, Pin, WDT
from dht20 import DHT20
import time
import network
import logging
from arduino_iot_cloud import ArduinoCloudClient, Task

from secrets import WIFI_SSID
from secrets import WIFI_PASSWORD
from secrets import DEVICE_ID
from secrets import CLOUD_PASSWORD

led = Pin("LED", Pin.OUT)  # Configure the desired LED pin as an output.

# connect DHT20 sensor to I2C grove connector
i2c = I2C(0, scl=Pin(13), sda=Pin(12))
dht20 = DHT20(i2c)

# use globals to only read DHT in one callback
temperature = 0.0
humidity = 0.0
wlan = None

# keep the watchdog alive every second
def watchdog_keepalive(client):
    global wdt
    if (wlan.isconnected()):
        wdt.feed()
        logging.debug(".")
    else:
        logging.debug("WLAN DISCONNECTED! - watchdog not fed!")

def on_switch_changed(client, value):
    # Toggles the hardware LED on or off.
    led.value(value)

    # Sets the value of the cloud variable "led" to the current state of the LED
    # and thus mirrors the hardware state in the cloud.
    client["led"] = value

def read_temperature(client):
    global dht20
    global temperature
    global humidity
    temperature = dht20.dht20_temperature()
    humidity = dht20.dht20_humidity()
    if logging.getLogger().isEnabledFor(logging.DEBUG):
        logging.debug(f"DHT20: {temperature} C")
    return temperature

def read_humidity(client):
    global humidity
    if logging.getLogger().isEnabledFor(logging.DEBUG):
        logging.debug(f"DHT20: {humidity} %")
    return humidity

def wifi_connect():
    global wlan
    if not WIFI_SSID or not WIFI_PASSWORD:
        raise (
            Exception("Network is not configured. Set SSID and passwords in secrets.py"))
    wlan = network.WLAN(network.STA_IF)
    wlan.active(True)
    wlan.connect(WIFI_SSID, WIFI_PASSWORD)
    while not wlan.isconnected():
        logging.info("Trying to connect. Note this may take a while...")
        time.sleep_ms(500)
    logging.info(f"WiFi Connected {wlan.ifconfig()}")

# Configure the logger.
# All messages at or above the logger level are printed.
# To see more debugging messages, set level=logging.DEBUG.
logging.basicConfig(
    datefmt="%H:%M:%S",
    format="%(asctime)s.%(msecs)03d %(message)s",
    level=logging.INFO,
)

time.sleep_ms(1000)

# enable the WDT watchdog with a timeout of 7s (1s is the minimum)
# if wifi can not be connected or is lost the board will be restarted
wdt = WDT(timeout=7000)

logging.info(f"Connect to WiFi")

# NOTE: Add networking code here or in boot.py
wifi_connect()

# Create a client object to connect to the Arduino IoT cloud.
# For MicroPython, the key and cert files must be stored in DER format on the filesystem.
# Alternatively, a username and password can be used to authenticate:
client = ArduinoCloudClient(
    device_id=DEVICE_ID, username=DEVICE_ID, password=CLOUD_PASSWORD)

# Register cloud objects.
# Note: The following objects must be created first in the dashboard and linked to the device.
# This cloud object is initialized with its last known value from the cloud. When this object is updated
# from the dashboard, the on_switch_changed function is called with the client object and the new value.
client.register("ledSwitch", value=None,
                on_write=on_switch_changed, interval=0.250)

# This cloud object is updated manually in the switch's on_write_change callback to update the LED state in the cloud.
client.register("led", value=None)

# read only variables temperature and humidity
client.register("temperature", value=None, on_read=read_temperature, interval=10.0)
client.register("humidity", value=None, on_read=read_humidity, interval=10.0)

# Register watchdog_keepalive() as a user task that is called every second.
client.register(Task("user_task", on_run=watchdog_keepalive, interval=1.0))

logging.info(f"starting IoT client loop")

# Start the Arduino IoT cloud client.
client.start()

2023-10-29, 10:00 new script (Watchdog before connect) started

Bodobolero commented 1 year ago

2. avoiding that the temperature and humidity callbacks both access the DHT sensor (according to the specification there should be a pause between successive sensor reads)

@iabdalkader This question is not related to this issue, but since you are the author of this library maybe you can help here, too. Here I think I have a general issue to understand how the Micropython API for IoT Cloud is supposed to be used with multiple variables read from sensors. As far as I understood I can only return one variable's value in a callback. I also can not control the timing of the callbacks (only the interval, but not exactly the time distance between two callbacks' invocations). So let's say a sensor requires a time interval between consecutive reads of 1 second but I have one callback for temperature and one for humidity - how do I avoid that those callbacks are invoked immediately one after the other - or is there another pattern to read two values in the same callback other than using global variables as I do in the script above?

iabdalkader commented 1 year ago

Now that you said it this is so obvious I can't explain how I overlooked it! Great find, will try it immediately. This explains why watchdog wasn't helping.

What you have there seems fine. Another option is to run the WiFi connect code in a task: since it seems to block on connect(), the WDT will expire and reset the board if connect keeps failing. I will still look into why connect blocks; it shouldn't, btw.

So let's say a sensor requires a time interval between consecutive reads of 1 second but I have one callback for temperature and one for humidity - how do I avoid that those callbacks are invoked immediately one after the other -

You can just create a class that wraps the sensor driver code and keeps track of the last time the sensor was read using time.ticks_ms(). If it's being read again too soon, just return the last value.
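A minimal sketch of such a wrapper follows. The class name `CachedDHT20`, the 2000 ms default and the `ticks_ms` parameter are assumptions for illustration; `dht20_temperature()` and `dht20_humidity()` are the driver methods used in the script above. The helpers fall back to `time.monotonic()` so the sketch can also be tried on a PC, where `time.ticks_ms()` does not exist.

```python
import time


def _ticks_ms():
    # MicroPython provides time.ticks_ms(); fall back to time.monotonic() elsewhere.
    if hasattr(time, "ticks_ms"):
        return time.ticks_ms()
    return int(time.monotonic() * 1000)


def _ticks_diff(new, old):
    # time.ticks_diff() copes with counter wrap-around on MicroPython.
    if hasattr(time, "ticks_diff"):
        return time.ticks_diff(new, old)
    return new - old


class CachedDHT20:
    """Wraps a DHT20 driver and re-reads it at most once per min_interval_ms."""

    def __init__(self, sensor, min_interval_ms=2000, ticks_ms=_ticks_ms):
        self._sensor = sensor
        self._min_interval_ms = min_interval_ms
        self._ticks_ms = ticks_ms
        self._last_read = None
        self.temperature = None
        self.humidity = None

    def _refresh(self):
        now = self._ticks_ms()
        if (self._last_read is None
                or _ticks_diff(now, self._last_read) >= self._min_interval_ms):
            # Both values are read together, as in the script above.
            self.temperature = self._sensor.dht20_temperature()
            self.humidity = self._sensor.dht20_humidity()
            self._last_read = now

    def read_temperature(self, client=None):
        self._refresh()
        return self.temperature

    def read_humidity(self, client=None):
        self._refresh()
        return self.humidity
```

With this, the two cloud callbacks could be registered as `on_read=cached.read_temperature` and `on_read=cached.read_humidity` (where `cached = CachedDHT20(dht20)`), and whichever runs second within the interval gets the cached value instead of touching the sensor again.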

Bodobolero commented 1 year ago

@iabdalkader

2023-10-30, 21:24 CET: The corrected Micropython version above (starting the watchdog before the WiFi connect) stopped transmitting data to the cloud. I have no log because it was running unattended (no serial monitor). So either the watchdog does not trigger a restart, or the soft restart does not reset and recover the WiFi-Nina. The C++ version (with the watchdog re-activated) is still running.

iabdalkader commented 1 year ago

So either the watchdog does not trigger a restart, or the soft restart does not reset and recover the WiFi-Nina.

The watchdog is reliable: once enabled, it's impossible to disable it, and when it expires it doesn't soft-reset the board, it's a full hardware reset. The Nina module also gets hardware-reset via pins on init. The only thing I can think of is that it's stuck in a WDT reset/connect loop. I've seen this once before after restarting my router a few times; connect just kept failing.

I'm attaching a firmware that might help with debugging the next time this happens, this firmware has driver debugging enabled and REPL over UART. If the board gets into a state where it's stuck or something, just keep it powered and connect any USB to serial bridge to TX, RX and GND pins, and you'll be able to see exactly what's going on, no need to keep a PC running all that time.

firmware.uf2.zip

Bodobolero commented 1 year ago

connect any USB to serial bridge to TX, RX and GND pins

I ordered a USB to serial bridge, which I expect to be delivered on Nov 2nd. Then I can start debugging.

iabdalkader commented 1 year ago

Sounds good, thanks! In the meantime, I'm looking into an unrelated issue but fixing it will let us easily move wifi connect code to a task.

Bodobolero commented 1 year ago

@iabdalkader I tried to run my script with the debug firmware from my PC and got

MicroPython v1.22.0-preview.69.gc146017f8.dirty on 2023-10-31; Arduino Nano RP2040 Connect with RP2040
Type "help()" for more information.
>>> 
MPY: soft reboot
00:00:59.000 Connect to WiFi
!!Exception!!
             [Errno 6] Device not configured

>>> 

Is this as expected? If I do NOT connect the board to the PC (just to power), the script starts and connects to the IoT cloud. So it seems the firmware is working as you expected. I just wanted to confirm that this behaviour is normal and will allow me to connect later over UART.