adafruit / Adafruit_CircuitPython_AzureIoT

Access to Microsoft Azure IoT device, messaging, and job services from CircuitPython!
MIT License
19 stars 15 forks source link

RuntimeError: Repeated socket failures on ESP32-S2 #44

Open calcut opened 2 years ago

calcut commented 2 years ago

I realise Adafruit folks are working on this library at the moment, so perhaps this will work itself out as part of that.

However, I've been running into issues when trying to use IoTCentralDevice.

azureiot_central_simpletest.py does work for me However if I remove the if statement if time.localtime().tm_year < 2022: then the code runs once, but fails after a soft reset or reload. A hard reset typically gets it going again, but the behaviour doesn't seem to be 100% repeatable. I'd like to understand why, so I can use requests and MQTT reliably in the same code.

(I've run into similar weirdness when using AdafruitIO, which was my motivation to try Azure in the first place, but I think it could be a more fundamental issue with sockets in circuitpython / ESP32-S2)

I wonder if it is related to Anecdata's comment about ESP32-S2 only supporting 4 sockets https://github.com/adafruit/Adafruit_CircuitPython_Requests/issues/62#issuecomment-865173242 Perhaps after a soft reset the sockets aren't being cleared... somehow!

Digging into the libraries a little bit shows me the error occurs here: https://github.com/adafruit/Adafruit_CircuitPython_MiniMQTT/blob/c4517bbbde70e2dc7fdb1a1fcfe120d1b6b2f530/adafruit_minimqtt/adafruit_minimqtt.py#L279

adding some print()s reveals that it fails to get a socket because of:

Failed SSL handshake
Unhandled ESP TLS error 0 0 8004 -1
Unhandled ESP TLS error 0 0 8004 -1
Unhandled ESP TLS error 0 0 8004 -1
Unhandled ESP TLS error 0 0 8004 -1

Would be great to understand what is going on so we can avoid it or handle it better.

My test code, slightly modified from the simpletest :

# SPDX-FileCopyrightText: 2021 ladyada for Adafruit Industries
# SPDX-License-Identifier: MIT

import ssl
import time

import rtc
import socketpool
import wifi

import adafruit_requests
from adafruit_azureiot import IoTCentralDevice

import supervisor

from secrets import secrets

print("Connecting to WiFi...")
wifi.radio.connect(secrets["ssid"], secrets["password"])
print("Connected to WiFi!")

# if time.localtime().tm_year < 2022:
#     print("Setting System Time in UTC")
#     pool = socketpool.SocketPool(wifi.radio)
#     requests = adafruit_requests.Session(pool, ssl.create_default_context())
#     response = requests.get("https://io.adafruit.com/api/v2/time/seconds")
#     if response:
#         if response.status_code == 200:
#             r = rtc.RTC()
#             r.datetime = time.localtime(int(response.text))
#             print(f"System Time: {r.datetime}")
#         else:
#             print("Setting time failed")
#     else:
#         print("Year seems good, skipping set time.")

print("Setting System Time in UTC")
pool = socketpool.SocketPool(wifi.radio)
requests = adafruit_requests.Session(pool, ssl.create_default_context())
response = requests.get("https://io.adafruit.com/api/v2/time/seconds")
if response:
    if response.status_code == 200:
        r = rtc.RTC()
        r.datetime = time.localtime(int(response.text))
        print(f"System Time: {r.datetime}")

# Create an IoT Hub device client and connect
esp = None
# pool = socketpool.SocketPool(wifi.radio)
device = IoTCentralDevice(
    pool, esp, secrets["id_scope"], secrets["device_id"], secrets["device_sas_key"]
)

print("Connecting to Azure IoT Central...")
device.connect()

print("Connected to Azure IoT Central!")

supervisor.reload()
calcut commented 2 years ago

A more minimal example:

import ssl
import socketpool
import wifi
import adafruit_requests
from secrets import secrets

SHOW_ISSUE = True

print("Connecting to WiFi...")
wifi.radio.connect(secrets["ssid"], secrets["password"])
print("Connected to WiFi!")

pool = socketpool.SocketPool(wifi.radio)

if SHOW_ISSUE:
    # Doesn't *always* trigger the issue, maybe 2/5 times
    requests = adafruit_requests.Session(pool, ssl.create_default_context())
    response = requests.get("https://io.adafruit.com/api/v2/time/seconds")
    print(f'got time of {response.text} using requests')

print("Connecting to Azure IoT Central...")

# This block is supposed to mimic what the libraries do, but as a minimal example
host='global.azure-devices-provisioning.net'
port=8883
addr_info = pool.getaddrinfo(host, port, 0, pool.SOCK_STREAM)[0]
sock = pool.socket(addr_info[0], addr_info[1])
ssl_context=ssl.create_default_context()
sock = ssl_context.wrap_socket(sock, server_hostname=host)

sock.connect((host, port))
# often leads to "OSError: Failed SSL handshake"
# But only if requests has been used first

print("Connected to Azure IoT Central!")
calcut commented 2 years ago

Regarding resets: I've noticed the error is far more likely to occur after a soft reset. (I'm not certain if I've ever seen it on the first run after a hard reset.)

Repeating a soft reset multiple times can 'clear' the error (so it isn't true to say that a hard reset is always required after the Failed SSL handshake)

calcut commented 2 years ago

response.close() after getting the time doesn't prevent the error

calcut commented 2 years ago

Running requests._free_sockets() after using requests library seems to be a workaround.

I'm a relative newbie to sockets/SSL, but it looks like Requests keeps hold of sockets to re-use them. However when trying to get a new socket from the pool later outside of Requests (e.g. for MQTT), it can sometimes return a socket that is already in use by Requests, which leads to a failure.

Hoping someone more experienced than me can verify and suggest possible improvements.

gary-major commented 8 months ago

Same general issue -> Repeated socket errors on the requests library. I have two scenarios and the only difference is one scenario will deep sleep and one will just time.sleep(300) inside a while true loop.

In a way, i can see the value of using requests._free_sockets() for my second scenario although I agree that this seems like a workaround, however for my second scenario , when the board is coming out of deep sleep on the ESP32-S2 there shouldn't be a need as the board is essentially operating like the reset button has been pressed.

Using CircuitPython 8.2.6. This is easy to replicate for me on a few ESP32 board types under each scenario. The time it takes for the code to stop varies, but happens within a 24 hour period.