Closed DJDevon3 closed 5 months ago
I see espidf.MemoryError:
on ESP32-S2 during requests _get_socket
(8.x.x - haven't updated my "production" systems yet), even after a fresh reset. My code automatically catches the exceptions and retries a few times, and if those fail, either abandons and tries again later (often successfully) or resets (microcontroller.reset). It may go a few cycles like that, then be successful repeatedly either after the retries or after one or more resets. I have suspected something variable not quite ready in the ESP-IDF on startup, though sometimes it can happen well into a run. No idea if this is comparable to your situation.
This happens. I don't think it is a CP bug.
The ESP-IDF keeps all TLS related buffers in internal RAM and, as a result, the S2 will run out before the S3 does. The S3 has more internal RAM. Use it instead.
It's possible that's why I stopped using it on the S2 around 8.0 beta. It would work for a while then crash. At the time it was suspected that there was an ESP-IDF memory leak. Very likely why I haven't used this project since. At the time the ESP32-S2 was far more stable than the S3. The S3 is now so stable that it works like a dream for most of my projects.
I can switch to an S3 but that's not really the point. Yes Anecdata the behavior you're seeing is similar. It's random and the amount of error handling I'd have to write in to catch every possible situation would be mind boggling.
One second I have 2.1MB of memfree then immediately afterward on the next line 1 single call to requests.get throws a memory error. It doesn't seem like a lack of memory or at least not according to the behavior reported by memfree. It's not like I have 10KB free... I have around 2MB free when it crashes. The API call I'm making when it crashes most times should only take about 200 bytes...
Is there a way I can see how big the TLS buffer is? Track its size as the script goes? Why isn't that buffer reported in memfree if it's running out of buffer? Can we increase the buffer?
I can report similar behaviour switching from 8 to 9 on Feather ESP32 S2 I can successfully requests.get() an https address but fail with IO send_data() using IO_HTTP
Adafruit CircuitPython 9.0.2 on 2024-03-28; Adafruit Feather ESP32S2 with ESP32S2
Traceback (most recent call last): File "code.py", line 313, in <module> File "adafruit_io/adafruit_io.py", line 624, in send_data File "adafruit_io/adafruit_io.py", line 564, in _post File "adafruit_requests.py", line 595, in post File "adafruit_requests.py", line 525, in request File "adafruit_connection_manager.py", line 264, in get_socket MemoryError:
313. io.send_data('temperature', temperature, precision=1)
Please let me know if you need any other information. thanks
Here's a revised test script that requires no logins. Anyone with an S2 and adafruit_requests library should be able to run this script.
Adafruit CircuitPython 8.2.5 on 2023-09-07; Adafruit Feather ESP32S2 with ESP32S2
Board ID:adafruit_feather_esp32s2
import gc
import os
import time
import wifi
import ssl
import socketpool
import adafruit_requests
print("-" * 40)
start_mem = gc.mem_free()
print(f"Prior to Script Config: {start_mem}")
# Initialize WiFi Pool (There can be only one pool)
pool = socketpool.SocketPool(wifi.radio)
requests = adafruit_requests.Session(pool, ssl.create_default_context())
# All active flights JSON: https://opensky-network.org/api/states/all (PICK ONE!)
TRANSPONDER = "88044d"
# Ensure ALL of these are setup in settings.toml
ssid = os.getenv("CIRCUITPY_WIFI_SSID")
password = os.getenv("CIRCUITPY_WIFI_PASSWORD")
# API Polling Rate
# 900 = 15 mins, 1800 = 30 mins, 3600 = 1 hour
SLEEP_TIME = 900
def time_calc(input_time):
"""Converts seconds to minutes/hours/days"""
if input_time < 60:
return f"{input_time:.0f} seconds"
if input_time < 3600:
return f"{input_time / 60:.0f} minutes"
if input_time < 86400:
return f"{input_time / 60 / 60:.0f} hours"
return f"{input_time / 60 / 60 / 24:.1f} days"
# Requests URL - icao24 is their endpoint required for a transponder
# example https://opensky-network.org/api/states/all?icao24=a808c5
OPENSKY_SOURCE = "https://opensky-network.org/api/states/all?" + "icao24=" + TRANSPONDER
ROCKETLAUNCH_SOURCE = "https://fdo.rocketlaunch.live/json/launches/next/1"
JSON_GET_URL = "https://httpbin.org/get"
STATUS_TEST = "https://httpbin.org/status/"
# Originally attempted to use SVG. Found JSON exists with same filename.
# https://img.shields.io/discord/327254708534116352.svg
ADA_DISCORD_JSON = "https://img.shields.io/discord/327254708534116352.json"
def print_http_status(code, description):
"""Returns HTTP status code and description"""
if "100" <= code <= "103":
print(f" | ✅ Status Test: {code} - {description}")
elif "200" <= code <= "299":
print(f" | ✅ Status Test: {code} - {description}")
elif "300" <= code <= "600":
print(f" | ❌ Status Test: {code} - {description}")
else:
print(f" | Unknown Response Status: {code} - {description}")
# All HTTP Status Codes
http_status_codes = {
"100": "Continue",
"101": "Switching Protocols",
"102": "Processing",
"103": "Early Hints",
"200": "OK",
"201": "Created",
"202": "Accepted",
"203": "Non-Authoritative Information",
"204": "No Content",
"205": "Reset Content",
"206": "Partial Content",
"207": "Multi-Status",
"208": "Already Reported",
"226": "IM Used",
"300": "Multiple Choices",
"301": "Moved Permanently",
"302": "Found",
"303": "See Other",
"304": "Not Modified",
"305": "Use Proxy",
"306": "Unused",
"307": "Temporary Redirect",
"308": "Permanent Redirect",
"400": "Bad Request",
"401": "Unauthorized",
"402": "Payment Required",
"403": "Forbidden",
"404": "Not Found",
"405": "Method Not Allowed",
"406": "Not Acceptable",
"407": "Proxy Authentication Required",
"408": "Request Timeout",
"409": "Conflict",
"410": "Gone",
"411": "Length Required",
"412": "Precondition Failed",
"413": "Payload Too Large",
"414": "URI Too Long",
"415": "Unsupported Media Type",
"416": "Range Not Satisfiable",
"417": "Expectation Failed",
"418": "I'm a teapot",
"421": "Misdirected Request",
"422": "Unprocessable Entity",
"423": "Locked",
"424": "Failed Dependency",
"425": "Too Early",
"426": "Upgrade Required",
"428": "Precondition Required",
"429": "Too Many Requests",
"431": "Request Header Fields Too Large",
"451": "Unavailable For Legal Reasons",
"500": "Internal Server Error",
"501": "Not Implemented",
"502": "Bad Gateway",
"503": "Service Unavailable",
"504": "Gateway Timeout",
"505": "HTTP Version Not Supported",
"506": "Variant Also Negotiates",
"507": "Insufficient Storage",
"508": "Loop Detected",
"510": "Not Extended",
"511": "Network Authentication Required",
}
print("-" * 40)
gc.collect()
end_mem = gc.mem_free()
print(f"After Script Config: {end_mem}")
print(f"Memory Used: {start_mem - end_mem}")
print("-" * 40)
gc.collect()
start_mem = gc.mem_free()
print(f"Prior to WiFi: {start_mem}")
wifi.radio.connect(ssid, password)
print("✅ Wifi!")
print("-" * 40)
gc.collect()
end_mem = gc.mem_free()
print(f"After Wifi: {end_mem}")
print(f"Memory Used: {start_mem - end_mem}")
print("-" * 40)
gc.collect()
start_mem = gc.mem_free()
print(f"Prior to Open-Sky Public API: {start_mem}")
print("-" * 40)
print(" | Attempting to GET OpenSky-Network Single Public Flight JSON!")
try:
opensky_response = requests.get(url=OPENSKY_SOURCE)
opensky_json = opensky_response.json()
except ConnectionError as e:
print("Connection Error:", e)
print("Retrying in 10 seconds")
print(" | ✅ OpenSky-Network Public JSON!")
# Current flight data for single callsign (right now)
osn_single_flight_data = opensky_json["states"]
if osn_single_flight_data is not None:
flight_transponder = opensky_json["states"][0][0]
print(f" | | Transponder: {flight_transponder}")
# Return Air Flight data if not on ground
on_ground = opensky_json["states"][0][8]
if on_ground is True:
print(f" | | On Ground: {on_ground}")
else:
altitude = opensky_json["states"][0][7]
print(f" | | Barometric Altitude: {altitude}")
else:
print("This flight has no active data or you're polling too fast.")
print("Public Limits: 10 second max poll & 400 weighted calls daily")
opensky_response.close()
print("✂️ Disconnected from OpenSky-Network API")
print("-" * 40)
gc.collect()
end_mem = gc.mem_free()
print(f"After Open-Sky: {end_mem}")
print(f"Memory Used: {start_mem - end_mem}")
print("-" * 40)
gc.collect()
start_mem = gc.mem_free()
print(f"Prior to HTTPBin.org: {start_mem}")
print("-" * 40)
# Define a custom header
HEADERS = {"user-agent": "blinka/1.0.0"}
print(f" | GET JSON: {JSON_GET_URL}")
httpbin_response = requests.get(JSON_GET_URL, headers=HEADERS)
json_data = httpbin_response.json()
HEADERS = json_data["headers"]
print(f" | User-Agent: {HEADERS['User-Agent']}")
# HTTP STATUS CODE TESTING
# https://developer.mozilla.org/en-US/docs/Web/HTTP/Status
STATUS_CODE = str(httpbin_response.status_code)
STATUS_DESCRIPTION = http_status_codes.get(STATUS_CODE, "Unknown Status Code")
print_http_status(STATUS_CODE, STATUS_DESCRIPTION)
httpbin_response.close()
print(f" | ✂️ Disconnected from {JSON_GET_URL}")
print(" | ")
print(f" | Status Code Test: {JSON_GET_URL}")
# Some return errors then confirm the error (that's a good thing)
# Demonstrates not all errors have the same behavior
# 300, 304, and 306 in particular
for codes in sorted(http_status_codes.keys(), key=int):
status_test_url = STATUS_TEST + codes
httpbin_status_response = requests.get(status_test_url, headers=HEADERS)
SORT_STATUS_CODE = str(httpbin_status_response.status_code)
SORT_STATUS_DESC = http_status_codes.get(SORT_STATUS_CODE, "Unknown Status Code")
print_http_status(SORT_STATUS_CODE, SORT_STATUS_DESC)
httpbin_status_response.close()
print(f" | ✂️ Disconnected from {JSON_GET_URL}")
print("-" * 40)
gc.collect()
end_mem = gc.mem_free()
print(f"After HTTPBin.org: {end_mem}")
print(f"Memory Used: {start_mem - end_mem}")
print("-" * 40)
gc.collect()
start_mem = gc.mem_free()
print(f"Prior to RocketLaunch.Live: {start_mem}")
print("-" * 40)
print(" | Attempting to GET RocketLaunch.Live JSON!")
time.sleep(2)
debug_rocketlaunch_full_response = False
try:
rocketlaunch_response = requests.get(url=ROCKETLAUNCH_SOURCE)
rocketlaunch_json = rocketlaunch_response.json()
except ConnectionError as e:
print("Connection Error:", e)
print("Retrying in 10 seconds")
print(" | ✅ RocketLaunch.Live JSON!")
# JSON Endpoints
RLFN = str(rocketlaunch_json["result"][0]["name"])
RLDATE = str(rocketlaunch_json["result"][0]["date_str"])
# Print to serial & display label if endpoint not "None"
if RLDATE != "None":
print(f" | | Date: {RLDATE}")
if RLFN != "None":
print(f" | | Flight: {RLFN}")
rocketlaunch_response.close()
print("-" * 40)
gc.collect()
end_mem = gc.mem_free()
print(f"After RocketLaunch.Live: {end_mem}")
print(f"Memory Used: {start_mem - end_mem}")
print("-" * 40)
gc.collect()
end_mem = gc.mem_free()
print(f"Before Adafruit ShieldsIO: {end_mem}")
print(f"Memory Used: {start_mem - end_mem}")
print("-" * 40)
# Discord Adafruit Shields.io
try:
shieldsio_response = requests.get(url=ADA_DISCORD_JSON)
shieldsio_json = shieldsio_response.json()
except ConnectionError as e:
print(f"Connection Error: {e}")
print("Retrying in 10 seconds")
ada_users = shieldsio_json["value"]
ONLINE_STRING = " online"
REPLACE_WITH_NOTHING = ""
discord_users = ada_users.replace(ONLINE_STRING, REPLACE_WITH_NOTHING)
shieldsio_response.close()
print("-" * 40)
gc.collect()
end_mem = gc.mem_free()
print(f"After Adafruit ShieldsIO: {end_mem}")
print(f"Memory Used: {start_mem - end_mem}")
print("-" * 40)
print("✂️ Disconnected from all API's")
print("\nFinished!")
print(f"Board Uptime: {time_calc(time.monotonic())}")
print(f"Next Update: {time_calc(SLEEP_TIME)}")
print("===============================")
gc.collect()
time.sleep(SLEEP_TIME)
Serial Output:
code.py output:
----------------------------------------
Prior to Script Config: 2012320
----------------------------------------
After Script Config: 2019440
Memory Used: -7120
----------------------------------------
Prior to WiFi: 2019440
✅ Wifi!
----------------------------------------
After Wifi: 2019440
Memory Used: 0
----------------------------------------
Prior to Open-Sky Public API: 2019440
----------------------------------------
| Attempting to GET OpenSky-Network Single Public Flight JSON!
| ✅ OpenSky-Network Public JSON!
This flight has no active data or you're polling too fast.
Public Limits: 10 second max poll & 400 weighted calls daily
✂️ Disconnected from OpenSky-Network API
----------------------------------------
After Open-Sky: 2017680
Memory Used: 1760
----------------------------------------
Prior to HTTPBin.org: 2017680
----------------------------------------
| GET JSON: https://httpbin.org/get
| User-Agent: Adafruit CircuitPython,blinka/1.0.0
| ✅ Status Test: 200 - OK
| ✂️ Disconnected from https://httpbin.org/get
|
| Status Code Test: https://httpbin.org/get
| ✅ Status Test: 100 - Continue
| ✅ Status Test: 101 - Switching Protocols
| ✅ Status Test: 102 - Processing
| ✅ Status Test: 103 - Early Hints
| ✅ Status Test: 200 - OK
| ✅ Status Test: 201 - Created
| ✅ Status Test: 202 - Accepted
| ✅ Status Test: 203 - Non-Authoritative Information
| ✅ Status Test: 204 - No Content
| ✅ Status Test: 205 - Reset Content
| ✅ Status Test: 206 - Partial Content
| ✅ Status Test: 207 - Multi-Status
| ✅ Status Test: 208 - Already Reported
| ✅ Status Test: 226 - IM Used
| ❌ Status Test: 300 - Multiple Choices
| ✅ Status Test: 200 - OK
| ✅ Status Test: 200 - OK
| ✅ Status Test: 200 - OK
| ❌ Status Test: 304 - Not Modified
| ✅ Status Test: 200 - OK
| ❌ Status Test: 306 - Unused
| ✅ Status Test: 200 - OK
| ❌ Status Test: 308 - Permanent Redirect
| ❌ Status Test: 400 - Bad Request
| ❌ Status Test: 401 - Unauthorized
| ❌ Status Test: 402 - Payment Required
| ❌ Status Test: 403 - Forbidden
| ❌ Status Test: 404 - Not Found
| ❌ Status Test: 405 - Method Not Allowed
| ❌ Status Test: 406 - Not Acceptable
| ❌ Status Test: 407 - Proxy Authentication Required
| ❌ Status Test: 408 - Request Timeout
| ❌ Status Test: 409 - Conflict
| ❌ Status Test: 410 - Gone
| ❌ Status Test: 411 - Length Required
| ❌ Status Test: 412 - Precondition Failed
| ❌ Status Test: 413 - Payload Too Large
| ❌ Status Test: 414 - URI Too Long
| ❌ Status Test: 415 - Unsupported Media Type
| ❌ Status Test: 416 - Range Not Satisfiable
| ❌ Status Test: 417 - Expectation Failed
| ❌ Status Test: 418 - I'm a teapot
| ❌ Status Test: 421 - Misdirected Request
| ❌ Status Test: 422 - Unprocessable Entity
| ❌ Status Test: 423 - Locked
| ❌ Status Test: 424 - Failed Dependency
| ❌ Status Test: 425 - Too Early
| ❌ Status Test: 426 - Upgrade Required
| ❌ Status Test: 428 - Precondition Required
| ❌ Status Test: 429 - Too Many Requests
| ❌ Status Test: 431 - Request Header Fields Too Large
| ❌ Status Test: 451 - Unavailable For Legal Reasons
| ❌ Status Test: 500 - Internal Server Error
| ❌ Status Test: 501 - Not Implemented
| ❌ Status Test: 502 - Bad Gateway
| ❌ Status Test: 503 - Service Unavailable
| ❌ Status Test: 504 - Gateway Timeout
| ❌ Status Test: 505 - HTTP Version Not Supported
| ❌ Status Test: 506 - Variant Also Negotiates
| ❌ Status Test: 507 - Insufficient Storage
| ❌ Status Test: 508 - Loop Detected
| ❌ Status Test: 510 - Not Extended
| ❌ Status Test: 511 - Network Authentication Required
| ✂️ Disconnected from https://httpbin.org/get
----------------------------------------
After HTTPBin.org: 2015456
Memory Used: 2224
----------------------------------------
Prior to RocketLaunch.Live: 2015456
----------------------------------------
| Attempting to GET RocketLaunch.Live JSON!
Traceback (most recent call last):
File "adafruit_requests.py", line 515, in _get_socket
OSError: Failed SSL handshake
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "code.py", line 244, in <module>
File "adafruit_requests.py", line 711, in get
File "adafruit_requests.py", line 650, in request
File "adafruit_requests.py", line 496, in _get_socket
RuntimeError: Sending request failed
Code done running.
Press any key to enter the REPL. Use CTRL-D to reload.
memfree reports 2015456 available prior to the errors. Not getting Memory Errors here just Failed SSL handshake
and sending request failed
.
I've tested it with 1 URL and can hammer on it for hours with no errors. The errors happen when you attempt to use multiple connections to different servers. This test script is about as refined as I can make it. Next will test this same script on the S2 running 9.0
Adafruit CircuitPython 9.0.0 on 2024-03-19; Adafruit Feather ESP32S2 with ESP32S2
Board ID:adafruit_feather_esp32s2
code.py output:
----------------------------------------
Prior to Script Config: 2039824
----------------------------------------
After Script Config: 2038848
Memory Used: 976
----------------------------------------
Prior to WiFi: 2038848
✅ Wifi!
----------------------------------------
After Wifi: 2038848
Memory Used: 0
----------------------------------------
Prior to Open-Sky Public API: 2038848
----------------------------------------
| Attempting to GET OpenSky-Network Single Public Flight JSON!
| ✅ OpenSky-Network Public JSON!
This flight has no active data or you're polling too fast.
Public Limits: 10 second max poll & 400 weighted calls daily
✂️ Disconnected from OpenSky-Network API
----------------------------------------
After Open-Sky: 2034848
Memory Used: 4000
----------------------------------------
Prior to HTTPBin.org: 2034848
----------------------------------------
| GET JSON: https://httpbin.org/get
Traceback (most recent call last):
File "code.py", line 200, in <module>
File "/lib/adafruit_requests.py", line 715, in get
File "/lib/adafruit_requests.py", line 656, in request
File "/lib/adafruit_requests.py", line 520, in _get_socket
MemoryError:
Code done running.
Press any key to enter the REPL. Use CTRL-D to reload.
In 9.0 it fails much sooner, on the 2nd API, reliably, every time.
Here's another test script from issue 3651 that fails in an identical fashion. I tried that script and it fails exactly as described in that issue. There seems to be some kind of regression because apparently it was fixed and then it broke again sometime in 8.0 beta (best guess).
One second I have 2.1MB of memfree then immediately afterward on the next line 1 single call to requests.get throws a memory error. It doesn't seem like a lack of memory or at least not according to the behavior reported by memfree. It's not like I have 10KB free... I have around 2MB free when it crashes. The API call I'm making when it crashes most times should only take about 200 bytes...
memfree is telling you how much is available for MP heap. It isn't accounting for the IDF memory requirements. TLS related buffers aren't put in PSRAM so there is much less available.
Is there a way I can see how big the TLS buffer is? Track its size as the script goes? Why isn't that buffer reported in memfree if it's running out of buffer? Can we increase the buffer?
It is a function of the IDF settings. I believe it is lowered already. Some of it changed with the 4 -> 5 transition and I'm not sure there is a lot we can do. To make more space for it we'd need to allow TLS buffers on PSRAM or optimize our internal RAM use. It isn't a simple fix.
I didn't think it would be since this one seems to have continually affected the S2 since its debut. memfree reporting everything is good to crashing without warning is a bit jarring.
Having a TLS buffer error would be welcome so at least users will know it's TLS buffer related and not due to lack of PSRAM. After looking into it more yesterday and testing it in different ways all HTTP sites can be sequentially loaded without issue. It's specifically HTTPS responses that causes this one which points to a TLS issue but the error never says that. The general MemoryError had me thinking there was something physically wrong with the memory on my board at first. It's not a nice error to get. Requesting more descriptive error for TLS buffer limit reached (if possible).
Requesting more descriptive error for TLS buffer limit reached (if possible).
It isn't very easy to do because the underlying ESP_ERR_NO_MEM
is generated by the IDF. We don't know how big the allocation was or what it was for because we didn't attempt it ourselves. That's why it is espidf.MemoryError
vs the built in MemoryError
which usually includes the size that failed to allocate.
This issue is now fixed as of 9.0.4 stable with 9.x bundle as of adafruit-circuitpython-bundle-9.x-mpy-20240514
The test script here finished flawlessly first attempt.
It's also tracking memfree correctly.
----------------------------------------
Prior to Script Config: 2038016
----------------------------------------
After Script Config: 2038304
Memory Used: -288
----------------------------------------
Prior to WiFi: 2038304
✅ Wifi!
----------------------------------------
After Wifi: 2038304
Memory Used: 0
----------------------------------------
After Open-Sky: 2034400
Memory Used: 3904
----------------------------------------
After HTTPBin.org: 2032128
----------------------------------------
After RocketLaunch.Live: 2027152
Memory Used: 4976
----------------------------------------
After Adafruit ShieldsIO: 2024704
Memory Used: 7424
----------------------------------------
✂️ Disconnected from all API's
Fantastic! I'll close.
CircuitPython version
Code/REPL
Behavior
Description
Failed in 9.0 so I got out another ESP32-S2 Feather and tried on 8.2.5 with these results. It seems like this might have been broken on the S2 for quite some time unreported?
This is in 8.2.5 but fails with similar error messages in 9.0. It works fine on the ESP32-S3 Feather on 9.0. Referenced issue: https://github.com/adafruit/Adafruit_CircuitPython_ConnectionManager/issues/6
I've been bouncing back and forth between the Adafruit ESP32-S2 & S3 feathers in both 9.0 and 8.2.5 trying to track this issue down. There's some kind of memory and socket issue here that is beyond my understanding.
DanH requested I slim down my 500 line script to barebones, this is the slimed down version. I've swapped out a lot of API's that required tokens to use public API's instead so the reviewer won't have to setup as many tokens and logins. Some will still be required with logins.
It always fails on lines with requests.get(url=source). I've spent a couple days tracking this thing down with no progress. Time to hand it over to the devs. Good luck.
Additional information
No response