Open mMerlin opened 1 month ago
I think I can help with logging inside the library, I authored most of the code.
Although it seems to me like the issue might be in CircuitPython itself rather than the library.
This is almost certainly a core CircuitPython. I am transferring this issue to the circuitpython repo.
Can I help tracking it down? I don't know what it will take to reliably reproduce the failure. I'm a programmer, but not really set up to do debug builds. Main environment is Fedora. I can spin up a VM if other environment (not Windows or Mac) is better for building.
Any idea how long the watchdog timer is? Knowing that I would have an idea about how long to wait before checking if it had triggered. Assuming that it needs 'idle' time to trigger it.
The "internal watchdog timer" that was triggered happens because something was taking far too long, blocking other operations (for example, a tight infinite loop). It reflects a bug, not just a timeout that's too short.
If you can reduce your program to the smallest example that still shows the problem, that would be great. We'll have to let it sit for a while to trigger the error, it seems like.
I did some more testing. I have sample code that reliably triggers a non-responsive httpserver, and most of the time safe mode on restart. My initial pointer to .poll() is back in the mix. I tried a couple of variations. When I use server.server_forever(…), I do not see the failures. Change that do start plus while True and server.poll() triggers the problem. Idle time is also part of the mix. If I do a fresh request every 5 minutes, the error does not trigger … until I leave it idle for about 15 minutes.
# SPDX-FileCopyrightText: 2024 H Phil Duby
# SPDX-License-Identifier: MIT
import sys
from time import monotonic_ns
import socketpool
import wifi
from adafruit_httpserver import Server, Request, Response
pool = socketpool.SocketPool(wifi.radio)
server = Server(pool, "/static", debug=True)
@server.route("/")
def base(request: Request):
request_ref = monotonic_ns()
print(f'{request_ref}: request for page {TAG}') # LOG
content = f"""<!DOCTYPE html>
<html><head><title>Watchdog {TAG}</title></head>
<body><p>{TAG} {request_ref}: <span id="datetime"></span></p>
<script>var date = new Date();document.getElementById("datetime").innerHTML = date;
</script></body></html>
"""
return Response(request, content, content_type="text/html")
TAG = 'simple debug 002'
print(f'{monotonic_ns()}: Starting page {TAG}') # LOG
print(f'{sys.implementation.name} {sys.implementation.version}\n{sys.implementation._machine}')
server.start(str(wifi.radio.ipv4_address))
while True:
server.poll()
Smaller code fails, but this keeps enough instrumentation/logging to match events (when it is still working). log and browser capture using serve_forever
Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
main.py output:
264137084966: Starting page simple debug 001
circuitpython (9, 1, 1)
Adafruit-Qualia-S3-RGB666 with ESP32S3
Started development server on http://192.168.20.230:5000
277870208741: request for page simple debug 001
192.168.20.108 -- "GET /" 397 -- "200 OK" 340 -- 19ms
192.168.20.108 -- "GET /favicon.ico" 373 -- "404 Not Found" 131 -- 29ms
336848815920: request for page simple debug 001
192.168.20.108 -- "GET /" 448 -- "200 OK" 340 -- 269ms
1519142303476: request for page simple debug 001
192.168.20.108 -- "GET /" 397 -- "200 OK" 341 -- 21ms
2810475250253: request for page simple debug 001
192.168.20.108 -- "GET /" 397 -- "200 OK" 341 -- 25ms
2846935180672: request for page simple debug 001
192.168.20.108 -- "GET /" 397 -- "200 OK" 341 -- 20ms
2937651519777: request for page simple debug 001
192.168.20.108 -- "GET /" 397 -- "200 OK" 341 -- 306ms
Traceback (most recent call last):
File "main.py", line 28, in <module>
File "adafruit_httpserver/server.py", line 190, in serve_forever
KeyboardInterrupt:
Code done running.
Press any key to enter the REPL. Use CTRL-D to reload.
Adafruit CircuitPython 9.1.1 on 2024-07-22; Adafruit-Qualia-S3-RGB666 with ESP32S3
>>>
soft reboot
Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
main.py output:
3010372314456: Starting page simple debug 001
circuitpython (9, 1, 1)
Adafruit-Qualia-S3-RGB666 with ESP32S3
Started development server on http://192.168.20.230:5000
3022093505869: request for page simple debug 001
192.168.20.108 -- "GET /" 397 -- "200 OK" 341 -- 25ms
4375878356939: request for page simple debug 001
192.168.20.108 -- "GET /" 397 -- "200 OK" 341 -- 305ms
5794625152590: request for page simple debug 001
192.168.20.108 -- "GET /" 397 -- "200 OK" 341 -- 305ms
7036811981207: request for page simple debug 001
192.168.20.108 -- "GET /" 397 -- "200 OK" 341 -- 20ms
Traceback (most recent call last):
File "main.py", line 28, in <module>
File "adafruit_httpserver/server.py", line 190, in serve_forever
KeyboardInterrupt:
Code done running.
Adafruit CircuitPython 9.1.1 on 2024-07-22; Adafruit-Qualia-S3-RGB666 with ESP32S3
>>>
soft reboot
Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
main.py output:
8358093139653: Starting page simple debug 001
circuitpython (9, 1, 1)
Adafruit-Qualia-S3-RGB666 with ESP32S3
Started development server on http://192.168.20.230:5000
8413348510754: request for page simple debug 001
192.168.20.108 -- "GET /" 397 -- "200 OK" 341 -- 309ms
8447393127449: request for page simple debug 001
192.168.20.108 -- "GET /" 363 -- "200 OK" 341 -- 23ms
192.168.20.108 -- "GET /favicon.ico" 322 -- "404 Not Found" 131 -- 23ms
Code stopped by auto-reload. Reloading soon.
simple debug 001 277870208741: Sat Jul 27 2024 15:24:44 GMT-0600 (Mountain Daylight Saving Time)
simple debug 001 1519142303476: Sat Jul 27 2024 15:45:25 GMT-0600 (Mountain Daylight Saving Time)
Timeout, although serial output shows request being processed 16:07
server restart
simple debug 001 3022093505869: Sat Jul 27 2024 16:11:59 GMT-0600 (Mountain Daylight Saving Time)
simple debug 001 4375878356939: Sat Jul 27 2024 16:33:02 GMT-0600 (Mountain Daylight Saving Time)
simple debug 001 5794625152590: Sat Jul 27 2024 16:56:40 GMT-0600 (Mountain Daylight Saving Time)
simple debug 001 7036811981207: Sat Jul 27 2024 17:17:22 GMT-0600 (Mountain Daylight Saving Time)
Unable to connect: 17:38
different failure mode? browser side issue?
server restart had no error, but initial requests after restart very slow, timed out
simple debug 001 8447393127449: Sat Jul 27 2024 17:41:29 GMT-0600 (Mountain Daylight Saving Time)
log and browser capture using poll, 20 minute request interval
Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
main.py output:
8758731506348: Starting page simple debug 002
circuitpython (9, 1, 1)
Adafruit-Qualia-S3-RGB666 with ESP32S3
Started development server on http://192.168.20.230:5000
8769515350351: request for page simple debug 002
192.168.20.108 -- "GET /" 363 -- "200 OK" 341 -- 316ms
Traceback (most recent call last):
File "main.py", line 30, in <module>
File "adafruit_httpserver/server.py", line 432, in poll
KeyboardInterrupt:
Code done running.
Press any key to enter the REPL. Use CTRL-D to reload.
Adafruit CircuitPython 9.1.1 on 2024-07-22; Adafruit-Qualia-S3-RGB666 with ESP32S3
>>>
soft reboot
Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
main.py output:
10077489196790: Starting page simple debug 002
circuitpython (9, 1, 1)
Adafruit-Qualia-S3-RGB666 with ESP32S3
Started development server on http://192.168.20.230:5000
10089233673099: request for page simple debug 002
192.168.20.108 -- "GET /" 363 -- "200 OK" 342 -- 309ms
Adafruit CircuitPython 9.1.1 on 2024-07-22; Adafruit-Qualia-S3-RGB666 with ESP32S3
>>>
soft reboot
Auto-reload is off.
Running in safe mode! Not running saved code.
You are in safe mode because:
Internal watchdog timer expired.
Press reset to exit safe mode.
Press any key to enter the REPL. Use CTRL-D to reload.
simple debug 002 8769515350351: Sat Jul 27 2024 17:46:23 GMT-0600 (Mountain Daylight Saving Time)
unable to connect 18:07
restart server
simple debug 002 10089233673099: Sat Jul 27 2024 18:08:21 GMT-0600 (Mountain Daylight Saving Time)
unable to connect 18:30
safe mode, watchdog timer
log and browser capture using poll, «initial» 5 minute request interval
Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
main.py output:
4018117645270: Starting page simple debug 002
circuitpython (9, 1, 1)
Adafruit-Qualia-S3-RGB666 with ESP32S3
Started development server on http://192.168.20.230:5000
4025758972170: request for page simple debug 002
192.168.20.108 -- "GET /" 363 -- "200 OK" 341 -- 33ms
4373888305672: request for page simple debug 002
192.168.20.108 -- "GET /" 363 -- "200 OK" 341 -- 18ms
4678340362556: request for page simple debug 002
192.168.20.108 -- "GET /" 363 -- "200 OK" 341 -- 18ms
4972010681165: request for page simple debug 002
192.168.20.108 -- "GET /" 363 -- "200 OK" 341 -- 14ms
5277275360120: request for page simple debug 002
192.168.20.108 -- "GET /" 363 -- "200 OK" 341 -- 16ms
5571675323493: request for page simple debug 002
192.168.20.108 -- "GET /" 363 -- "200 OK" 341 -- 16ms
5895380767834: request for page simple debug 002
192.168.20.108 -- "GET /" 363 -- "200 OK" 341 -- 129ms
6184399719242: request for page simple debug 002
192.168.20.108 -- "GET /" 363 -- "200 OK" 341 -- 16ms
6488390472423: request for page simple debug 002
192.168.20.108 -- "GET /" 363 -- "200 OK" 341 -- 18ms
7697915405278: request for page simple debug 002
192.168.20.108 -- "GET /" 363 -- "200 OK" 341 -- 16ms
Adafruit CircuitPython 9.1.1 on 2024-07-22; Adafruit-Qualia-S3-RGB666 with ESP32S3
>>>
soft reboot
Auto-reload is off.
Running in safe mode! Not running saved code.
You are in safe mode because:
Internal watchdog timer expired.
Press reset to exit safe mode.
Press any key to enter the REPL. Use CTRL-D to reload.
simple debug 002 4025758972170: Sat Jul 27 2024 19:52:33 GMT-0600 (Mountain Daylight Saving Time)
simple debug 002 4373888305672: Sat Jul 27 2024 19:58:21 GMT-0600 (Mountain Daylight Saving Time)
simple debug 002 4678340362556: Sat Jul 27 2024 20:03:25 GMT-0600 (Mountain Daylight Saving Time)
simple debug 002 4972010681165: Sat Jul 27 2024 20:08:19 GMT-0600 (Mountain Daylight Saving Time)
simple debug 002 5277275360120: Sat Jul 27 2024 20:13:24 GMT-0600 (Mountain Daylight Saving Time)
simple debug 002 5571675323493: Sat Jul 27 2024 20:18:18 GMT-0600 (Mountain Daylight Saving Time)
simple debug 002 5895380767834: Sat Jul 27 2024 20:23:42 GMT-0600 (Mountain Daylight Saving Time)
simple debug 002 6184399719242: Sat Jul 27 2024 20:28:31 GMT-0600 (Mountain Daylight Saving Time)
simple debug 002 6488390472423: Sat Jul 27 2024 20:33:35 GMT-0600 (Mountain Daylight Saving Time)
simple debug 002 7697915405278: Sat Jul 27 2024 20:53:45 GMT-0600 (Mountain Daylight Saving Time)
22:20 Unable to connect
It appears that this is not going to affect the project that it was initially encountered in. The problem was seen when doing development and testing of separate sections. With everything (so far) merged together, there is no 'idle time' that is needed to trigger the failure. The project shows the sensor data both on a local display, and on web pages. It seems that the processing needed for the display portion is enough to prevent the watchdog timeout seen in server.poll(). Even with the web server totally idle (no active connections).
Which means that some sort of heartbeat may be a work around. That also could mean that figuring out just what that heartbeat needs could point to where the watchdog is being triggered.
Possibly related https://github.com/adafruit/circuitpython/issues/9428
Code from this issue also uses adafruit_httpserver
Reading over the code provided in that, this does look like exactly the same case: CP 9.1.1, ESP32-S3, server.poll() that is idle 'long enough', watchdog timer. Smaller code sample here, which could be shrunk further if logging not useful.
If the problem can be prevented by adding some code to .poll()
that heartbeats or restart the watchdog timer i can make a PR for adafruit_httpserver
, although I would like to know whether it is possible and if it is something that should be done there, because maybe that is not something that the library should do.
Here is a bit smaller sketch that can produce the same symptoms. Note that the first cycle had the server stop responding, but on control C abort and control D reload, it restarted successfully. That restart also did not stop responding after a 16 minute idle period, then a 22 minute idle period. It did not respond after a following 27 minute idle period, and the following forced abort and restart showed the watchdog timer expired problem. For most of my previous testing 15 minutes idle time was sufficient to trigger the watchdog timer.
import socketpool
import wifi
from adafruit_httpserver import Server, Request, Response
pool = socketpool.SocketPool(wifi.radio)
server = Server(pool, "/static", debug=True)
@server.route("/")
def base(request: Request):
content = """<!DOCTYPE html>
<html><head><title>Watchdog check</title></head>
<body><p><span id="datetime"></span></p>
<script>document.getElementById("datetime").innerHTML = (new Date()).toISOString();
</script></body></html>
"""
return Response(request, content, content_type="text/html")
server.start(str(wifi.radio.ipv4_address))
while True:
server.poll()
from the web browser:
2024-07-29T08:26:57.426Z
2024-07-29T08:27:36.756Z
08:44 Unable to connect
2024-07-29T08:45:30.908Z
2024-07-29T09:06:41.214Z
2024-07-29T09:06:41.214Z
2024-07-29T09:28:24.754Z
09:55 Unable to connect
serial output:
soft reboot
Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
main.py output:
Started development server on http://192.168.20.230:5000
192.168.20.108 -- "GET /" 363 -- "200 OK" 298 -- 31ms
192.168.20.108 -- "GET /" 363 -- "200 OK" 298 -- 16ms
Traceback (most recent call last):
File "main.py", line 20, in <module>
File "adafruit_httpserver/server.py", line 432, in poll
KeyboardInterrupt:
Code done running.
Adafruit CircuitPython 9.1.1 on 2024-07-22; Adafruit-Qualia-S3-RGB666 with ESP32S3
>>>
soft reboot
Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
main.py output:
Started development server on http://192.168.20.230:5000
192.168.20.108 -- "GET /" 363 -- "200 OK" 298 -- 16ms
192.168.20.108 -- "GET /client" 412 -- "404 Not Found" 126 -- 141ms
192.168.20.108 -- "GET /" 363 -- "200 OK" 298 -- 31ms
192.168.20.108 -- "GET /" 363 -- "200 OK" 298 -- 16ms
192.168.20.108 -- "GET /" 363 -- "200 OK" 298 -- 47ms
Adafruit CircuitPython 9.1.1 on 2024-07-22; Adafruit-Qualia-S3-RGB666 with ESP32S3
>>>
soft reboot
Auto-reload is off.
Running in safe mode! Not running saved code.
You are in safe mode because:
Internal watchdog timer expired.
Press reset to exit safe mode.
Press any key to enter the REPL. Use CTRL-D to reload.
@mMerlin I'm experiencing the same issue. ESP32-S3 Feather TFT REV, CP 9.1.1.. Basically just connecting to wifi in beginning of script and reading the voltage on A4 inside of a while loop. I go into safemode from the same watchdog timer expiration error after about 20min.
I reproduced this once on an Adafruit Feather ESP32-S3 TFT and 9.2.0-alpha.2351-7-g19e5cf3d8f using a very simple test program that eliminates adafruit_httpserver
:
import errno
import wifi
import socketpool
def wrap_accept(s):
try:
return s.accept()
except OSError as e:
if e.errno == errno.EAGAIN:
return None, ("", 0)
raise
socket = socketpool.SocketPool(wifi.radio)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.listen(10)
s.setblocking(False)
while True:
s1, peerinfo = wrap_accept(s)
if s1 is not None:
print(peerinfo)
s1.close()
I ran this test without even being connected to wifi, and it's also a slightly dubious program because it did not bind
a specific port. I paste this into the repl using paste mode. I saw this safe-mode-reset after less than 5 minutes on the first run, but now I haven't seen it again for ~10 minutes:
>>>
paste mode; Ctrl-C to cancel, Ctrl-D to finish
=== import errno
=== import wifi
=== import socketpool
===
=== def wrap_accept(s):
=== try:
=== return s.accept()
=== except OSError as e:
=== if e.errno == errno.EAGAIN:
=== return None, ("", 0)
=== raise
===
=== socket = socketpool.SocketPool(wifi.radio)
=== s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
=== s.listen(10)
=== s.setblocking(False)
=== while True:
=== s1, peerinfo = wrap_accept(s)
=== if s1 is not None:
=== print(peerinfo)
=== s1.close()
===
[tio 10:32:38] Disconnected
[tio 10:32:42] Connected
Adafruit CircuitPython 9.2.0-alpha.2351-7-g19e5cf3d8f on 2024-08-28; Adafruit Feather ESP32-S3 TFT with ESP32S3
>>>
soft reboot
Auto-reload is off.
Running in safe mode! Not running saved code.
You are in safe mode because:
Internal watchdog timer expired.
Press reset to exit safe mode.
I'm a little conflicted about posting this info since I only saw a WDT reset once -- there may be some other hidden variable. but I think it's good info that MAYBE this simple script is actually a reproducer.
I did eventually get a 2nd crash with just about the same code. However, this time it was a Hard fault. That run took almost 1 hour to crash.
I didn't find any further useful information during today's debugging session, and I'm not planning to work further on this issue right now.
Moved to 9.x.x because we need a clearer reproducer to debug.
Adafruit CircuitPython 9.1.1 on 2024-07-22; Adafruit-Qualia-S3-RGB666 with ESP32S3 Board ID:adafruit_qualia_s3_rgb666
I have a script to explore working with SSE. It «seems to» work. I can connect and disconnect browser sessions. However, if I then just leave it running with no connections, and come back later (maybe an hour), the app is frozen. ctrl+c in the serial window aborts and gives:
The first part of that is the final log messages while it was running. This has occurred several times with different versions of the code. Including (I think) the httpserver_sse.py example in the repo that I started from. I don't know if it it total run time, or total idle time that triggers this.
Adafruit CircuitPython 9.1.1 on 2024-07-22; Adafruit-Qualia-S3-RGB666 with ESP32S3 Board ID:adafruit_qualia_s3_rgb666
Here is code that 'almost' matches what produced the above log. Almost, because the sequence has been: run the script, do some testing, then edit the offline version. In this case, the edit was cleanup to remove some debug messages, tweak the remaining, then start working on a more full html file. Got called away for awhile, when returned, found it locked up. again. Some of those lockup resulted in CIRCUITPY becoming read only. I had to do the file system erase to get it back.
When it locks up, the only thing it is doing is polling, because self.sse_response is empty.
The index.html file referenced is:
I'm not setup to create a debug version of circuitpython, but with a little guidance, I probably could add some logging (with timestamps) to a local copy of HTTPServer.Server