Closed jerryneedell closed 5 years ago
I see a similar behavior. On PyPortal running with >100k free memory, I have a wifimanager .get that fails routinely after some time (always seems to be during the 2nd .get in a sequence of 2 - I'm just pulling short pieces of text from an API), requiring manual reload:
Traceback (most recent call last):
File "main.py", line 192, in <module>
File "adafruit_esp32spi/adafruit_esp32spi_wifimanager.py", line 110, in get
File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 235, in get
File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 221, in request
File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 165, in request
File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 76, in connect
File "adafruit_esp32spi/adafruit_esp32spi.py", line 572, in socket_connect
File "adafruit_esp32spi/adafruit_esp32spi.py", line 503, in socket_open
File "adafruit_esp32spi/adafruit_esp32spi.py", line 299, in _send_command_get_response
File "adafruit_esp32spi/adafruit_esp32spi.py", line 261, in _wait_response_cmd
File "adafruit_esp32spi/adafruit_esp32spi.py", line 173, in _wait_for_ready
RuntimeError: ESP32 not responding
If I can get it reproducible with a minimal demo, I'll post that.
@jerryneedell can you add a timeout? errors are fine - ESP32 sometimes goes off into a bad zone - but it should raise an error rather than hangning
I'll take a look at it -- timeout sounds like a good idea.
Still looking into just where this is hanging and where a timeout should be added. It takes a long time to reproduce ;-)
Only within the past week I've been getting hangs sometimes overnight on PyPortal (since updating to 4.0.0-beta6 or -beta7). esp.firmware_version reports 1.2.2 (NINA_W102_17_FEB_2018.bin I believe?) The following is after a run with 1,020 successful GETs (HTTP OK code 200), 8 408s (server timed out waiting for client - I've gotten some 408s regardless of version), and sporadic retries:
Traceback (most recent call last):
File "main.py", line 315, in <module>
File "main.py", line 270, in <module>
File "adafruit_esp32spi/adafruit_esp32spi_wifimanager.py", line 110, in get
File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 235, in get
File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 221, in request
File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 194, in request
File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 90, in readline
File "adafruit_esp32spi/adafruit_esp32spi.py", line 545, in socket_available
File "adafruit_esp32spi/adafruit_esp32spi.py", line 299, in _send_command_get_response
File "adafruit_esp32spi/adafruit_esp32spi.py", line 261, in _wait_response_cmd
File "adafruit_esp32spi/adafruit_esp32spi.py", line 167, in _wait_for_ready
KeyboardInterrupt:
I'll try to capture with debug enabled, and see if I can isolate anything.
the thing i dont get is that we have a timeout in taht code
def _wait_for_ready(self):
"""Wait until the ready pin goes low"""
if self._debug >= 3:
print("Wait for ESP32 ready", end='')
times = time.monotonic()
while (time.monotonic() - times) < 10: # wait up to 10 seconds
if not self._ready.value: # we're ready!
break
if self._debug >= 3:
print('.', end='')
time.sleep(0.05)
else:
raise RuntimeError("ESP32 not responding")
which makes me wonder if eventually time.monotonic()
stops working! can you set _debug
to 3 and turn on REPL output so we can tell where it might be failing/hanging?
Done: debug=3 & I'm saving a large scrollback. I don't have the output anymore on the original hang, but if I recall correctly the raw time.montonic was approx. 40,000+ seconds, and the program run time was approx. 20,000+ seconds. (The PyPortal code does also occasionally restart itself when I'm not looking, but I assume that's unrelated.)
Oddly, now I'm routinely getting memory errors I haven't seen before (no other code changes, other than debug):
.Parameter #0 length is 171 Traceback (most recent call last): File "main.py", line 316, in <module> File "main.py", line 271, in <module> File "adafruit_esp32spi/adafruit_esp32spi_wifimanager.py", line 110, in get File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 235, in get File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 221, in request File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 194, in request File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 92, in readline File "adafruit_esp32spi/adafruit_esp32spi.py", line 559, in socket_read File "adafruit_esp32spi/adafruit_esp32spi.py", line 299, in _send_command_get_response File "adafruit_esp32spi/adafruit_esp32spi.py", line 288, in _wait_response_cmd File "adafruit_esp32spi/adafruit_esp32spi.py", line 286, in _wait_response_cmd File "adafruit_esp32spi/adafruit_esp32spi.py", line 240, in _read_bytes File "adafruit_esp32spi/adafruit_esp32spi.py", line 240, in <listcomp> MemoryError: memory allocation failed, allocating 1024 bytes
Which is odd since the available memory prior to the last HTTP call was 63792b
Something due to debug?? I don't think I'll catch the hang if I have to restart often.
ADDENDUM: Ignore this comment for now, doing more testing. esp._debug=3 seems to trigger a lot of HTTP 408 errors (presumably slowing down the client). That and/or serial may be factors in the memory issue. Trying esp._debug=2, looking better so far.
ADDENDUM 2: Ran for days w/o incident, likely a fluke that it happened a couple of times close together. I have identical code now running on PyPortal, Feather M4, and ItsyBitsy M4, so will keep an eye out.
I fixed it hanging on my end by adding the timeout in readline
.
It appeared to hang in the while b'\r\n' not in self._buffer:
while loop because the value returned from socket_available
is always zero. (Most likely a transient network error not sending a response?)
After adding a max tries to that while loop, and throwing an exception after that threshold my PyPortal now seems to run perfectly.
@ZachNo more timeouts are better, if you have a PR for a timeout addition please submit!
Was #44 suppose to fix this? Looks like there is some history here. I've only just recently run into this. It appears that only fixed read
, not readline
? I'm having issue with getting stuck here also:
https://github.com/adafruit/Adafruit_CircuitPython_ESP32SPI/blob/master/adafruit_esp32spi/adafruit_esp32spi_socket.py#L90
when the response is empty.
I've been running with the PR44 version of the lib and it seems to have fixed this.
@jerryneedell Have you had a chance to test as well?
@caternuson maybe we should do a release? a couple of these were fixed
I did try #44 and had no problems with it.
@ladyada yep, i'd say so. @jerryneedell do you think this issue is resolved?
@caternuson Yes, I think so. If it recurs we can always file a new issue.
cool. thanks. or reopen this one, closing for now.
Not much to go on, but I wanted to document this as well as I can: I was running the cheerlights demo overnight. In the morning I noticed it was not updating and appeared to just be hung. I entered control-C ather REPL and this is the traceback:
It appears to have been stuck in _waot_response_cmd()
That's all the information I have at this point.