adafruit / Adafruit_CircuitPython_ESP32SPI

ESP32 as wifi with SPI interface
MIT License
102 stars 74 forks source link

hang during get #11

Closed jerryneedell closed 5 years ago

jerryneedell commented 5 years ago

Not much to go on, but I wanted to document this as well as I can: I was running the cheerlights demo overnight. In the morning I noticed it was not updating and appeared to just be hung. I entered control-C ather REPL and this is the traceback:

Fetching json from https://api.thingspeak.com/channels/1417/feeds.json?results=1
Traceback (most recent call last):
  File "code.py", line 51, in <module>
  File "code.py", line 41, in <module>
  File "adafruit_esp32spi/adafruit_esp32spi_wifimanager.py", line 114, in get
  File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 229, in get
  File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 215, in request
  File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 188, in request
  File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 88, in readline
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 545, in socket_available
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 299, in _send_command_get_response
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 288, in _wait_response_cmd
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 285, in _wait_response_cmd
KeyboardInterrupt:

It appears to have been stuck in _waot_response_cmd()

That's all the information I have at this point.

anecdata commented 5 years ago

I see a similar behavior. On PyPortal running with >100k free memory, I have a wifimanager .get that fails routinely after some time (always seems to be during the 2nd .get in a sequence of 2 - I'm just pulling short pieces of text from an API), requiring manual reload:

Traceback (most recent call last):
  File "main.py", line 192, in <module>
  File "adafruit_esp32spi/adafruit_esp32spi_wifimanager.py", line 110, in get
  File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 235, in get
  File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 221, in request
  File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 165, in request
  File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 76, in connect
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 572, in socket_connect
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 503, in socket_open
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 299, in _send_command_get_response
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 261, in _wait_response_cmd
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 173, in _wait_for_ready
RuntimeError: ESP32 not responding

If I can get it reproducible with a minimal demo, I'll post that.

ladyada commented 5 years ago

@jerryneedell can you add a timeout? errors are fine - ESP32 sometimes goes off into a bad zone - but it should raise an error rather than hangning

jerryneedell commented 5 years ago

I'll take a look at it -- timeout sounds like a good idea.

jerryneedell commented 5 years ago

Still looking into just where this is hanging and where a timeout should be added. It takes a long time to reproduce ;-)

anecdata commented 5 years ago

Only within the past week I've been getting hangs sometimes overnight on PyPortal (since updating to 4.0.0-beta6 or -beta7). esp.firmware_version reports 1.2.2 (NINA_W102_17_FEB_2018.bin I believe?) The following is after a run with 1,020 successful GETs (HTTP OK code 200), 8 408s (server timed out waiting for client - I've gotten some 408s regardless of version), and sporadic retries:

Traceback (most recent call last):
  File "main.py", line 315, in <module>
  File "main.py", line 270, in <module>
  File "adafruit_esp32spi/adafruit_esp32spi_wifimanager.py", line 110, in get
  File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 235, in get
  File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 221, in request
  File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 194, in request
  File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 90, in readline
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 545, in socket_available
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 299, in _send_command_get_response
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 261, in _wait_response_cmd
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 167, in _wait_for_ready
KeyboardInterrupt: 

I'll try to capture with debug enabled, and see if I can isolate anything.

ladyada commented 5 years ago

the thing i dont get is that we have a timeout in taht code

    def _wait_for_ready(self):
        """Wait until the ready pin goes low"""
        if self._debug >= 3:
            print("Wait for ESP32 ready", end='')
        times = time.monotonic()
        while (time.monotonic() - times) < 10:  # wait up to 10 seconds
            if not self._ready.value: # we're ready!
                break
            if self._debug >= 3:
                print('.', end='')
                time.sleep(0.05)
        else:
            raise RuntimeError("ESP32 not responding")

which makes me wonder if eventually time.monotonic() stops working! can you set _debug to 3 and turn on REPL output so we can tell where it might be failing/hanging?

anecdata commented 5 years ago

Done: debug=3 & I'm saving a large scrollback. I don't have the output anymore on the original hang, but if I recall correctly the raw time.montonic was approx. 40,000+ seconds, and the program run time was approx. 20,000+ seconds. (The PyPortal code does also occasionally restart itself when I'm not looking, but I assume that's unrelated.)

anecdata commented 5 years ago

Oddly, now I'm routinely getting memory errors I haven't seen before (no other code changes, other than debug): .Parameter #0 length is 171 Traceback (most recent call last): File "main.py", line 316, in <module> File "main.py", line 271, in <module> File "adafruit_esp32spi/adafruit_esp32spi_wifimanager.py", line 110, in get File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 235, in get File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 221, in request File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 194, in request File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 92, in readline File "adafruit_esp32spi/adafruit_esp32spi.py", line 559, in socket_read File "adafruit_esp32spi/adafruit_esp32spi.py", line 299, in _send_command_get_response File "adafruit_esp32spi/adafruit_esp32spi.py", line 288, in _wait_response_cmd File "adafruit_esp32spi/adafruit_esp32spi.py", line 286, in _wait_response_cmd File "adafruit_esp32spi/adafruit_esp32spi.py", line 240, in _read_bytes File "adafruit_esp32spi/adafruit_esp32spi.py", line 240, in <listcomp> MemoryError: memory allocation failed, allocating 1024 bytes Which is odd since the available memory prior to the last HTTP call was 63792b Something due to debug?? I don't think I'll catch the hang if I have to restart often.

ADDENDUM: Ignore this comment for now, doing more testing. esp._debug=3 seems to trigger a lot of HTTP 408 errors (presumably slowing down the client). That and/or serial may be factors in the memory issue. Trying esp._debug=2, looking better so far.

ADDENDUM 2: Ran for days w/o incident, likely a fluke that it happened a couple of times close together. I have identical code now running on PyPortal, Feather M4, and ItsyBitsy M4, so will keep an eye out.

ZachNo commented 5 years ago

I fixed it hanging on my end by adding the timeout in readline.

It appeared to hang in the while b'\r\n' not in self._buffer: while loop because the value returned from socket_available is always zero. (Most likely a transient network error not sending a response?)

After adding a max tries to that while loop, and throwing an exception after that threshold my PyPortal now seems to run perfectly.

ladyada commented 5 years ago

@ZachNo more timeouts are better, if you have a PR for a timeout addition please submit!

caternuson commented 5 years ago

Was #44 suppose to fix this? Looks like there is some history here. I've only just recently run into this. It appears that only fixed read, not readline? I'm having issue with getting stuck here also: https://github.com/adafruit/Adafruit_CircuitPython_ESP32SPI/blob/master/adafruit_esp32spi/adafruit_esp32spi_socket.py#L90 when the response is empty.

caternuson commented 5 years ago

I've been running with the PR44 version of the lib and it seems to have fixed this.

@jerryneedell Have you had a chance to test as well?

ladyada commented 5 years ago

@caternuson maybe we should do a release? a couple of these were fixed

jerryneedell commented 5 years ago

I did try #44 and had no problems with it.

caternuson commented 5 years ago

@ladyada yep, i'd say so. @jerryneedell do you think this issue is resolved?

jerryneedell commented 5 years ago

@caternuson Yes, I think so. If it recurs we can always file a new issue.

caternuson commented 5 years ago

cool. thanks. or reopen this one, closing for now.