adafruit / Adafruit_CircuitPython_ESP32SPI

ESP32 as wifi with SPI interface
MIT License

Socket and Timeout issues with requests - how to handle it? #208

Closed dandindarov closed 5 months ago

dandindarov commented 5 months ago

Hello, I have a MatrixPortal M4 with an ESP32 that makes requests to the internet and draws text and a bitmap graph, based on the response, on an LED matrix screen.

After an irregular amount of time it times out (sometimes after 1 hour, sometimes 3; the longest run was 7 hours). I'd like to know what I can do to reset and retry the connection when it freezes. This is my code for the request:

def grab_data():
    retry_count = 5
    for attempt in range(retry_count):
        try:
            log_message("Calling grab data")
            print("ESP32 is connected to the INTERNET:", str(esp.is_connected))
            socket_status() # tells status of all sockets
            response = requests.get(url_list)
            data_website = response.json()
            response.close()  # Close response to free up memory
            del response
            gc.collect()  # Explicitly call garbage collection
            log_message("Returned grab data")
            socket_status()
            return data_website

        except Exception as e:
            log_message("ERROR!!!!!!! grabbing data (attempt {} of {})".format(attempt + 1, retry_count))
            traceback.print_exception(e)

            for x in range(8):
                if esp.socket_status(x) != 0:
                    log_message(f"closing socket number {x}")
                    esp.socket_close(x)
            esp.reset() # added reset
            esp.disconnect()
            print("disconnect from the internet")
            print("ESP32 is connected to the INTERNET:", str(esp.is_connected))
            while not esp.is_connected:
                try:
                    esp.connect_AP(secrets["ssid"], secrets["password"])
                except OSError as e:
                    print("could not connect to AP, retrying: ", e)
                    continue
            print("Connected to", str(esp.ssid, "utf-8"), "\tRSSI:", esp.rssi)
            print("My IP address is", esp.pretty_ip(esp.ip_address))

            time.sleep(10)  # Wait before retrying

These are the types of errors I get. The first error is always some sort of timeout:

Traceback (most recent call last):
  File "code.py", line 175, in grab_data
  File "adafruit_requests.py", line 591, in get
  File "adafruit_requests.py", line 537, in request
  File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 98, in recv
  File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 143, in recv_into
timeout: timed out

or

  File "code.py", line 175, in grab_data
  File "adafruit_requests.py", line 591, in get
  File "adafruit_requests.py", line 537, in request
  File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 98, in recv
  File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 129, in recv_into
  File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 155, in _available
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 789, in socket_available
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 341, in _send_command_get_response
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 308, in _wait_response_cmd
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 287, in _wait_spi_char
TimeoutError: Timed out waiting for SPI char

After that, as you can see, my code attempts to reset the connection and retry. It either gets the same error again or, on the second and later retries, gives me this:

Traceback (most recent call last):
  File "code.py", line 175, in grab_data
  File "adafruit_requests.py", line 591, in get
  File "adafruit_requests.py", line 525, in request
  File "adafruit_connection_manager.py", line 226, in get_socket
RuntimeError: Socket already connected to http://pythonvfxer.pythonanywhere.com:80

I added some socket management to my code and, as you can see, I try to close all connected sockets before the loop runs again. This does nothing: on further retries it keeps saying the socket is already connected, but when I query all the sockets they all return 0.

Thanks in advance. I'm new to all of this and just tinkering, punching way above my weight class. I just want a way to reset all connections and sockets and let the ESP32 try to connect again if it times out. I appreciate that no network connection is always stable.

I'm running the latest versions available as of today: firmware version 1.7.7 and CircuitPython 9.0.5. I also updated the adafruit_esp32spi library from this GitHub repository.

anecdata commented 5 months ago

You've already tried esp.reset() and esp.disconnect(), so as a workaround for now, using adafruit_connection_manager, you can try: https://docs.circuitpython.org/projects/connectionmanager/en/latest/api.html#adafruit_connection_manager.connection_manager_close_all If that doesn't work, then you could escalate to supervisor.reload, then to microcontroller.reset().
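The escalation described above can be sketched as a small helper. This is only an illustration, not part of any Adafruit library: `fetch_with_escalation` and its parameters are hypothetical names, and the recovery steps are passed in as callables so that the ordering is explicit. On a board, the steps might be `adafruit_connection_manager.connection_manager_close_all`, then `supervisor.reload`, then `microcontroller.reset`.

```python
def fetch_with_escalation(fetch, recovery_steps, errors=(OSError, RuntimeError)):
    """Call fetch(); after each failure run the next recovery step and retry.

    fetch          -- hypothetical callable that performs the request
    recovery_steps -- callables ordered from mildest to most drastic
    errors         -- exception types treated as recoverable (an assumption)
    """
    for step in recovery_steps:
        try:
            return fetch()
        except errors as exc:
            print("request failed:", repr(exc))
            step()  # e.g. close sockets first, reload or reset later
    # One last attempt after the final recovery step; errors propagate.
    return fetch()
```

Because the mild step runs first, whatever is on the display stays untouched unless closing the sockets turns out not to be enough.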

/cc: @justmobilize This seems to be the same behavior I reported on Discord.

justmobilize commented 5 months ago

The 9.1 beta isn't required, but you would want to make sure adafruit_esp32spi is in the root and not the lib folder (the library is frozen into the firmware, and the frozen copy takes precedence over lib).

@anecdata I'm still trying to find a consistent repro for this one...

anecdata commented 5 months ago

@justmobilize Thanks, I deleted that, after noticing OP was already using CM.

justmobilize commented 5 months ago

I would try something like this:

        try:
            ...
        except TimeoutError as e:
            print(e)
            adafruit_connection_manager.connection_manager_close_all()
        except Exception as e:
            ...

And see what happens when it goes through the next time

anecdata commented 5 months ago

connection_manager._free_sockets(force=True) ... what's the distinction from connection_manager_close_all() (and should it be a documented public method)?

justmobilize commented 5 months ago

Yeah, so connection_manager_close_all would be fine too, as long as you don't pass in release_references. Sometimes I forget all the methods... ;)

Comment updated.

dandindarov commented 5 months ago

Thank you both!

"if that doesn't work, then you could escalate to supervisor.reload, then to microcontroller.reset()"

Yes, I'm aware of this as a last resort. I just want the last displayed graph to stay on while it tries reconnecting in the background, so visually there are no breaks. But I will eventually set it to escalate to reload and reset before I finish.

I did actually stumble across connection_manager_close_all earlier but couldn't get it to work. I tried again and it failed with an error saying it couldn't be found, which is odd, and ._free_sockets(force=True) only worked without the force argument. Updating to the 9.1 beta fixed all this; both functions run fine now. Reading this back, I put adafruit_esp32spi in the root folder as instructed above, but not adafruit_connection_manager, so maybe that's why it didn't work on 9.0.5.

At first glance the code doesn't seem to do much more than my brute-force loop that closed any open sockets, but I removed my loop and ran close_all with the except TimeoutError as e: setup exactly as @justmobilize suggested above.

And of course, it came back with a new error that isn't a TimeoutError, so close_all didn't get a chance to run:

Traceback (most recent call last):
  File "code.py", line 177, in grab_data
  File "adafruit_requests.py", line 683, in get
  File "adafruit_requests.py", line 629, in request
  File "adafruit_esp32spi/adafruit_esp32spi_socketpool.py", line 140, in recv
  File "adafruit_esp32spi/adafruit_esp32spi_socketpool.py", line 185, in recv_into
OSError: [Errno 116] ETIMEDOUT

The following loops failed with the earlier "Socket already connected" error, but again, since the first error was an OSError rather than a TimeoutError, the handler that calls close_all did not run and no sockets were closed.

I tried again, this time closing sockets on any exception, and it set a new record: a whole 9 hours without an error. But then it produced a fresh error:
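One way to read this: `TimeoutError` is a subclass of `OSError`, so a handler for `TimeoutError` does not see a plain `OSError` such as `[Errno 116] ETIMEDOUT`, while a handler for `OSError` sees both. A minimal sketch, with `get_with_cleanup`, `fetch`, and `cleanup` as hypothetical names (on the board, `cleanup` could be `adafruit_connection_manager.connection_manager_close_all`):

```python
def get_with_cleanup(fetch, cleanup):
    """Return fetch(), or None after running cleanup() on a network error."""
    try:
        return fetch()
    except OSError as exc:
        # Catches TimeoutError too, because TimeoutError subclasses OSError.
        print("network error:", repr(exc))
        cleanup()
        return None
```

This does not cover `RuntimeError` or `ValueError`; whether those should trigger the same cleanup is a separate choice.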

Traceback (most recent call last):
  File "code.py", line 178, in grab_data
  File "adafruit_requests.py", line 324, in json
ValueError: syntax error in JSON

And on the following loops, after it closed the socket with connection_manager_close_all:

Traceback (most recent call last):
  File "code.py", line 177, in grab_data
  File "adafruit_requests.py", line 683, in get
  File "adafruit_requests.py", line 600, in request
  File "adafruit_requests.py", line 239, in close
  File "adafruit_connection_manager.py", line 279, in free_socket
RuntimeError: Socket not managed

9.1 does seem more stable though. Do you think getting the response as .json is a problem? All three of these errors are first-time occurrences. This makes me believe the server I'm querying is part of the problem, but even if it has a hiccup I'd expect it to be fine after a few retries.

I'm going to rewrite the try/excepts now to catch these individual errors. Maybe if .json fails for some reason it can attempt .text and then convert that to a dict.
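The fallback described here could look roughly like the following. It is a sketch under assumptions: `parse_json_text` is a hypothetical helper, and it assumes the body has been read once as text (for example via `response.text`) instead of calling `.json()` first. A malformed body then yields a default value rather than raising `ValueError`.

```python
import json


def parse_json_text(text, default=None):
    """Parse text as JSON; return default if the body is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError as exc:
        # A truncated or non-JSON body (for example an HTML error page) lands here.
        print("could not parse JSON:", repr(exc))
        return default
```

Returning a default lets the caller keep showing the last good data and simply try again on the next cycle.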

Just odd as it ran 9 hours without fail though... Hard to debug this stuff

justmobilize commented 5 months ago

I highly recommend using:

with requests.get(...) as response:
    data = response.json()

This will help make sure the sockets get closed (and you don't need to).
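To illustrate why the `with` form helps, here is a small stand-in. `FakeResponse` is a hypothetical class, not the `adafruit_requests` response object; it only mimics context-manager behaviour to show that `close()` still runs when `json()` raises.

```python
class FakeResponse:
    """Hypothetical stand-in for a response that supports `with`."""

    def __init__(self, fail=False):
        self.fail = fail
        self.closed = False

    def json(self):
        if self.fail:
            raise ValueError("syntax error in JSON")
        return {"ok": True}

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()  # runs on normal exit and when an exception escapes
        return False  # do not swallow the exception


def read_json(response):
    """Read JSON inside a with block; return None if parsing fails."""
    try:
        with response as r:
            return r.json()
    except ValueError:
        return None
```

In both the success and the failure case, `closed` ends up `True` without an explicit `close()` call in the calling code.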

It does get harder when you have an exception, since things don't always clean themselves up.

If you can reduce your code to something that's reproducible, then we can figure out how to fix it.

CPython is way easier. You can make new sockets all day long...

dandindarov commented 5 months ago

Thanks! That makes more sense, yes, and it's cleaner than my response.close(). Whatever you guys did in 9.1 seems way more stable. Further testing is needed, but I was averaging maybe 1-2 hours between timeouts/crashes on 9.0.5, and now it's more like 10-12 hours.

I think at this point I'm happy to just escalate to a reload/microcontroller reset, as resetting a couple of times a day shouldn't be an issue.

After I finish this project I can try to build a simple version of it to make the problem more reproducible, for sure. Early in my debugging journey I learnt that drawing bitmaps doesn't help: I was getting more timeouts when I redrew every 10 seconds, and after switching to every 60 seconds it crashed far less. I feel like I'm maybe just pushing it too far. I can maybe code something up for you that does requests and goes heavy on drawing to the screen, so it's quite timeout-prone.

Closing this for now, thanks for all the help! And for your amazing code and libraries everyone. It's been so fun to come up with projects and prototypes.