adafruit / Adafruit_CircuitPython_Requests

Requests-like interface for web interfacing
MIT License
51 stars 36 forks source link

Getting "Repeated socket failures" on PicoW #126

Closed matoushybl closed 1 year ago

matoushybl commented 1 year ago

Hi, I am trying to use this library on a Pico W with CircuitPython 8.0.0 beta 6 and I also tried nightly. The problem is that most of the time, I am getting "OutOfRetries: Repeated socket failures" when trying to GET basic URLs. The code is as follows:

import os
import time
import ssl
import wifi
import socketpool
import adafruit_requests

quotes_url = "http://wifitest.adafruit.com/testwifi/index.html"

wifi.radio.connect(os.getenv('CIRCUITPY_WIFI_SSID'), os.getenv('CIRCUITPY_WIFI_PASSWORD'))

pool = socketpool.SocketPool(wifi.radio)
requests = adafruit_requests.Session(pool, ssl.create_default_context())

print("Fetching text from %s" % quotes_url)
response = requests.get(quotes_url)
print("-" * 40)
print("Text Response: ", response.text)
print("-" * 40)
response.close()

The output is:

Fetching text from http://wifitest.adafruit.com/testwifi/index.html
Traceback (most recent call last):
  File "code.py", line 80, in <module>
  File "adafruit_requests.py", line 721, in get
  File "adafruit_requests.py", line 686, in request
OutOfRetries: Repeated socket failures

Randomly, the request is ok, but after rerunning the code it fails. Power cycling the board doesn't work and I've tried multiple APs.

Can you please give me any pointers on how to debug this problem? Thanks!

matoushybl commented 1 year ago

From some further debugging, it seems that it works before first power cycle is performed. I also tried MQTT and from the logs it seems like the Pico is capable of transmitting the data to the broker, it is not able to receive data and fails with the following message.

Traceback (most recent call last):
  File "code.py", line 103, in <module>
  File "adafruit_minimqtt/adafruit_minimqtt.py", line 530, in connect
MMQTTException: No data received from broker for 10 seconds.
device-cluster-mosquitto-1  | 1674650912: New client connected from 172.18.0.1:56462 as cpy35454 (p2, c1, k60).
device-cluster-mosquitto-1  | 1674650912: Client cpy35454 closed its connection.

I am starting to think, that this problem is connected to something in the wifi driver and I should probably move this issue to the CircuitPython repo.

anecdata commented 1 year ago

@matoushybl can you try this code, it bypasses requests (we can debug requests in detail depending how this comes out):

import os
import wifi
import socketpool
import ssl
import ipaddress
from secrets import secrets

HOST = "wifitest.adafruit.com"
PATH = "/testwifi/index.html"
PORT = 80
TIMEOUT = 60
MAXBUF = 1024

print("Connecting to wifi")
wifi.radio.connect(os.getenv('CIRCUITPY_WIFI_SSID'), os.getenv('CIRCUITPY_WIFI_PASSWORD'))

pool = socketpool.SocketPool(wifi.radio)

print("Create TCP Client Socket")
s = pool.socket(pool.AF_INET, pool.SOCK_STREAM)

s.settimeout(TIMEOUT)

print("Connecting")
s.connect((HOST, PORT))

size = s.send(f"GET {PATH} HTTP/1.1\r\nHost: {HOST}:{PORT}\r\n\r\n".encode())
print("Sent", size, "bytes")

# just get the first hunk and call it a day
buf = bytearray(MAXBUF)
size = s.recv_into(buf)
print('Received', size, "bytes", buf[:size])

s.close()
print("Done-ish.")
matoushybl commented 1 year ago

I'll do that. I made some progress yesterday when I came across a similar issue here: https://github.com/adafruit/circuitpython/issues/7333 . It seems like when I disable the web workflow by not specifying the SSID and password with the CIRCUITPY_ prefix, it all starts to work.

matoushybl commented 1 year ago

First running with CIRCUITPY_ prefix:

Connecting to wifi
Create TCP Client Socket
Connecting
Sent 69 bytes
Received 314 bytes bytearray(b'HTTP/1.1 200 OK\r\nServer: nginx/1.18.0 (Ubuntu)\r\nDate: Thu, 26 Jan 2023 15:04:29 GMT\r\nContent-Type: text/html\r\nContent-Length: 69\r\nLast-Modified: Thu, 09 Dec 2021 17:26:22 GMT\r\nConnection: keep-alive\r\nETag: "61b23c3e-45"\r\nAccept-Ranges: bytes\r\n\r\nThis is a test of Adafruit WiFi!\nIf you can read this, its working :)')
Done-ish.

After powerclycling:

Connecting to wifi
Create TCP Client Socket
Connecting
Sent 69 bytes
Traceback (most recent call last):
  File "code.py", line 31, in <module>
OSError: [Errno 116] ETIMEDOUT

Works flawlessy after removing the SSID and password prefix and power cycling.

anecdata commented 1 year ago

It sounds like you are not connected to wifi. The settings.toml file uses CIRCUITPY_WIFI_SSID and CIRCUITPY_WIFI_PASSWORD for web workflow. Leaving those out of the file, or commenting them. out, will disable web workflow.

You can use those names, or anything you like, for basic connection outside of web workflow. But in any case, the wifi.radio.connect() call must use credentials that are defined correctly somewhere.

You can verify that you are connected by printing or checking wifi.radio.ipv4_address in the code - it should be a valid IPv4 and not None or 0.0.0.0

matoushybl commented 1 year ago

I am pretty sure, I am connected to the wifi. I am sorry for misleading communication. Here is what is working for me, even across multiple reboots:

import os
import wifi
import socketpool
import ssl
import ipaddress

HOST = "wifitest.adafruit.com"
PATH = "/testwifi/index.html"
PORT = 80
TIMEOUT = 60
MAXBUF = 1024

print("Connecting to wifi")
wifi.radio.connect(os.getenv('WIFI_SSID'), os.getenv('WIFI_PASSWORD'))
print("My IP address is", wifi.radio.ipv4_address)

pool = socketpool.SocketPool(wifi.radio)

print("Create TCP Client Socket")
s = pool.socket(pool.AF_INET, pool.SOCK_STREAM)

s.settimeout(TIMEOUT)

print("Connecting")
s.connect((HOST, PORT))

size = s.send(f"GET {PATH} HTTP/1.1\r\nHost: {HOST}:{PORT}\r\n\r\n".encode())
print("Sent", size, "bytes")

# just get the first hunk and call it a day
buf = bytearray(MAXBUF)
size = s.recv_into(buf)
print('Received', size, "bytes", buf[:size])

s.close()
print("Done-ish.")

settings.toml:

WIFI_SSID = "SSID"
WIFI_PASSWORD = "PASSWORD"

output:

code.py output:
Connecting to wifi
My IP address is 10.15.0.27
Create TCP Client Socket
Connecting
Sent 69 bytes
Received 314 bytes bytearray(b'HTTP/1.1 200 OK\r\nServer: nginx/1.18.0 (Ubuntu)\r\nDate: Thu, 26 Jan 2023 21:29:07 GMT\r\nContent-Type: text/html\r\nContent-Length: 69\r\nLast-Modified: Thu, 09 Dec 2021 17:26:22 GMT\r\nConnection: keep-alive\r\nETag: "61b23c3e-45"\r\nAccept-Ranges: bytes\r\n\r\nThis is a test of Adafruit WiFi!\nIf you can read this, its working :)')
Done-ish.

Here is what doesn't work for me:

import os
import wifi
import socketpool
import ssl
import ipaddress

HOST = "wifitest.adafruit.com"
PATH = "/testwifi/index.html"
PORT = 80
TIMEOUT = 60
MAXBUF = 1024

print("Connecting to wifi")
wifi.radio.connect(os.getenv('CIRCUITPY_WIFI_SSID'), os.getenv('CIRCUITPY_WIFI_PASSWORD'))
print("My IP address is", wifi.radio.ipv4_address)

pool = socketpool.SocketPool(wifi.radio)

print("Create TCP Client Socket")
s = pool.socket(pool.AF_INET, pool.SOCK_STREAM)

s.settimeout(TIMEOUT)

print("Connecting")
s.connect((HOST, PORT))

size = s.send(f"GET {PATH} HTTP/1.1\r\nHost: {HOST}:{PORT}\r\n\r\n".encode())
print("Sent", size, "bytes")

# just get the first hunk and call it a day
buf = bytearray(MAXBUF)
size = s.recv_into(buf)
print('Received', size, "bytes", buf[:size])

s.close()
print("Done-ish.")

settings.toml:

CIRCUITPY_WIFI_SSID = "SSID"
CIRCUITPY_WIFI_PASSWORD = "PASSWORD"

First output is as follows - everything works:

Connecting to wifi
My IP address is 10.15.0.27
Create TCP Client Socket
Connecting
Sent 69 bytes
Received 314 bytes bytearray(b'HTTP/1.1 200 OK\r\nServer: nginx/1.18.0 (Ubuntu)\r\nDate: Thu, 26 Jan 2023 21:31:03 GMT\r\nContent-Type: text/html\r\nContent-Length: 69\r\nLast-Modified: Thu, 09 Dec 2021 17:26:22 GMT\r\nConnection: keep-alive\r\nETag: "61b23c3e-45"\r\nAccept-Ranges: bytes\r\n\r\nThis is a test of Adafruit WiFi!\nIf you can read this, its working :)')
Done-ish.

After power cycling the board:

Connecting to wifi
My IP address is 10.15.0.27
Create TCP Client Socket
Connecting
Sent 69 bytes
Traceback (most recent call last):
  File "code.py", line 32, in <module>
OSError: [Errno 116] ETIMEDOUT

This behavior is consistent over multiple reboots and runs of the code.

anecdata commented 1 year ago

When web workflow is on and it works the first time... what are the conditions preceding that (what had it been doing until that time? random?) vs. after a fresh power cycle? Is it the same if the board is reset using microcontroller.reset() or supervisor.reload() when web workflow is on? Does adding a delay early in the code do anything helpful? Reducing the timeout?

I haven't seen this behavior since a later beta-5, but I don't know what core change improved it at that time. But presumably some condition(s) in your environment might be bringing it out.

anecdata commented 1 year ago

BTW, I added some debugging to requests around line 650 or so in the current version:

        # We may fail to send the request if the socket we got is closed already. So, try a second
        # time in that case.
        retry_count = 0
        last_exc = None
        while retry_count < 2:
            retry_count += 1
            socket = self._get_socket(host, port, proto, timeout=timeout)
            print(f"#### DEBUG {time.monotonic():.3f} _get_socket {host} {port} {proto} {timeout}")
            ok = True
            try:
                self._send_request(socket, host, method, path, headers, data, json)
                print(f"#### DEBUG {time.monotonic():.3f} _send_request SENT {socket} {host} {method} {path} {headers}")  # {data} {json}
            except OSError as exc:
                print(f"#### DEBUG {time.monotonic():.3f} _send_request")
                traceback.print_exception(exc, exc, exc.__traceback__)
                ok = False
            if ok:
                # Read the H of "HTTP/1.1" to make sure the socket is alive. send can appear to work
                # even when the socket is closed.
                if hasattr(socket, "recv"):
                    result = socket.recv(1)
                else:
                    result = bytearray(1)
                    try:
                        socket.recv_into(result)
                    except OSError as exc:
                        print(f"#### DEBUG {time.monotonic():.3f} recv_into")
                        traceback.print_exception(exc, exc, exc.__traceback__)
                        pass
                if result == b"H":
                    # Things seem to be ok so break with socket set.
                    break
                else:
                    print(f"#### DEBUG {time.monotonic():.3f} result NOT == {b'H'}")
            else:
                print(f"#### DEBUG {time.monotonic():.3f} NOT OK")
            self._close_socket(socket)
            socket = None

        if not socket:
            raise OutOfRetries("Repeated socket failures") from last_exc

that gives a little more detail on the underlying errors (need to add imports for time and traceback)

anecdata commented 1 year ago

@matoushybl Can you try adding a delay like time.sleep(5) to the top of your code, before attempting any network operations?

matoushybl commented 1 year ago

I tried adding the delay, but it didn't change anything. I have also updated to CircuitPython 8.0.0.

anecdata commented 1 year ago

with the 5-second delay in, it doesn't work even after a power cycle or microcontroller.reset()?

(for example, web workflow enabled with settings.toml, this example code: https://github.com/adafruit/circuitpython/issues/7333#issuecomment-1419202562 then power cycle or reset)

matoushybl commented 1 year ago

That code works, interesting. I wasn't able to make it fail during a short testing.

anecdata commented 1 year ago

Good news. The key is that after a reset (reset button, power cycle, or microcontroller.reset()), there has to be enough of a delay before network operations to allow the Pico W's CYW43 wifi module to start up correctly. That means code with the delay inserted, or reset into the REPL and then by the time any code can be manually loaded or typed into the REPL the wifi module has had enough time to start up fully. If it doesn't start up correctly, no amount of code or time or soft reloads (like control-D) will fix it. The module only works correctly after a reset when the wifi module is allowed to start up fully.

Hopefully, this can be addressed in the core so code doesn't need these delays. There is an open issue for that (7333). But closing this library issue for now since we believe it's the same as the core issue, and the workaround is in place can be used for now. If you continue to experience this issue, please re-open.

dhalbert commented 1 year ago

@anecdata I'm confused. Are you saying that the delay solves the problem? But I would consider that a workaround, since we'd have to document that. We need some test that says the CYW43 is really ready, and then we can do that invisibly.

I wonder if there is some transaction going on with the access point that needs to finish, but the CYW43 is not revealing that it's not in a good state before that.

anecdata commented 1 year ago

I just thought this library issue was redundant with the core issue. Any real solution isn't in the library. We can re-open if that's more appropriate.

I don't know what aspect of the CYW43 startup is going bad (and staying bad) when not enough time is allowed.

dhalbert commented 1 year ago

ah sorry, I missed that this was the library issue

anecdata commented 1 year ago

@matoushybl This has been fixed with https://github.com/adafruit/circuitpython/pull/7589 and should be in the S3 Absolute Newest soon.

matoushybl commented 1 year ago

Thanks! I am looking forward to it.