Closed anecdata closed 1 year ago
Web workflow uses 1 socket for mdns, 1 socket to accept connections with a buffer of 1, 1 "active" socket for web requests and 1 socket for the websocket connection. So, 4 or 5 depending on how the pending connection is handled.
On some URLs, with web workflow off, things appear to work. But with the added print-debugs to Requests to expose the pass
'd and retried exceptions, it looks like on _send_request
there is either OSError: [Errno 9] EBADF
(in my case, for an mDNS address) or OSError: 32
[EPIPE
?] (in my case, for a numeric local IPv4) on every other try, but the retry gets it, so nothing is reported with the standard library.
This happens in beta.4 and beta.5. So something has been lurking there that may be related to the web workflow beta.5 case.
Forgot to mention that with web workflow on raspberrypi beta.5, when using the above code, I've tried several URLs. With the Adafruit URL, I don't know what the server sees, but none are successful. When I hit a WAN Apache server I can access (using domain name), the server shows HTTP 200 status codes and occasionally some are successful, but usually error as described above. But a LAN Apache server (using IP address) shows HTTP 408 status codes (server timed out). So it looks like some level of connection and data transfer is made, but the reads tend to fail in most circumstances.
Did some testing on espressif
for comparison: none of the exceptions occur with beta.4 or beta.5 on QT Py ESP32-S2 or QT Py ESP32-S3, including none of the "hidden" exceptions pass
'd through Requests' retries.
However, with some URLs, the first try in the _send_request
Requests loop gets an empty response (status code 200 on the servers in all observable cases) but the retry is successful.
Addendum: In another espressif code context (QT Py ESP32-S2 running beta.5, requests timeout set to 5 seconds), the first request in _send_request
gets OSError: Unhandled ESP TLS error 78 0 8018 -78
every time, but the internal Requests retry is successful.
It does seem that there are several issues at play here, but the most immediate is raspberrypi
not playing well with beta.5 and web workflow.
I'm testing with
Adafruit CircuitPython 8.0.0-beta.5-11-gfc13fba6e5-dirty on 2022-12-14; Raspberry Pi Pico W with rp2040
this contains some fixes to make a wider range of socket name lookup failures throw the "gaierror" exception as intended, but should be functionally identical to main.
wifi workflow is enabled. However, I have not connected to the web workflow interface or messed with resolving the device's mdns name from other computers.
To exclude as many factors as possible, I did not use adafruit_requests, but just wrote low level tests of getaddrinfo and socket.connect. all code was run by import from code.py
, not sure why it would make a difference to have the import, it's just the way I quickly switch between pieces of test code.
import socketpool
import wifi
if wifi.radio.ipv4_address is None:
print("connecting to wifi")
wifi.radio.connect(os.getenv("WIFI_SSID"), os.getenv('WIFI_PASSWORD'))
print(f"local address {wifi.radio.ipv4_address}")
socket = socketpool.SocketPool(wifi.radio)
host_names = ['eric.local', 'rat.local', 'jo.local', 'example.com', 'unpythonic.net', 'github.com']
def cycle(*args):
while True:
for a in args:
yield from a
print("Name resolution test")
success = failure = 0
for i, h in zip(range(100), cycle(host_names)):
try:
info = socket.getaddrinfo(h, 80)
print(end='.')
success += 1
except Exception as e:
print(f"{type(e)}: {e} {getattr(e, 'errno')}")
failure += 1
print(f"Over {success+failure} attempts, {success} success {failure} failure")
print()
print("Causing a failure, should give gaierror")
try:
print(socket.getaddrinfo('this-should.fail', 80))
except Exception as e:
print(f"{type(e)}: {e} {getattr(e, 'errno')}")
print("TCP connection test")
target = ('rat.u', 443) # a local computer running a web server
success = failure = 0
for i in range(100):
try:
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
s.connect(target)
print(end='.')
success += 1
except Exception as e:
print(f"{type(e)}: {e} {getattr(e, 'errno', None)}")
failure += 1
print(f"Over {success+failure} attempts, {success} success {failure} failure")
print()
print("SSL connection test")
time.sleep(.1)
ctx = ssl.create_default_context()
success = failure = 0
for i in range(10):
try:
with ctx.wrap_socket(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as s:
print(f"{s=}")
time.sleep(.1)
s.connect(target)
s.send('HEAD / HTTP/1.1\r\nHost: zaphod.unpythonic.net\r\n\r\n')
read_response(s)
print(end='.')
success += 1
except Exception as e:
print(f"{type(e)}: {e} {getattr(e, 'errno', None)}")
failure += 1
print()
print(f"Over {success+failure} attempts, {success} success {failure} failure")
Unfortunately, this doesn't reproduce any problems for me. Typical output:
Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
local address 10.0.2.94
Name resolution test
....................................................................................................
Over 100 attempts, 100 success 0 failure
Causing a failure, should give gaierror
socket_resolve_host() returned -2
<class 'gaierror'>: (-2, 'Name or service not known') -2
TCP connection test
....................................................................................................
Over 100 attempts, 100 success 0 failure
SSL connection test
s=<SSLSocket>
b'HTTP/1.1 200 OK\r\nDate: Wed, 14 Dec 2022 21:24:14'
b' GMT\r\nServer: Apache/2.4.54 (Debian)\r\nLast-Modif'
b'ied: Sat, 08 Jul 2017 15:52:04 GMT\r\nETag: "29cd-'
b'553d054d154ae"\r\nAccept-Ranges: bytes\r\nContent-Le'
b'ngth: 10701\r\nVary: Accept-Encoding\r\nContent-Type'
b': text/html\r\n\r\n'
…repetitions snipped…
Over 10 attempts, 10 success 0 failure
Code done running.
I tried running the original test script on the Pico W running the 12/8 beta.5 bits from s3 and after about 5 minutes I got the "OutOfRetries: Repeated socket failures" message.
I then flashed the latest bits from github and replaced my .env file with a settings.toml file and ran the test script which results in the success message being displayed repeatedly. I've run it now for several minutes without any failures.
This is probably expected behavior but I hadn't noticed it before. Prior to running this script the Pico W shows up in the web workflow list of "CircuitPython devices on your network" from another idle device, but while the script is running the device doesn't show up in the list.....
This is probably expected behavior but I hadn't noticed it before. Prior to running this script the Pico W shows up in the web workflow list of "CircuitPython devices on your network" from another idle device, but while the script is running the device doesn't show up in the list.....
No, I think it would be expected to still show up. @tannewt can you say for sure?
I'd expect it to show up. However, mdns does use udp so it may be missed occasionally due to that.
I did check it out on a Feather ESP32-S3 and didn't see the same behavior so I opened an issue up on it #7346
I just tested this with latest S3
Adafruit CircuitPython 8.0.0-beta.5-22-ge9f032f46 on 2022-12-20; Raspberry Pi Pico W with rp2040
and .env
converted to settings.toml
.
This works now with web workflow enabled. (Requests is still often getting an OSError: 32
on the first, hidden, try, but the internal retry is successful.)
Does anyone know what was broken in beta.5, and what fixed it?
Adafruit CircuitPython 8.0.0-rc.2 on 2023-02-02; Raspberry Pi Pico W with rp2040
Re-opening based on:
https://discord.com/channels/327254708534116352/537365702651150357/1071240927936512092
https://github.com/adafruit/Adafruit_CircuitPython_Requests/issues/126
https://discord.com/channels/327254708534116352/537365702651150357/1071516251693777009
It's very odd that when the issue occurs, it persists across microcontroller.reset()
and power cycles. Either something is affecting flash(?), or perhaps some non-volatile state in the wifi module(??).
I suspect there may be some marginal timing (perhaps variable) where (TLS?) sockets aren't ready at code start, and the perception of persistence across power / reset is just the code not working. But once it doesn't work, it will continue to not work until some delay is added and the device is reset.
This works after a reset and ongoing, but if STARTUP_WAIT = 0
it will fail with the OSError: (-29312, 'MBEDTLS_ERR_SSL_CONN_EOF')
.
import wifi
import time
import socketpool
import ssl
STARTUP_WAIT = 5
ITERATIONS = 10
HOST = "example.com"
PATH = "/"
PORT = 443
MAXBUF = 4096
time.sleep(STARTUP_WAIT) # wait for serial (and maybe wait for sockets)
print(f"{'='*25}")
print("Web Workflow enabled - already connected to AP")
pool = socketpool.SocketPool(wifi.radio)
for _ in range(ITERATIONS):
print(f"{'-'*25}\nCreate TCP Client Socket")
with pool.socket(pool.AF_INET, pool.SOCK_STREAM) as sock:
s = ssl.create_default_context().wrap_socket(sock, server_hostname=HOST)
print("Connecting")
s.connect((HOST, PORT))
size = s.send(f"HEAD {PATH} HTTP/1.1\r\nHost: {HOST}:{PORT}\r\n\r\n".encode())
print("Sent", size, "bytes")
# just get the first hunk and call it a day
buf = bytearray(MAXBUF)
size = s.recv_into(buf)
print('Received', size, "bytes", buf[:size])
Indeed, even @jepler's code behaves the same way if the TLS block is attempted first:
Addendum: The issue does not appear to be TLS-specific. The original example fails as well after a reset without a delay:
Perhaps related to #7313 (thanks, @Neradoc) - one second may not be enough.
@jepler Is there something we need to wait for when starting up the co-processor?
Sounds like same problem reported here: https://forums.adafruit.com/viewtopic.php?t=198486 (noted by mgmt). I haven't caught up on discord this weekend, so maybe this is all already known and linked.
hyde00001 confirmed on Discord just now that a delay at the start of the code works as above, once the board is then reset.
I have not seen a documented need for such a delay.
I do not understand how to check the latest reproducer script to see the resulting exception, since without a delay I won't be able to connect and see the printed exception.
What exception does the non-tls version produce?
No exception, it (non-tls) just times out with zero bytes received. Try your TLS loop but with no TLS right at the top of code, no delay, right after a reset. You'll miss the first prints, but subsequent loops will print.
This works with 5 seconds delay after reset, but not with 0 seconds delay after reset:
It may not be possible to reproduce it in MicroPython since it's really more like the non-web-workflow case in CircuitPython (where it's likely that the time to set up and connect to the AP is enough). This MicroPython code without a delay (rp2-pico-w-20230203-unstable-v1.19.1-852-g9ea64a36a.uf2
) works across resets:
I have not seen a documented need for such a delay.
Sorry - I missed that is with the web workflow only.
import wifi
has the side effect of calling common_hal_wifi_init()
. If you move the delay to before the import wifi
, does it affect what happens?
Interesting. I didn't expect that to behave the same, but it did. The issue appears more fundamental than wifi init.
This code works with a 5-second delay, but not with 0 seconds (again non-SSL version times out with 0 bytes received ; SSL version gets the (-29312, 'MBEDTLS_ERR_SSL_CONN_EOF')
exception):
I guess that makes sense though, even microcontroller.reset()
(which is how I typically test this) presumably uses some common hal or driver function to reset the rp2040, which also presumably causes some reset of the cyw43 module.
Clearly some timing issue here, though I cannot see where. Putting a "mp_hal_delay_ms(1500);" at the top of common_hal_socketpool_socket() in ports/raspberrypi/common-hal/socketpool/Socket.c makes everything work, but that's quite a hack.
Tested. Fixed by #7589. Thanks, @gneverov!
CircuitPython version
Code/REPL
Behavior
The above code, slightly modified boilerplate internet connect code, does not work on
raspberrypi
8.0.0-beta.5 when web workflow is enabled (wifi credentials in.env
file).Consistently gets exception:
It seems that the timeout in requests is kicking in within the library. No data is received. Occasionally
ETIMEDOUT
.Underlying exceptions to the
OutOfRetries
occur (discovered through some print-debugging inadafruit_requests
) if a manual timeout is supplied. Exceptions are initiallyOSError: [Errno 116] ETIMEDOUT
inrecv_into
, waiting for theH
ofHTTP
to kick off the receive. ThenOSError: 32
in_send
takes over on later requests. But again, sometimes there is no error, just no data received.I suspect some low-level socket shenanigans. There should be 8 sockets available, web workflow uses at least one for the TCP
listen
, probably something for mDNS, and more with client(s) accessing web workflow features?Interestingly If requests alternate between two URLs, after several failures for both, the second will succeed once. Rinse, repeat.
No issue with the above code on 8.0.0-beta.4, or on 8.0.0-beta.5 with web workflow disabled (no
.env
file).Description
No response
Additional information
No response