Open anecdata opened 2 years ago
@anecdata - I was able to reproduce this issue, but what I saw on a network analyzer is that the mDNS queries were still being sent while the mDNS server (which I had running on a separate ESP32) stopped responding. What are you using for an mDNS server? Also, did you run any Web Workflow activity during your testing? I did, and it caused a hard crash.
@DavePutz I'll go back and verify, but I believe I tested this with web workflow on and off. I may be misunderstanding something, but there is no mDNS server other than the mdns.Server(wifi.radio) which is used to do the network queries. I have a number of other devices with web workflow running (and I had one with manual pre-web-workflow mDNS running), and they should all, in theory, show up in the .find listing (and often do, until the .find starts coming back empty).
Other behaviors I see when scanning mDNS for _circuitpython _tcp are:
- OSError: -2, which is fatal and needs a reset
- ConnectionError: No network with that ssid, from which it never recovers and needs a reset

The amount of time or number of scans varies before an issue arises.
P.S. I've put connect() in the loop so that before each scan, there is validation that the device is still connected to an AP / has an IPv4.
P.P.S. Yes, just ran the scan on a device that is not running web workflow, and it failed after several minutes with an OSError -2. Another failed after several minutes with ConnectionError: No network with that ssid. BTW, ConnectionError: No network with that ssid is an exception I very rarely see other than this. I have other devices running in this area, one continually doing wifi scans for APs, and they can all connect and show good RSSI to the nearby APs.
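Since the time or number of scans before a failure varies, it may help to tally each scan's outcome so reports can be compared. The following is a minimal sketch, assuming the scan is wrapped in a zero-argument callable (on a device that might be a lambda around the server's find call). ScanStats and its attributes are illustrative names, not part of CircuitPython.

```python
class ScanStats:
    """Running tally of mDNS scan outcomes (illustrative helper, not a CircuitPython API)."""

    def __init__(self):
        self.scans = 0          # scans attempted
        self.total_results = 0  # services returned across all scans
        self.empty_streak = 0   # consecutive scans that returned nothing
        self.errors = {}        # exception class name -> count

    def record(self, scan):
        """Run one scan via the zero-argument callable `scan` and tally the outcome."""
        self.scans += 1
        try:
            results = list(scan())
        except (OSError, ConnectionError) as e:
            # Covers the two failures reported here: OSError -2 and ConnectionError.
            name = type(e).__name__
            self.errors[name] = self.errors.get(name, 0) + 1
            return []
        self.total_results += len(results)
        self.empty_streak = 0 if results else self.empty_streak + 1
        return results
```

Printing scans, total_results, empty_streak, and errors once per loop would show roughly how many results had been returned before the scans went empty or an exception appeared.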
Actions: look at ESP-IDF issues. Test with Pico W as well.
Running mDNS finder now on Pico W for comparison:
Adafruit CircuitPython 8.0.0-beta.4-68-g6e40949f6 on 2022-12-02; Raspberry Pi Pico W with rp2040
No crashes yet, but two differences in results:
- raspberrypi prints a leftover debug message: found service 0x********
- raspberrypi find returns duplicates; no duplicates in espressif
Addendum: still going strong after running overnight (not surprising since the code / SDK are so different)
Do we think there should be identical duplicate behavior?
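Until the ports agree on duplicate handling, user code can filter repeats itself. This is a minimal sketch, assuming a service is uniquely identified by hostname, instance name, and port (an assumption, not something the mdns module specifies). The Service namedtuple is only a stand-in for the objects a find call yields, and unique_services is a hypothetical helper name.

```python
from collections import namedtuple

# Stand-in for the objects yielded by find(); only attributes used in this thread are modeled.
Service = namedtuple("Service", "service_type protocol port hostname instance_name")


def unique_services(services):
    """Yield each service once, keyed on (hostname, instance_name, port), preserving order."""
    seen = set()
    for s in services:
        key = (s.hostname, s.instance_name, s.port)
        if key not in seen:
            seen.add(key)
            yield s
```

Wrapping the result of a find call in unique_services would give the same listing on both ports regardless of whether the underlying implementation returns duplicates.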
- raspberrypi: more on mDNS duplicates (etc.) in issue #7326
- there's a debug message remaining in raspberrypi: found service 0x********
I fixed this in #7445
I'm looking at the reliability issue now.
I tried to reproduce this on an ESP32-S3 USB OTG but after 30 minutes it was still finding the other CP device. Any idea how many total results you got before it crashed? Maybe we're leaking them. Would you mind testing with a DEBUG build to get the backtrace? Thanks!
I'd guess something on the order of half a dozen results per batch (one batch every 15 seconds in the loop), sometimes more, sometimes less.
I haven't done much with mDNS recently, but I didn't see it during testing of "Share the web workflow MDNS object with the user" and other recent mDNS changes. I'll try first to just set it up and see if it's still happening. If it is, I can queue it up after 7459, the ESP32-S2 safe mode issue.
FWIW my test ran two and a half hours and kept finding other devices.
I loaded up modified test code from above (mostly a more robust connect, and bumped the mDNS timeout to 10 seconds) onto an S2 TFT with Adafruit CircuitPython 8.0.0-beta.6-44-g936ecdd2b on 2023-01-18:
import time
import traceback
import wifi
import mdns
from secrets import secrets

MDNSFINDTIMEOUT = 10

def connect():
    while not wifi.radio.ipv4_address:
        try:
            wifi.radio.connect(secrets["ssid"], secrets["password"])
        except ConnectionError as e:
            traceback.print_exception(e, e, e.__traceback__)
            time.sleep(1)
    # time.sleep(0.100) # Pico W wifi.radio.ipv4_address can lag wifi.radio.connect by tens of ms
    time.sleep(1) # ap_info takes a moment to be valid
    rssi = None
    if hasattr(wifi.radio, "ap_info") and wifi.radio.ap_info.rssi:
        rssi = wifi.radio.ap_info.rssi
    return wifi.radio.ipv4_address, rssi

time.sleep(2) # wait for serial
print(f"{'='*25}")

print(f"{time.monotonic_ns()} Starting mDNS server")
m = mdns.Server(wifi.radio)

while True:
    print(f"{time.monotonic_ns()} Finding mDNS hosts from {connect()}")
    for service in m.find(service_type="_circuitpython", protocol="_tcp", timeout=MDNSFINDTIMEOUT):
        print(f"{time.monotonic_ns()} {service.service_type} {service.protocol} {service.port} {service.hostname} {service.instance_name}")
    time.sleep(15)
Also loaded up Adafruit CircuitPython 8.0.0-beta.6-44-g936ecdd2b on 2023-01-18 onto 4 QT Py S2 and 4 Pico W, web workflow enabled, no code.py.
No safe mode observed, just mDNS quirks that seem a little beyond UDP unreliability:
- .find generally finds 0-3 devices when there are 10 or so circuitpython.local hosts out there
- ... .local address), but often are missing the "Device Info" and the "Here are other CircuitPython devices on your network:", and when there are other devices shown they're mostly (always?) S2 and not Pico W - maybe relevant to #7346
The code did eventually start looping with
Traceback (most recent call last):
File "code.py", line 12, in connect
ConnectionError: No network with that ssid
So maybe something is getting messed up in wifi-land, but it's recoverable with a reload.
So I think we can close this issue? If safe mode or extended no-results arise again, this or a new issue can be opened. And leave the quirks and platform differences to future testing.
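For the ConnectionError loop described above, one option is to bound the retries so the caller can decide what to do when the radio does not recover. The following is a minimal sketch with the connect call and the connectivity check passed in as callables. connect_with_retries and its parameters are hypothetical names, and the defaults are arbitrary.

```python
import time


def connect_with_retries(connect, is_connected, attempts=5, delay=1.0, sleep=time.sleep):
    """Call connect() up to `attempts` times; return True once is_connected() is truthy."""
    for _ in range(attempts):
        if is_connected():
            return True
        try:
            connect()
        except ConnectionError:
            # e.g. "No network with that ssid"; pause before the next attempt
            sleep(delay)
    return bool(is_connected())
```

On a device, connect could wrap wifi.radio.connect and is_connected could return wifi.radio.ipv4_address. If the function returns False, calling supervisor.reload() would mirror the manual reload that recovered the device here, though whether that helps in every case is untested.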
I'm not sure we need to close, just re-milestone it.
I am out of stamina for debugging MDNS for the time being.
Good plan.
CircuitPython version
Code/REPL
Behavior
Loop will display findings for 5-10 minutes, or an hour. Then it finds no results in all subsequent loops, despite still being connected to wifi. Control-C exits, but Control-D to reload either runs and still finds no hosts, or triggers a hard fault. Sometimes it will hard fault by itself after some iterations of finding no hosts.
Regression test with:
yields similar behavior.
Not sure if this is related to #6186.
Description
No response
Additional information
Optionally: add a deinit to Server to allow user code to deinit / reinit the mDNS server to work around some issues.
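If such a deinit were added, user code could apply the workaround roughly as follows. This is only a sketch: deinit is the method proposed in this issue rather than a confirmed API, so the code calls it only when present, and ReinitOnEmpty, make_server, and limit are hypothetical names. Whether creating a second Server without a deinit works on a given port is not established here.

```python
class ReinitOnEmpty:
    """Recreate the mDNS server after `limit` consecutive empty scans (workaround sketch)."""

    def __init__(self, make_server, limit=3):
        self.make_server = make_server  # zero-argument factory that returns a server object
        self.limit = limit
        self.server = make_server()
        self.empty = 0  # consecutive empty scans on the current server

    def find(self, **kwargs):
        results = list(self.server.find(**kwargs))
        self.empty = 0 if results else self.empty + 1
        if self.empty >= self.limit:
            # deinit() is the proposed method; skip it if the server lacks one.
            deinit = getattr(self.server, "deinit", None)
            if deinit:
                deinit()
            self.server = self.make_server()
            self.empty = 0
        return results
```

The factory keeps the sketch independent of any particular radio object; on a device it could be a lambda that constructs the server from wifi.radio.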