georgerobotics / cyw43-driver


Pico-W to Pico-W STALL while sending UDP #111

Open marcos-diaz opened 5 months ago

marcos-diaz commented 5 months ago

Steps to reproduce:

Expected results:

Actual results:

Special notes:

peterharperuk commented 5 months ago

You need to raise this against pico-sdk. See how it behaves after calling cyw43_wifi_pm with CYW43_PERFORMANCE_PM or even cyw43_pm_value(CYW43_NO_POWERSAVE_MODE, 20, 1, 1, 1)

marcos-diaz commented 5 months ago

Unfortunately power management does not help.

dpgeorge commented 5 months ago

Are you able to create a test using MicroPython for both sides, that shows the issue? I tried to do that but couldn't reproduce any issue. Here are my test scripts:

# Acting as the access point and server.

import network, socket

AP_SSID = "RPI_PICO_W"
PORT = 8000

def udp_server():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("0.0.0.0", PORT))
    try:
        while True:
            data, addr = s.recvfrom(1000)
            print("received", len(data), addr)
    except KeyboardInterrupt:
        pass
    s.close()
    print("done")

def test_server():
    wl = network.WLAN(1)
    wl.config(essid=AP_SSID, security=0)
    wl.active(1)
    print(f"MAC={wl.config('mac')} IP={wl.ifconfig()[0]}")
    print(f"AP advertising as {AP_SSID}")
    udp_server()
    print("AP shutting down")
    wl.active(0)
    wl.deinit()

test_server()

and

# Acting as the station and client.

import network, time, socket

AP_SSID = "RPI_PICO_W"
IP = "192.168.4.1"
PORT = 8000

def udp_client():
    ai = socket.getaddrinfo(IP, PORT)[0][-1]
    for i in range(2):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        print("create socket", s)
        for i in range(1000):
            data = bytearray(i & 0xFF for i in range(548))
            s.sendto(data, ai)
        s.close()
    print("done")

def test_client():
    wl = network.WLAN()
    wl.active(1)
    print(f"STA connecting to {AP_SSID}")
    wl.connect(AP_SSID)
    while not wl.isconnected() and wl.status() > 0:
        time.sleep_ms(1)
    udp_client()
    print("STA disconnecting")
    wl.disconnect()
    wl.active(0)
    wl.deinit()

test_client()

Running both those scripts on Pico W boards, the client is able to send UDP packets without issue.

Note: to run this test, first install MicroPython from https://micropython.org/download/RPI_PICO_W/, then pip install mpremote, then run the server first using mpremote run udp_server.py, then client as mpremote run udp_client.py.

marcos-diaz commented 4 months ago

Update with my findings so far:

(1) I reproduced the issue with MicroPython, but instead of stalling, it throws:

Traceback (most recent call last):
  File "<stdin>", line 38, in <module>
  File "<stdin>", line 31, in test_client
  File "<stdin>", line 19, in udp_client
OSError: -1

...when moving or tilting one of the Picos. This seems to be the same behavior as in case (3) (see below).
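One way to keep the client loop running through such transient failures is to catch the OSError around sendto and treat that datagram as dropped. This is only a minimal sketch, assuming that losing individual packets is acceptable for the application. `send_best_effort` and `FlakySocket` are hypothetical names, and the fake socket merely stands in for a link that fails now and then:

```python
def send_best_effort(sock, data, addr):
    """Send one datagram; return True on success, False if the send failed."""
    try:
        sock.sendto(data, addr)
        return True
    except OSError:
        # Fire-and-forget: drop this datagram instead of aborting the loop.
        return False


class FlakySocket:
    """Hypothetical stand-in for a UDP socket whose every third send fails."""

    def __init__(self):
        self.calls = 0

    def sendto(self, data, addr):
        self.calls += 1
        if self.calls % 3 == 0:
            raise OSError(-1)
        return len(data)


def run_demo(n=9):
    """Send n datagrams through the flaky socket; return (sent, dropped)."""
    sock = FlakySocket()
    addr = ("192.168.4.1", 8000)
    results = [send_best_effort(sock, b"x" * 548, addr) for _ in range(n)]
    return results.count(True), results.count(False)
```

In the real client, `sock` would be the socket created in udp_client(). Counting the dropped sends gives a rough measure of how often the link fails while a board is being moved.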


(2) The reason for the driver becoming unresponsive (when using the C SDK) seems to be the timeout at cyw43_ll.c#L661, which is hardcoded to 1 second. If the application sends packets at intervals shorter than 1 second (and they fail to go through), the driver gets into a state in which each send attempt takes longer to time out than the interval at which new packets arrive. Tweaking that hardcoded value (and CYW43_SDPCM_SEND_COMMON_WAIT) so that packets "fail faster" seems to solve the stall problem. Maybe the hardcoded value should be exposed as a configuration parameter?
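The arithmetic behind that stall can be sketched with a toy model. It is not the driver's actual logic: it assumes every send attempt fails and blocks for the full timeout, and that attempts are strictly serialized. The function name and the numbers are hypothetical:

```python
def send_start_delays(n_packets, interval_s, timeout_s):
    """Return, per packet, how long it waits before its send attempt starts.

    Toy model: a packet is produced every interval_s seconds, every attempt
    fails and blocks for timeout_s, and only one attempt runs at a time.
    """
    delays = []
    busy_until = 0.0  # time at which the previous attempt gives up
    for k in range(n_packets):
        produced_at = k * interval_s
        start = max(produced_at, busy_until)
        delays.append(start - produced_at)
        busy_until = start + timeout_s
    return delays


# Timeout longer than the packet interval: the wait grows with every packet.
slow_fail = send_start_delays(4, interval_s=0.25, timeout_s=1.0)

# Timeout shorter than the packet interval: no backlog builds up.
fast_fail = send_start_delays(4, interval_s=0.25, timeout_s=0.125)
```

In this model the wait grows by (timeout_s - interval_s) with every packet once the timeout exceeds the send interval, which matches the observation that making packets fail faster avoids the stall.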


(3) The underlying reason for the packets failing to go through (cyw43_send_ethernet returning ERR_IF, or the MicroPython OSError) seems to be that the Pico-to-Pico wifi connection is extremely weak and unstable, especially when one of the Picos is being physically moved or rotated. If you are trying to reproduce this issue, just move or tilt one of the Picos and cyw43_send_ethernet will start returning ERR_IF, even if the Picos are 10 centimeters apart and in direct line of sight.


(4) It feels like the Pico-W is only using a fraction of the power it should be using when doing wifi (compared with BT). One would expect that a wifi connection is way stronger than a BT connection on the same hardware, but it seems like the opposite. None of my attempts to raise wifi power usage had any impact (setting PM and country):

cyw43_arch_init_with_country(CYW43_COUNTRY_GERMANY);
cyw43_arch_enable_sta_mode();
cyw43_wifi_pm(&cyw43_state, cyw43_pm_value(CYW43_NO_POWERSAVE_MODE, 20, 0, 0, 0));

...is there something else to do related to power management on wifi?


(5) The value for pm2_sleep_ret_ms seems to be contradictory, wrong, or at least confusing. The documentation states "The maximum time to wait before going back to sleep", so one would expect:

higher value (2000 ms) => more time awake => more performance
lower value (20 ms) => less time awake => more power saving

...but the macro definitions imply the opposite.


(6) The concept of automatically retrying/waiting to send packets at driver level seems weird to me, given that TCP has its own retry buffer, and UDP should be fire-and-forget. But I'm sure there is a good reason for it.


(7) Additional thing that I desperately tried:


Final thoughts: Given how ridiculously weak the Pico-to-Pico wifi connection is (losing the connection at a few centimeters) compared to Pico-to-Pico Bluetooth, it looks to me like the implementation of Pico-W wifi has a loose thread somewhere, either in the power management or in the implementation of the driver; or maybe I'm missing something important (I never rule out that I'm stupid).

I would be very thankful if you can take a look into this issue.

Additionally if you want to double check my (potentially stupid) code, this is the real world implementation on a low latency wireless videogame controller (Input Labs):

...in which latency is nice and jitter is low, except when it completely stalls / loses the connection.

Thanks!

peterharperuk commented 4 months ago

Please raise an issue at https://github.com/raspberrypi/pico-sdk and I'll try to find time to reproduce your test. Just add a link to this issue. My only comment is to check whether you're polling often enough.