espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.37k stars 7.21k forks source link

ESP32-C6 - WiFi 6 - Target Wait Time (TWT) - Teardown function causes intermitent connection timeouts (IDFGH-12310) #13344

Open djangoa opened 6 months ago

djangoa commented 6 months ago

Answers checklist.

IDF version.

5.1.3

Espressif SoC revision.

ESP32-C6-FH4

Operating System used.

Windows

How did you build your project?

Eclipse IDE

If you are using Windows, please specify command line type.

None

Development Kit.

DFROBOT FireBeetle 2 ESP32 C6 IoT Development Board

Power Supply used.

USB

What is the expected behavior?

Calling _esp_wifi_sta_itwt_teardown(flowid) tears down the selected Targeted Wake Time and allows network traffic to flow freely through the wifi interface on receiving a WIFI_EVENT_ITWT_TEARDOWN event.

What is the actual behavior?

Calling _esp_wifi_sta_itwt_teardown(flowid) intermittently causes select() timeouts when creating new connections.

See the attached DEBUG Logs of an application I've been working on that integrates the itwt and mqtt tcp examples and sits in a loop publishing data from a sensor to a MQTT broker then sleeping for 5minutes.

In the Debug Logs:

  1. The first iteration works without issue.
  2. The 2nd fails to connect to the broker after 3 attempts over ~600ms after tearing down the TWT.
  3. Finally, the last iteration connects to the broker on the 2nd attempt.

Steps to reproduce.

Setup and start wifi. Create a non trivial TWT agreement with an AP using a service period of around 5mins (e.g. 46875 << 13) , then in a loop:

  1. Call __esp_wifi_sta_itwtteardown()
  2. Create a new TCP connection (e.g. using mqtt_client to publish some data)
  3. Intermittently receive connection timeout (e.g. ESP_ERR_ESP_TLS_CONNECTION_TIMEOUT)
  4. Try recovering by creating a new connection after a failure but often continue to receive a timeout.
  5. Give up and create a new TWT agreement via a call to _esp_wifi_sta_itwtsetup()
  6. vTaskDelay() for a period of time, invoking automatic light sleep to minimise power consumption
  7. GOTO step 1, restart the loop and hope the next iteration is more successful / wastes less time & power being awake.

Debug Logs.

D (37) app_main: Main task executing [boot:33203µs]
V (49) app_main: Wireless executing
I (55) wireless: Set static ip: 192.168.2.20, netmask: 255.255.255.0, gw: 192.168.2.1
I (103) wireless: WIFI_EVENT_STA_START
I (1589) wireless: WIFI_EVENT_STA_CONNECTED, ssid: 1337_2U
I (1639) wireless: IP_EVENT_STA_GOT_IP
I (1640) wireless: itwt setup, flow_id: 1, err: 0x0
I (1710) wireless: WIFI_EVENT_ITWT_SETUP twt_id: 0, flow_id: 1, trigger-enabled, announced, wake_dura: 255, wake_dura_unit: 0, wake_invl_e: 13, wake_invl_m: 46875
I (1715) wireless: WIFI_EVENT_ITWT_SETUP target wake time: 2533095219200, wake duration: 65280us, service period: 384000000us
V (1727) app_main: Wireless outputs: exec timer: 1678ms
V (1733) app_main: ADC executing
V (1737) app_main: ADC outputs: VCC = 3.722V, read timer: 582us
V (1744) app_main: SensorShtc init executing
V (1750) app_main: SensorShtc init outputs: low-power-mode-enabled, sleep-enabled, init timer: 1126us
V (1758) app_main: SensorShtc run executing
V (1765) app_main: SensorShtc run outputs: temperature = 24.55°C, humidity = 36.54% rH, read timer: 1525us
V (1774) app_main: Mqtt executing
I (1778) wireless: itwt teardown, flow_id: 8, err: 0x0
I (1779) wireless: WIFI_EVENT_ITWT_TEARDOWN flow_id 8(all twt)
I (1791) mqtt: MQTT_EVENT_BEFORE_CONNECT
I (1870) mqtt: MQTT_EVENT_CONNECTED
I (1871) mqtt: Sent publish, msg_id=36802
I (1875) mqtt: MQTT_EVENT_PUBLISHED, msg_id=36802
I (1879) mqtt: Destroyed client
I (1880) wireless: itwt setup, flow_id: 1, err: 0x0
I (1951) wireless: WIFI_EVENT_ITWT_SETUP twt_id: 0, flow_id: 1, trigger-enabled, announced, wake_dura: 255, wake_dura_unit: 0, wake_invl_e: 13, wake_invl_m: 46875
I (2002) wireless: WIFI_EVENT_ITWT_SETUP target wake time: 2533095526400, wake duration: 65280us, service period: 384000000us
V (2007) app_main: Mqtt outputs: data sent = '1, 0,24.5,36.5,3.722,-40,-1', msg id = 36802, exec timer: 229ms
V (2018) app_main: Iteration exec time: 284ms
V (2023) app_main: Heap max alloc: 123.3086 / 382.5547Kb
V (2029) app_main: Stack max alloc: 2.5195 / 8.0000Kb
D (2034) app_main: Delay until next iteration: 299716ms
V (301756) app_main: ADC executing
V (301756) app_main: ADC outputs: VCC = 3.720V, read timer: 556us
V (301805) app_main: SensorShtc init executing
V (301807) app_main: SensorShtc init outputs: low-power-mode-enabled, sleep-enabled, init timer: 1104us
V (301809) app_main: SensorShtc run executing
V (301816) app_main: SensorShtc run outputs: temperature = 23.79°C, humidity = 36.49% rH, read timer: 1525us
V (301825) app_main: Mqtt executing
I (301829) wireless: itwt teardown, flow_id: 8, err: 0x0
I (301830) wireless: WIFI_EVENT_ITWT_TEARDOWN flow_id 8(all twt)
I (301843) mqtt: MQTT_EVENT_BEFORE_CONNECT
E (301949) esp-tls: [sock=54] select() timeout
E (301949) transport_base: Failed to open a new connection: 32774
E (301950) mqtt_client: Error transport connect
I (301955) mqtt: MQTT_EVENT_ERROR
E (302005) mqtt: Last error reported from esp-tls: 0x8006 ESP_ERR_ESP_TLS_CONNECTION_TIMEOUT
I (302005) mqtt: MQTT_EVENT_DISCONNECTED
W (302008) app_main: mqtt ERROR_BIT | DISCONNECTED_BIT set during start(), retrying...
I (302018) mqtt: Destroyed client
I (302021) mqtt: MQTT_EVENT_BEFORE_CONNECT
E (302127) esp-tls: [sock=54] select() timeout
E (302127) transport_base: Failed to open a new connection: 32774
E (302128) mqtt_client: Error transport connect
I (302133) mqtt: MQTT_EVENT_ERROR
E (302137) mqtt: Last error reported from esp-tls: 0x8006 ESP_ERR_ESP_TLS_CONNECTION_TIMEOUT
I (302146) mqtt: MQTT_EVENT_DISCONNECTED
W (302150) app_main: mqtt ERROR_BIT | DISCONNECTED_BIT set during start(), retrying...
I (302160) mqtt: Destroyed client
I (302163) mqtt: MQTT_EVENT_BEFORE_CONNECT
E (302270) esp-tls: [sock=54] select() timeout
E (302270) transport_base: Failed to open a new connection: 32774
E (302365) mqtt_client: Error transport connect
I (302365) mqtt: MQTT_EVENT_ERROR
E (302365) mqtt: Last error reported from esp-tls: 0x8006 ESP_ERR_ESP_TLS_CONNECTION_TIMEOUT
I (302372) mqtt: MQTT_EVENT_DISCONNECTED
E (302377) app_main: mqtt ERROR_BIT | DISCONNECTED_BIT set during start() after 3 attempts
I (302386) mqtt: Destroyed client
I (302390) wireless: itwt setup, flow_id: 1, err: 0x0
I (302463) wireless: WIFI_EVENT_ITWT_SETUP twt_id: 0, flow_id: 1, trigger-enabled, announced, wake_dura: 255, wake_dura_unit: 0, wake_invl_e: 13, wake_invl_m: 46875
I (302514) wireless: WIFI_EVENT_ITWT_SETUP target wake time: 2533395763200, wake duration: 65280us, service period: 384000000us
V (302519) app_main: Mqtt outputs: data sent = 'NULL', msg id = 0, exec timer: 688ms
W (302528) app_main: Iteration exec time: 771ms
V (302533) app_main: Heap max alloc: 125.2617 / 382.5547Kb
V (302539) app_main: Stack max alloc: 2.5195 / 8.0000Kb
D (302545) app_main: Delay until next iteration: 299229ms
V (601780) app_main: ADC executing
V (601780) app_main: ADC outputs: VCC = 3.720V, read timer: 556us
V (601830) app_main: SensorShtc init executing
V (601831) app_main: SensorShtc init outputs: low-power-mode-enabled, sleep-enabled, init timer: 1102us
V (601834) app_main: SensorShtc run executing
V (601841) app_main: SensorShtc run outputs: temperature = 23.53°C, humidity = 39.46% rH, read timer: 1524us
V (601850) app_main: Mqtt executing
I (601854) wireless: itwt teardown, flow_id: 8, err: 0x0
I (601855) wireless: WIFI_EVENT_ITWT_TEARDOWN flow_id 8(all twt)
I (601868) mqtt: MQTT_EVENT_BEFORE_CONNECT
E (601974) esp-tls: [sock=54] select() timeout
E (601974) transport_base: Failed to open a new connection: 32774
E (601975) mqtt_client: Error transport connect
I (601980) mqtt: MQTT_EVENT_ERROR
E (602030) mqtt: Last error reported from esp-tls: 0x8006 ESP_ERR_ESP_TLS_CONNECTION_TIMEOUT
I (602030) mqtt: MQTT_EVENT_DISCONNECTED
W (602033) app_main: mqtt ERROR_BIT | DISCONNECTED_BIT set during start(), retrying...
I (602043) mqtt: Destroyed client
I (602046) mqtt: MQTT_EVENT_BEFORE_CONNECT
I (602057) mqtt: MQTT_EVENT_CONNECTED
I (602059) mqtt: Sent publish, msg_id=49687
I (602062) mqtt: MQTT_EVENT_PUBLISHED, msg_id=49687
I (602068) mqtt: Destroyed client
I (602069) wireless: itwt setup, flow_id: 1, err: 0x0
I (602142) wireless: WIFI_EVENT_ITWT_SETUP twt_id: 0, flow_id: 1, trigger-enabled, announced, wake_dura: 255, wake_dura_unit: 0, wake_invl_e: 13, wake_invl_m: 46875
I (602193) wireless: WIFI_EVENT_ITWT_SETUP target wake time: 2533695283200, wake duration: 65280us, service period: 384000000us
V (602198) app_main: Mqtt outputs: data sent = '3,60,23.5,39.5,3.720,-40,771', msg id = 49687, exec timer: 343ms
V (602209) app_main: Iteration exec time: 428ms
V (602214) app_main: Heap max alloc: 125.5039 / 382.5547Kb
V (602221) app_main: Stack max alloc: 2.5195 / 8.0000Kb
D (602227) app_main: Delay until next iteration: 299572ms

More Information.

I've tried many things to work round the issue including turning off various power saving techniques to omit them as a possible cause:

I've also tried idling after a teardown with vTaskDelay() before creating a connecting. Finally, I've tried using _esp_wifi_sta_itwt_teardown(FLOW_IDALL) to no effect. It also doesn't matter what timeout I give (100ms or 10000ms) when creating a tcp mqtt connection, it still fails taking the full timeout duration.

For context the wifi connection seems OK and is to a Ubiquiti UniFi U6 Enterprise AP with ~-40dbm RSSI and ~20% TX retries. Finally and anecdotally, the errors "seem" to come in waves as if related to interference? I've attached my sdkconfig.zip from the application I'm experiencing the issue in case I've missed anything.

On a related note, I'm also experiencing strange behaviour and fairly high power consumption in auto light sleep during TWT, ~110ua vs ~35ua as advertised, with a period of 1s:

110ua

These don't seem to be ARP related as per: https://github.com/espressif/esp-idf/issues/12441#issuecomment-1775029218. Setting ARP_TMR_INTERVAL to any other value, currently set at 2000ms in the graph, does not change the interval shown on the graph. I've also seen strange transitions to auto light sleep after setting up a TWT that looks similar to https://github.com/espressif/esp-idf/issues/11858#issuecomment-1640244734 where the MODEM power domain is not powering down and thus current draw is around ~300ua. This can be seen earlier in the same current trace as above in the previous iteration:

300ua

Furthermore, I've tried switching dev boards vendors (both exhibit the same power draw), disconnected all peripherals and disabled/floated/hold all relevant GPIO to no avail. I've also edited sleep_modem.c with _esp_sleep_cpu_retentioninit() and get the same power draw without initialising wifi and manually invoking light sleep but strangely without the 1 second pulse in current consumption.

Hopefully someone can point me in the right direction, if I haven't provide enough detail.

djangoa commented 6 months ago

Further to my initial comments, I've caught the select() timeout issue, with more debugging enabled (CONFIG_LWIP_DEBUG, CONFIG_LWIP_ETHARP_DEBUG, CONFIG_LWIP_SOCKETS_DEBUG, CONFIG_LWIP_IP_DEBUG, CONFIG_LWIP_TCP_DEBUG), both during and after connecting to a MQTT broker:

timeout-pre-conn_debug.log timeout-post-conn_debug.log timeout_tcp_debug.log

In the last log, additionally TCP_INPUT_DEBUG, TCP_OUPUT_DEBUG is enabled and around Line 430 the device has a timeout. It looks like it's receiving broadcasts from another IP but not from the broker. So maybe if I'm interpreting the log correctly the TWT teardown is a red herring unless directly after the setting up TWT. I think I'm also suffering from https://github.com/espressif/esp-idf/issues/6607#issuecomment-799239990 where LWIP_TCP_TMR_INTERVAL, LWIP_TCP_RTO_TIME are set too high by default (750ms) for the first SYN-SENT to be retransmitted before my application invokes it's own mqtt_client event timeout. It would be good to know if there is any expectation of having a delay between tearing down a TWT and the wifi fabric becoming usable for connections?

Regarding the power issue I've adapted https://github.com/espressif/esp-idf/issues/4628#issuecomment-575535394 to suspend the lwip tcpip task before auto light sleep and wake it after. This has resolved the current spikes due to tcp_slowtmr and etharp_timer but I would have thought that auto light sleep should do this automatically unless there's some technical reason this shouldn't be done? I'm getting around 50uA, at the board level, in light sleep now:

50ua

I'm still experiencing issues though with auto light sleep being unreliable (with & without the suspending the lwip tcpip task) and sometimes consuming 16mA and other times 38mA:

djangoa commented 6 months ago

On the topic of very high power consumption in auto light sleep after TWT setup, I'm seeing this with debugging on:

D (904564) sleep: RTC_PERIPH: AUTO(OFF)
...
D (904660) sleep: RTC_PERIPH: AUTO(OFF)
D (904669) sleep: RTC_PERIPH: AUTO(OFF)
D (904678) sleep: RTC_PERIPH: AUTO(OFF)
D (904687) sleep: RTC_PERIPH: AUTO(OFF)
D (904696) sleep: RTC_PERIPH: AUTO(OFF)
...
D (955206) sleep: RTC_PERIPH: AUTO(OFF)

Seems to get stuck in a loop until waking from auto sleep and occurs intermittently.

xuxiao111 commented 6 months ago

Hi @djangoa, Thanks for reporting. For the very high power consumption, Did you just run it on itwt example without adding other code? I will try to reproduce it in my testbed. And in previous tests, this auto-light sleep was normal.

djangoa commented 6 months ago

@xuxiao111 Hi thanks for replying. No I've not yet tried isolating the issue in the itwt example. I think we really new a TWT example that's more realistic of how it might be used in an application that transfers data over wifi to catch a lot of these issues I'm having.

djangoa commented 6 months ago

I tried isolating and couldn't replicate, I carefully looked through sdkconfig and found for some reason these were enabled:

CONFIG_PM_SLP_IRAM_OPT=y CONFIG_PM_RTOS_IDLE_OPT=y

Disabling them has resolved the intermittent high current draw I was experiencin, apologies for going off on a tangent.

In the process, I adapted the original itwt example to suspend/ resume the lwip tcpip task thus reducing current consumption to that advertised for the C6. I also enhanced event handling and added a loop to test setting up and tearing down TWT agreements. Might be useful for other devs to see how the logic of TWT compares to deep sleep and could be the basis for a future update to the existing trivial example:

itwt-enchanced-example.zip

The FIXME: in my enhanced example line:396 vTaskDelay(); reproduces reliable a teardown bug, with 46875 << 13 service period, at least on/with my network/AP. With no delay between tearing down and setting up a TWT, the teardown fails with a ESP_ERR_INVALID_ARG:

E (32092) itwt: itwt teardown failed, err:0x102 ESP_ERROR_CHECK failed: esp_err_t 0x102 (ESP_ERR_INVALID_ARG) at 0x4200bbc2 file: "./main/itwt.c" line 382

either caused by the previous setup using a new flow_id or the teardown causing an internal wifi error:

E (32240) wifi:(itwt)twt id:0 is already exist, please change to another value

I assume this might be a timing issues with TWT and is probably related to my own application having intermittent connection timeouts after tearing down a TWT and trying to use wifi too quickly. I reiterate that it would be good to know if there is any expectation of having a delay between tearing down a TWT and the wifi fabric becoming usable for connections?

@xuxiao111: Hopefully that gives you more to go off in tracking things down.

xuxiao111 commented 6 months ago

@djangoa Hi, I just reproduce itwt-enchanced-example.zip in my testbed, and I found that we should call "xEventGroupClearBits(wifi_event_group, ITWT_SETUP_BIT | ITWT_TEARDOWN_BIT);" after "itwt setup and itwt teardown". After that you can remove "line:396 vTaskDelay();" image

djangoa commented 6 months ago

@xuxiao111 Sorry to waste your time, I should have been more careful :(

Could you also test this version that integrates the ping example: itwt-ping-example.zip

If you could try randomly resetting the device, in addition to with / without DISABLE_PM_WHEN_ACTIVE and see if you notice any strange behaviour like I'm seeing, that would be great:

I (113436) itwt: <WIFI_EVENT_ITWT_TEARDOWN> flow_id 1
D (113409) wifi:mms: 0->0
I (113438) wifi:Set ps type: 0, coexist: 0
I (113441) itwt: <PING_EVENT_SUCCESS> 56 bytes from 192.168.2.1 icmp_seq=1 ttl=64 time=2 ms
D (113512) wifi:mms: 0->0
I (113691) itwt: <PING_EVENT_SUCCESS> 56 bytes from 192.168.2.1 icmp_seq=2 ttl=64 time=2 ms
D (113716) wifi:mms: 0->0
I (113941) itwt: <PING_EVENT_SUCCESS> 56 bytes from 192.168.2.1 icmp_seq=3 ttl=64 time=2 ms
D (114024) wifi:mms: 0->0
I (114192) itwt: <PING_EVENT_SUCCESS> 56 bytes from 192.168.2.1 icmp_seq=4 ttl=64 time=3 ms
D (114331) wifi:mms: 0->0
I (114439) itwt: <PING_EVENT_END> 4 packets transmitted, 4 received, 0% packet loss, time 9ms
I (114439) wifi:Set ps type: 1, coexist: 0
W (114513) wifi:not hit beacon in 65535 sp(s)
D (114513) wifi:mms: 0->2
D (114514) event: running post WIFI_EVENT:28 with handler 0x4200b40a and context 0x40833ff4 on loop 0x4087e5a8
I (114514) itwt: <WIFI_EVENT_ITWT_SETUP> twt_id:0, flow_id:1, trigger-enabled, announced, wake_dura:255, wake_dura_unit:0, wake_invl_e:13, wake_invl_m:46875
I (114515) itwt: <WIFI_EVENT_ITWT_SETUP> target wake time:2771322503168, wake duration:65280 us, service period:384000000 us
D (114517) sleep: RTC_PERIPH: AUTO(OFF)
D (114638) sleep: RTC_PERIPH: AUTO(OFF)
D (115442) sleep: RTC_PERIPH: AUTO(OFF)
D (124520) wifi:mms: 2->0
D (124521) event: running post WIFI_EVENT:29 with handler 0x4200b3a4 and context 0x40834024 on loop 0x4087e5a8
I (124570) itwt: <WIFI_EVENT_ITWT_TEARDOWN> flow_id 1
I (124571) wifi:Set ps type: 0, coexist: 0
D (124605) wifi:mms: 0->0
E (124773) itwt: <PING_EVENT_TIMEOUT> From 192.168.2.1 icmp_seq=1 timeout
D (124913) wifi:mms: 0->0
E (125023) itwt: <PING_EVENT_TIMEOUT> From 192.168.2.1 icmp_seq=2 timeout
D (125117) wifi:mms: 0->0
D (125220) wifi:mms: 0->0
E (125273) itwt: <PING_EVENT_TIMEOUT> From 192.168.2.1 icmp_seq=3 timeout
E (125522) itwt: <PING_EVENT_TIMEOUT> From 192.168.2.1 icmp_seq=4 timeout
D (125527) wifi:mms: 0->0
E (125572) itwt: <PING_EVENT_END> 4 packets transmitted, 0 received, 100% packet loss, time 797mn

If not then it's probably an issue with my AP or network and we can close. Thanks again for your support and time looking into this.

Also what does the "i" stand for in "itwt"? :)

xuxiao111 commented 6 months ago

Hi @djangoa, I have checked the itwt-ping-example.zip, seems like the ping timeout time is too small, advises adding it to 1000ms, and try it again.

djangoa commented 6 months ago

@xuxiao111 Hi again, I reduced the timeouts so you don't have to sit and wait for a long time for the intermittent issue to arise. I've tried with higher values the same still happens just the that timeout lasts longer; 1 second or 10 seconds etc. it doesn't matter.

The physical and the data link layer of the Wifi doesn't seem to either retransmit packets or able to receive for long durations. My expectations are that maybe missing one or two pings is normal but to get 100% packet loss over consecutive pings is not and does not happen to other devices attached to the AP.

xuxiao111 commented 6 months ago

Hi @djangoa, Can you help capture the ping frame sent by ESP32C6 (Wireshark)? That we can know whether the AP received the ping request but did not send the ping reply, or the station could not send any frame during this period.

And the 'i' stand means individual TWT.

djangoa commented 6 months ago

Hi @xuxiao111,

Apologies for the late reply. I have ssh remote captured from the Ubiquity AP using tcpdump and the packets never arrive as shown below. My observations; resetting the device seems to cause the intermittent connectivity issue to occur more regularly often with no packets getting through. Other devices attached to the same SSID on that AP I have tested have ~<0.1% loss with 99.9% of RTTs under 50ms (this is a local LAN after all).

I (2008) esp_netif_handlers: sta ip: 192.168.2.20, mask: 255.255.255.0, gw: 192.168.2.1
D (2009) event: running post IP_EVENT:0 with handler 0x4200b6e2 and context 0x40833fc4 on loop 0x4087e5a8
W (2083) wifi:not hit beacon in 65535 sp(s)
D (2083) wifi:mms: 0->2
D (2084) wifi:mms: 2->2
D (2085) event: running post WIFI_EVENT:28 with handler 0x4200b40a and context 0x40833ff4 on loop 0x4087e5a8
I (2085) itwt: <WIFI_EVENT_ITWT_SETUP> twt_id:0, flow_id:1, trigger-enabled, announced, wake_dura:255, wake_dura_unit:0, wake_invl_e:13, wake_invl_m:46875
I (2086) itwt: <WIFI_EVENT_ITWT_SETUP> target wake time:3348043399168, wake duration:65280 us, service period:384000000 us
D (2089) wifi:mms: 2->0
D (2090) event: running post WIFI_EVENT:29 with handler 0x4200b3a4 and context 0x40834024 on loop 0x4087e5a8
I (2090) itwt: <WIFI_EVENT_ITWT_TEARDOWN> flow_id 1
I (2091) wifi:Set ps type: 0, coexist: 0

W (2108) wifi:<ba-add>idx:0, ifx:0, tid:6, TAHI:0x1000fd3, TALO:0x6d322260, (ssn:2, win:64, cur_ssn:2), CONF:0xc0006005
I (2123) itwt: <PING_EVENT_SUCCESS> 56 bytes from 192.168.2.2 icmp_seq=1 ttl=128 time=30 ms
W (2126) wifi:<ba-add>idx:1, ifx:0, tid:0, TAHI:0x1000fd3, TALO:0x6d322260, (ssn:1, win:64, cur_ssn:1), CONF:0xc0000005
D (2157) wifi:mms: 0->0
I (2368) itwt: <PING_EVENT_SUCCESS> 56 bytes from 192.168.2.2 icmp_seq=2 ttl=128 time=26 ms
D (2464) wifi:mms: 0->0
D (2567) wifi:mms: 0->0
D (2771) wifi:mms: 0->0
E (2792) itwt: <PING_EVENT_TIMEOUT> From 192.168.2.2 icmp_seq=3 timeout
D (2976) wifi:mms: 0->0
E (3042) itwt: <PING_EVENT_TIMEOUT> From 192.168.2.2 icmp_seq=4 timeout
D (3079) wifi:mms: 0->0
E (3092) itwt: <PING_EVENT_END> 4 packets transmitted, 2 received, 50% packet loss, time 454mn
I (3092) wifi:Set ps type: 1, coexist: 0
W (3165) wifi:not hit beacon in 65535 sp(s)
D (3165) wifi:mms: 0->2
D (3165) event: running post WIFI_EVENT:28 with handler 0x4200b40a and context 0x40833ff4 on loop 0x4087e5a8
I (3166) itwt: <WIFI_EVENT_ITWT_SETUP> twt_id:0, flow_id:1, trigger-enabled, announced, wake_dura:255, wake_dura_unit:0, wake_invl_e:13, wake_invl_m:46875
I (3167) itwt: <WIFI_EVENT_ITWT_SETUP> target wake time:3348044513280, wake duration:65280 us, service period:384000000 us
D (3169) sleep: RTC_PERIPH: AUTO(OFF)
D (3266) sleep: RTC_PERIPH: AUTO(OFF)
D (4095) sleep: RTC_PERIPH: AUTO(OFF)
D (13172) wifi:mms: 2->0
D (13173) event: running post WIFI_EVENT:29 with handler 0x4200b3a4 and context 0x40834024 on loop 0x4087e5a8
I (13222) itwt: <WIFI_EVENT_ITWT_TEARDOWN> flow_id 1
I (13223) wifi:Set ps type: 0, coexist: 0
D (13261) wifi:mms: 0->0
I (13272) itwt: <PING_EVENT_SUCCESS> 56 bytes from 192.168.2.2 icmp_seq=1 ttl=128 time=48 ms
D (13363) wifi:mms: 0->0
D (13670) wifi:mms: 0->0
E (13674) itwt: <PING_EVENT_TIMEOUT> From 192.168.2.2 icmp_seq=2 timeout
I (13734) itwt: <PING_EVENT_SUCCESS> 56 bytes from 192.168.2.2 icmp_seq=3 ttl=128 time=9 ms
D (13978) wifi:mms: 0->0
I (13987) itwt: <PING_EVENT_SUCCESS> 56 bytes from 192.168.2.2 icmp_seq=4 ttl=128 time=13 ms
E (14224) itwt: <PING_EVENT_END> 4 packets transmitted, 3 received, 25% packet loss, time 269mn
I (14224) wifi:Set ps type: 1, coexist: 0
W (14297) wifi:not hit beacon in 65535 sp(s)
D (14297) wifi:mms: 0->2
D (14297) wifi:mms: 2->2
D (14298) event: running post WIFI_EVENT:28 with handler 0x4200b40a and context 0x40833ff4 on loop 0x4087e5a8
I (14299) itwt: <WIFI_EVENT_ITWT_SETUP> twt_id:0, flow_id:1, trigger-enabled, announced, wake_dura:255, wake_dura_unit:0, wake_invl_e:13, wake_invl_m:46875
I (14300) itwt: <WIFI_EVENT_ITWT_SETUP> target wake time:3348055556096, wake duration:65280 us, service period:384000000 us
D (14302) sleep: RTC_PERIPH: AUTO(OFF)
D (15232) sleep: RTC_PERIPH: AUTO(OFF)

sshdump

xuxiao111 commented 6 months ago

Hi @djangoa, Thank you for your reply, but this captured wired packet cannot determine whether C6 did not send a ping request. Some APs may think that the STA is still in the TWT state and do not reply to the STA.

djangoa commented 6 months ago

I don't have the equipment to capture real Wifi frames between the device and AP as per: https://wiki.wireshark.org/CaptureSetup/WLAN:

netsh wlan show wirelesscapabilities
...
Network monitor mode                        : Not supported

Hence my confusion in sending you a wire capture from the AP but at least you can omit other network related issues causing the packet loss. Did you try to replicate with?:

Could you also test this version that integrates the ping example: itwt-ping-example.zip

Regarding:

Some APs may think that the STA is still in the TWT state and do not reply to the STA.

Which APs have been tested so it can be documented here for when other developers evaluate whether to use TWT in a future product? Is there no workaround for that before returning during tear down of the TWT agreement? As it stands esp_wifi_sta_itwt_teardown() returns instantly:

I (3152444) wireless: itwt teardown, flow_id: 8, err: 0x0
I (3152445) wireless: WIFI_EVENT_ITWT_TEARDOWN flow_id 8(all twt)

while esp_wifi_sta_itwt_setup() takes ~70ms to return.

I hope you can see that this issue dramatically impacts battery operated devices with a limited energy budget and given https://github.com/espressif/esp-idf/issues/13312#issue-2168108301 / https://github.com/espressif/esp-idf/issues/13310#issue-2167628047, I don't see any feasible way to use the current TWT implementation in that context, unless we can find a way to address this issue.

djangoa commented 6 months ago

Further to above when seeing the issue, calling esp_wifi_get_tx_statistics() reports nothing was transmitted:

I (2933) wireless: Get tx statistics
I (2934) wireless: [aci:2] tb(suc:0, ack:0, err:0), count(suc:0, ack:0, err:0, tot:0, max_sent:0)
I (2935) wireless: [aci:2] enable:0, complete:0, tb_times:0, tb_last:0, edca:0, succ:0, fail(0,0,0, cts:0/nan%, ack:0/nan%, tot:0, nan%), edca(ack:0, ba:0), tb(hw-ba:0, sw-ba:0)
I (2937) wireless: [aci:2] txFrames:0, s-mpdu:0(nan%), bitmap(max:0, min:0, tot:0, avg:nan), retry(edca:0, tb:0, nan%), collision:0, timeout:0
I (2939) wireless: [aci:2] seqno_rtt[0,0], hw_rtt[0, 0], muedca[enable:0, times:0, 0.00, 0.00, tot:0.00], avg:nan ms, tot:0.000 secs

so looks like "the station could not send any frame during this period"

xuxiao111 commented 5 months ago

Hi @djangoa, Sorry for the late response, we already do the ping example itwt-ping-example.zip in our testbed, and the AP we used is AUSU-56U. And it works fine in our testbed, you can take a look at the log files: twt_ping_example.log

Also, we call the esp_wifi_get_tx_statistics() after the ping test is complete. image Can you tell us which AP you are using? And we can use the same AP in our testbed.

djangoa commented 5 months ago

Hi @xuxiao111,

Thanks for getting back in touch. The AP is a: Ubiquiti U6-Enterprise.

I can see you're using ~500ms service period:524288 us, can you try with a longer wake interval to replicate a more sensible use case? This might trigger the issue where I see no txFrames in the statistics when my dev board resets.

I also see strange behaviour in your log file where twt setup is failing many times and causing resets, I've also experienced this issue:

E (25283) wifi:(itwt)twt id:0 is already exist, please change to another value

Most importantly I can see in your log multiple instances where PING is intermittently timing out as in my environment:

I (516434) itwt: <WIFI_EVENT_ITWT_TEARDOWN> flow_id 8(all twt)
I (516485) wifi:Set ps type: 0, coexist: 0

E (516688) itwt: <PING_EVENT_TIMEOUT> From 192.168.50.1 icmp_seq=1 timeout
E (517083) itwt: <PING_EVENT_TIMEOUT> From 192.168.50.1 icmp_seq=2 timeout
I (517216) itwt: <PING_EVENT_SUCCESS> 56 bytes from 192.168.50.1 icmp_seq=3 ttl=64 time=132 ms
I (517251) itwt: <PING_EVENT_SUCCESS> 56 bytes from 192.168.50.1 icmp_seq=4 ttl=64 time=13 ms
E (517487) itwt: <PING_EVENT_END> 4 packets transmitted, 2 received, 50% packet loss, time 690mn
W (517487) stats: Get tx statistics

(test)rx_trig:0, tx_bfrpt:2, tb_times:0, tb_qos_null:0, tb_qos_data:0, tb_cca_cancel:0, tb_sifs_abort:0, tb_pwr_outof_range:0
(test)aci:2, tb(suc:0, ack:0, err:0), count(suc:0, ack:0, err:0, tot:0, max_sent:0)
(test)aci:2, enable:4, complete:4, tb_times:0, tb_last:0, edca:4, succ:4, fail(0,0,0, cts:0/0.00%, ack:0/0.00%, tot:0, 0.00%), edca(ack:0, ba:4), tb(hw-ba:0, sw-ba:0)
(test)aci:2, txFrames:4, s-mpdu:0(0.00%), bitmap(max:1, min:1, tot:4, avg:1.00), retry(edca:0, tb:0, 0.00%), collision:0, timeout:0
(test)aci:2, seqno_rtt[321,77102], hw_rtt[321, 77102], muedca[enable:0, times:0, 0.00, 0.00, tot:0.00], avg:24.872 ms, tot:0.099 secs

In my opinion this should not be happening with your RSSI of -46 in an environment with average interference and confirms for me that you are also experiencing the same intermittent problems on your AP as I am.

xuxiao111 commented 5 months ago

Hi @djangoa,

I have changed the service period to 118489088us (~118s), and the test log is twt_ping_example_2.log , and still works fine. The "E (25283) wifi:(itwt)twt id:0 is already exist, please change to another value", is an internal error, we already found the root cause and will fix it ASAP. Thanks for reporting this issue. I (516434) itwt: flow_id 8(all twt) I (516485) wifi:Set ps type: 0, coexist: 0

E (516688) itwt: From 192.168.50.1 icmp_seq=1 timeout E (517083) itwt: From 192.168.50.1 icmp_seq=2 timeout I (517216) itwt: 56 bytes from 192.168.50.1 icmp_seq=3 ttl=64 time=132 ms I (517251) itwt: 56 bytes from 192.168.50.1 icmp_seq=4 ttl=64 time=13 ms E (517487) itwt: 4 packets transmitted, 2 received, 50% packet loss, time 690mn W (517487) stats: Get tx statistics

(test)rx_trig:0, tx_bfrpt:2, tb_times:0, tb_qos_null:0, tb_qos_data:0, tb_cca_cancel:0, tb_sifs_abort:0, tb_pwr_outof_range:0 (test)aci:2, tb(suc:0, ack:0, err:0), count(suc:0, ack:0, err:0, tot:0, max_sent:0) (test)aci:2, enable:4, complete:4, tb_times:0, tb_last:0, edca:4, succ:4, fail(0,0,0, cts:0/0.00%, ack:0/0.00%, tot:0, 0.00%), edca(ack:0, ba:4), tb(hw-ba:0, sw-ba:0) (test)aci:2, txFrames:4, s-mpdu:0(0.00%), bitmap(max:1, min:1, tot:4, avg:1.00), retry(edca:0, tb:0, 0.00%), collision:0, timeout:0 (test)aci:2, seqno_rtt[321,77102], hw_rtt[321, 77102], muedca[enable:0, times:0, 0.00, 0.00, tot:0.00], avg:24.872 ms, tot:0.099 secs As for ping timeout, from this log, we can see, that enabled to transmit 4 data frames, and the success is also 4, the success means that the station successfully transmits the data frame and receives the ACK frame from AP. So seems like the reason why ping timeout is no ping replay received within 250 ms.

Also, I have done the ping test using iperf example under the same test env, image We can see the ping delay between 4ms ~ 225ms, the main reason is that my environment is relatively noisy and there will be many wireless devices, which will interfere with each other. And I will do a long test to see if the same problem exists.

djangoa commented 5 months ago

Hi @xuxiao111,

The "E (25283) wifi:(itwt)twt id:0 is already exist, please change to another value", is an internal error, we already found the root cause and will fix it ASAP. Thanks for reporting this issue.

That's great, thank you. Look forward to the fix. The ping issue is probably a red herring / distraction, and this internal error in combination with my AP might be causing the problem when my device resets.

We can see the ping delay between 4ms ~ 225ms, the main reason is that my environment is relatively noisy and there will be many wireless devices, which will interfere with each other. And I will do a long test to see if the same problem exists.

Makes sense, thanks for taking time to test this.

djangoa commented 5 months ago

Hi @xuxiao111,

I've been doing further testing and come across another issue:

A strange edge case that occurs infrequently where calling the esp_wifi_sta_itwt_teardown(FLOW_ID_ALL) function returns no error and WIFI_EVENT_ITWT_TEARDOWN event callback is not invoked and thus my app code times out:

I (47088082) wireless: itwt teardown, flow_id: 8, err: 0x0
E (47088333) app_main: Timeout waiting for wireless::ITWT_TEARDOWN_BIT event after 250ms

Is there anything that the esp_wifi_sta_itwt_teardown(FLOW_ID_ALL) does that could cause this hang? Normally this function triggers the callback near instantly. I'm in the process of testing whether invoking esp_wifi_sta_itwt_teardown(FLOW_ID_ALL) a second time as a workaround after this timeout occurs causes further issues.

djangoa commented 5 months ago

Calling a subsequent time works around the issue but isn't ideal. Please can you consider enhancing:

/** Argument structure for WIFI_EVENT_TWT_TEARDOWN event */
typedef struct {
    uint8_t flow_id;                     /**< flow id */
} wifi_event_sta_itwt_teardown_t;

with any details of internal errors as per the wifi_event_sta_itwt_setup_t struct so future itwt developers dont trip up over the same problem?

djangoa commented 5 months ago

Unfortunately I was too quick to comment, calling esp_wifi_sta_itwt_teardown(FLOW_ID_ALL) more than once causes strange behaviour, auto light sleep no longer functions. It seems RX stops retrying (The device gets more regular socket timeouts in my application code, RX retries go from ~20% - ~0%) and the wifi rx stats change from:

[tid 0] legacy:4, su(su:18, su_txbf:11, su_stbc:7, su_retry:2/11.11%, ersu:0, ersu_dcm:0, su_noeb:0/0.00%)

to

[tid 0] legacy:669, su(su:20, su_txbf:20, su_stbc:0, su_retry:0/0.00%, ersu:0, ersu_dcm:0, su_noeb:5/25.00%)

What is su_noeb?

So I'm stuck resetting the device when esp_wifi_sta_itwt_teardown(FLOW_ID_ALL) hangs but maybe this is also related to "id:0 is already exist" issue.

djangoa commented 5 months ago

Hi @xuxiao111,

Furthermore, we're experiencing issues with several other aspects of HE wifi on the C6, these are some rx stats during longer testing:

I (20458958) wireless: [tid 0] legacy:9, su(su:8, su_txbf:5, su_stbc:3, su_retry:0/0.00%, ersu:0, ersu_dcm:0, su_noeb:0/0.00%)
...
I (20508968) wireless: [tid 0] legacy:0, su(su:36, su_txbf:0, su_stbc:36, su_retry:35/97.22%, ersu:0, ersu_dcm:0, su_noeb:0/0.00%)
...
I (22709735) wireless: [tid 0] legacy:0, su(su:36, su_txbf:0, su_stbc:36, su_retry:35/97.22%, ersu:0, ersu_dcm:0, su_noeb:0/0.00%)
...
W (24040213) wireless: No rx statistics available
I (24040214) wireless: [tid 0] legacy:1, ht(ht:0, ht_retry:0/0.00%, ht_noeb:0/0.00%)
I (24040215) wireless: [tid 0] legacy:1, su(su:0, su_txbf:0, su_stbc:0, su_retry:0/0.00%, ersu:147, ersu_dcm:147, su_noeb:0/0.00%)
I (24040216) wireless: [tid 0] legacy:1, mu(mu:0, mimo:0, non-mimo:0, txbf:0, stbc:0, mu_retry:0/0.00%, mu_noeb:0/0.00%)

They don't reflect the TX retries from the AP that hover around ~15% and as can be seen in the log above after some time the device starts to see nearly 100% retry rates and then stops reporting anything other than ersu*. This coincides with a jump in heap high water mark of 159136K vs 147316K. Additionally the AP reports the RX rate of the device as 7.3 Mbps (MCS0 with 3.2 µs GI) from which it never recovers. A reset resolves the issue and the PHY rate goes back to normal 86 Mbps.

Additionally esp_wifi_get_rx_mu_statistics(&rx_mu_stats); stats report nothing, so can't see details regarding MCS:

W (2026) wireless: No rx mimo statistics
W (2026) wireless: (mimo)dut rx:0
W (2026) wireless: (mimo)sigb, mcs0:0(nan%), mcs1:0(nan%), mcs2:0(nan%), mcs3:0(nan%), mcs4:0(nan%), mcs5:0(nan%)
W (2028) wireless: (mimo)users num2:0(nan%), num3:0(nan%), num4:0(nan%), num5:0(nan%), num6:0(nan%), num7:0(nan%), num8:0(nan%)
W (2029) wireless: No rx non mimo statistics
W (2030) wireless: (nonmimo)dut rx:0
W (2030) wireless: (nonmimo)ru_alloc_96_num_2046:0, ru_alloc_112_num_2046:0
W (2031) wireless: (nonmimo)sigb, mcs0:0(nan%), mcs1:0(nan%), mcs2:0(nan%), mcs3:0(nan%), mcs4:0(nan%), mcs5:0(nan%)
W (2033) wireless: (nonmimo)users, num1:0(nan%), num2:0(nan%), num3:0(nan%), num4:0(nan%), num5:0(nan%), num6:0(nan%), num7:0(nan%), num8:0(nan%), num9:0(nan%)

Finally, excessive (>50%) TX retry rates occur when the C6 is close to our AP, reducing the tx power down from 20db -> 16.5db mitigates.

I'm running another long duration test with TWT disabled to see if the same issues occur with the stuck PHY rate, if it does I'll create a new bug report, else it's probably best to keep all these teardown/setup issues here?

What hardware and setup are you using to capture wifi frames? Maybe running that here can facilitate analysing these problems.

xuxiao111 commented 5 months ago

Hi @xuxiao111,

I've been doing further testing and come across another issue:

A strange edge case that occurs infrequently where calling the esp_wifi_sta_itwt_teardown(FLOW_ID_ALL) function returns no error and WIFI_EVENT_ITWT_TEARDOWN event callback is not invoked and thus my app code times out:

I (47088082) wireless: itwt teardown, flow_id: 8, err: 0x0
E (47088333) app_main: Timeout waiting for wireless::ITWT_TEARDOWN_BIT event after 250ms

Is there anything that the esp_wifi_sta_itwt_teardown(FLOW_ID_ALL) does that could cause this hang? Normally this function triggers the callback near instantly. I'm in the process of testing whether invoking esp_wifi_sta_itwt_teardown(FLOW_ID_ALL) a second time as a workaround after this timeout occurs causes further issues.

Hi, @djangoa. After we call the esp_wifi_sta_itwt_teardown(FLOW_ID_ALL) functions, STA will send a TWT teardown frame to inform the AP that the STA has exited the TWT (Target Wake Time) state. The STA will post a teardown event only after successfully sending this frame. From the observation, it seems that the teardown frame failed to be sent, thus resulting in the absence of a teardown event.

xuxiao111 commented 5 months ago

Calling a subsequent time works around the issue but isn't ideal. Please can you consider enhancing:

/** Argument structure for WIFI_EVENT_TWT_TEARDOWN event */
typedef struct {
    uint8_t flow_id;                     /**< flow id */
} wifi_event_sta_itwt_teardown_t;

with any details of internal errors as per the wifi_event_sta_itwt_setup_t struct so future itwt developers dont trip up over the same problem?

yeah, I think we need to add some enhancement here, regardless of whether the TWT teardown frame is successfully sent or not, a teardown event should be posted to inform the user whether the teardown process was successful or not.

xuxiao111 commented 5 months ago

Unfortunately I was too quick to comment, calling esp_wifi_sta_itwt_teardown(FLOW_ID_ALL) more than once causes strange behaviour, auto light sleep no longer functions. It seems RX stops retrying (The device gets more regular socket timeouts in my application code, RX retries go from ~20% - ~0%) and the wifi rx stats change from:

[tid 0] legacy:4, su(su:18, su_txbf:11, su_stbc:7, su_retry:2/11.11%, ersu:0, ersu_dcm:0, su_noeb:0/0.00%)

to

[tid 0] legacy:669, su(su:20, su_txbf:20, su_stbc:0, su_retry:0/0.00%, ersu:0, ersu_dcm:0, su_noeb:5/25.00%)

What is su_noeb?

So I'm stuck resetting the device when esp_wifi_sta_itwt_teardown(FLOW_ID_ALL) hangs but maybe this is also related to "id:0 is already exist" issue.

Can you please provide more logs? Is it sufficient to trigger this behavior by simply calling esp_wifi_sta_itwt_teardown(FLOW_ID_ALL) multiple times?

xuxiao111 commented 5 months ago

Hi @xuxiao111,

Furthermore, we're experiencing issues with several other aspects of HE wifi on the C6, these are some rx stats during longer testing:

I (20458958) wireless: [tid 0] legacy:9, su(su:8, su_txbf:5, su_stbc:3, su_retry:0/0.00%, ersu:0, ersu_dcm:0, su_noeb:0/0.00%)
...
I (20508968) wireless: [tid 0] legacy:0, su(su:36, su_txbf:0, su_stbc:36, su_retry:35/97.22%, ersu:0, ersu_dcm:0, su_noeb:0/0.00%)
...
I (22709735) wireless: [tid 0] legacy:0, su(su:36, su_txbf:0, su_stbc:36, su_retry:35/97.22%, ersu:0, ersu_dcm:0, su_noeb:0/0.00%)
...
W (24040213) wireless: No rx statistics available
I (24040214) wireless: [tid 0] legacy:1, ht(ht:0, ht_retry:0/0.00%, ht_noeb:0/0.00%)
I (24040215) wireless: [tid 0] legacy:1, su(su:0, su_txbf:0, su_stbc:0, su_retry:0/0.00%, ersu:147, ersu_dcm:147, su_noeb:0/0.00%)
I (24040216) wireless: [tid 0] legacy:1, mu(mu:0, mimo:0, non-mimo:0, txbf:0, stbc:0, mu_retry:0/0.00%, mu_noeb:0/0.00%)

They don't reflect the TX retries from the AP that hover around ~15% and as can be seen in the log above after some time the device starts to see nearly 100% retry rates and then stops reporting anything other than ersu*. This coincides with a jump in heap high water mark of 159136K vs 147316K. Additionally the AP reports the RX rate of the device as 7.3 Mbps (MCS0 with 3.2 µs GI) from which it never recovers. A reset resolves the issue and the PHY rate goes back to normal 86 Mbps.

Additionally esp_wifi_get_rx_mu_statistics(&rx_mu_stats); stats report nothing, so can't see details regarding MCS:

W (2026) wireless: No rx mimo statistics
W (2026) wireless: (mimo)dut rx:0
W (2026) wireless: (mimo)sigb, mcs0:0(nan%), mcs1:0(nan%), mcs2:0(nan%), mcs3:0(nan%), mcs4:0(nan%), mcs5:0(nan%)
W (2028) wireless: (mimo)users num2:0(nan%), num3:0(nan%), num4:0(nan%), num5:0(nan%), num6:0(nan%), num7:0(nan%), num8:0(nan%)
W (2029) wireless: No rx non mimo statistics
W (2030) wireless: (nonmimo)dut rx:0
W (2030) wireless: (nonmimo)ru_alloc_96_num_2046:0, ru_alloc_112_num_2046:0
W (2031) wireless: (nonmimo)sigb, mcs0:0(nan%), mcs1:0(nan%), mcs2:0(nan%), mcs3:0(nan%), mcs4:0(nan%), mcs5:0(nan%)
W (2033) wireless: (nonmimo)users, num1:0(nan%), num2:0(nan%), num3:0(nan%), num4:0(nan%), num5:0(nan%), num6:0(nan%), num7:0(nan%), num8:0(nan%), num9:0(nan%)

Finally, excessive (>50%) TX retry rates occur when the C6 is close to our AP, reducing the tx power down from 20db -> 16.5db mitigates.

I'm running another long duration test with TWT disabled to see if the same issues occur with the stuck PHY rate, if it does I'll create a new bug report, else it's probably best to keep all these teardown/setup issues here?

What hardware and setup are you using to capture wifi frames? Maybe running that here can facilitate analysing these problems.

Furthermore, we're experiencing issues with several other aspects of HE wifi on the C6, these are some rx stats during longer testing: Hi, for esp_wifi_get_rx_mu_statistics(&rx_mu_stats), only when the STA receives an MU-PPDU, calling esp_wifi_get_rx_mu_statistics(&rx_mu_stats) will have data available. Please fell free to open a new bug to track this issues. We are using intel AX200 wifi card to capture wifi frame, you can fellow this guide: https://wifilogic.wordpress.com/2019/09/21/how-to-setup-wifi6-sniffer-in-ubuntu/

djangoa commented 5 months ago

Can you please provide more logs? Is it sufficient to trigger this behavior by simply calling esp_wifi_sta_itwt_teardown(FLOW_ID_ALL) multiple times?

It takes about a day to get the issue to trigger, I'll post the debug logs when possible.

djangoa commented 5 months ago

Here's the verbose log, not sure it's helpful other than showing the failed teardown and lack of sleeping (wifi:mms: 2->2) between wake-up iterations.

...
I (190934) wireless: Flow_id bitmap: 00000010
V (190934) esp_adapter: thread sem get: sem=0x4082b0b4
I (190935) wireless: itwt teardown, flow_id: 1, err: 0x0
D (191093) wifi:mms: 2->2
E (191186) app_main: Timeout waiting for wireless::ITWT_TEARDOWN_BIT event after 250ms, attempt 1
V (191186) esp_adapter: thread sem get: sem=0x4082b0b4
I (191186) wireless: Flow_id bitmap: 00000010
V (191187) esp_adapter: thread sem get: sem=0x4082b0b4
I (191188) wireless: itwt teardown, flow_id: 1, err: 0x0
D (191188) wifi:mms: 2->0
D (191189) event: running post WIFI_EVENT:29 with handler 0x4200eaca and context 0x4083749c on loop 0x4087e6c8
I (191190) wireless: WIFI_EVENT_ITWT_TEARDOWN flow_id 1
V (191190) esp_adapter: thread sem get: sem=0x40837dc0
I (191191) wireless: Flow_id bitmap: 00000000
V (191192) esp_adapter: thread sem get: sem=0x4082b0b4
I (191192) wifi:Set ps type: 0, coexist: 0
D (191193) tiT: sys_restart_timeouts() complete
...

...
D (191195) wifi:mms: 0->0
I (191202) wifi:Set ps type: 1, coexist: 0
V (191202) esp_adapter: thread sem get: sem=0x4082b0b4
I (191202) wifi:Set ps type: 1, coexist: 0
V (191203) app_main: Suspended LWIP TCPIP thread: 'tiT'
V (191203) esp_adapter: thread sem get: sem=0x4082b0b4
I (191204) wireless: Flow_id bitmap: 00000000
V (191205) esp_adapter: thread sem get: sem=0x4082b0b4
I (191206) wireless: itwt setup, flow_id: 1, err: 0x0
W (191278) wifi:not hit beacon in 65535 sp(s)
D (191278) wifi:mms: 0->2
W (191278) wifi:awake:1, dream:0, sleep:0
D (191279) event: running post WIFI_EVENT:28 with handler 0x4200e7d8 and context 0x4083746c on loop 0x4087e6c8
I (191279) wireless: WIFI_EVENT_ITWT_SETUP twt_id: 0, flow_id: 1, trigger-enabled, announced, wake_dura: 255, wake_dura_unit: 0, wake_invl_e: 13, wake_invl_m: 46875
I (191281) wireless: WIFI_EVENT_ITWT_SETUP target wake time: 198265864192, wake duration: 65280us, service period: 384000000us, exec timer 74.70ms
V (191283) esp_adapter: thread sem get: sem=0x40837dc0
I (191284) wireless: Flow_id bitmap: 00000010
V (191284) esp_adapter: thread sem get: sem=0x4082b0b4
W (191285) wireless: Get tx statistics
I (191286) wireless: [aci:2] enable:18, complete:18, tb_times:0, tb_last:0, edca:18, succ:14, fail(0,0,0, cts:0/0.00%, ack:4/22.22%, tot:4, 22.22%), edca(ack:4, ba:10)
I (191287) wireless: [aci:2] txFrames:14, s-mpdu:4(22.22%), bitmap(max:1, min:1, tot:10, avg:1.00), retry(edca:8, tb:0, 57.14%), collision:0, timeout:0
I (191289) wireless: [aci:2] seqno_rtt[147,2267], hw_rtt[147, 2267], avg:0.444 ms, tot:0.008 secs
V (191290) esp_adapter: thread sem get: sem=0x4082b0b4
W (191291) wireless: Get rx statistics
I (191292) wireless: [tid 0] legacy:8, su(su:14, su_txbf:11, su_stbc:3, su_retry:3/21.43%, ersu:0, ersu_dcm:0, su_noeb:0/0.00%)
V (191293) esp_adapter: thread sem get: sem=0x4082b0b4
I (191294) wireless: stats, tx_retry: 57.14/49.66/49.66%, rx_retry: 21.43/27.04/27.04%, avg_rtt: 0.444/0.403/0.403ms, samples: 10/20/20
W (191295) wireless: Clear statistics
V (191296) esp_adapter: thread sem get: sem=0x4082b0b4
V (191297) esp_adapter: thread sem get: sem=0x4082b0b4
V (191297) esp_adapter: thread sem get: sem=0x4082b0b4
D (191298) wifi:mms: 2->2
V (191299) esp_adapter: thread sem get: sem=0x4082b0b4
V (191299) esp_adapter: thread sem get: sem=0x4082b0b4
V (191300) esp_adapter: thread sem get: sem=0x4082b0b4
V (191300) app_main: Iteration exec time: 433ms
W (191301) app_main: Heap free/max/total : 283740 / 117560 / 395480B
V (191302) app_main: Stack max alloc: 5464 / 8192B
D (191303) app_main: Delay until next iteration: 9567ms
D (191502) wifi:mms: 2->2
D (191810) wifi:mms: 2->2
D (191935) temperature_sensor: range changed, change to index 3
D (192118) wifi:mms: 2->2
D (192220) wifi:mms: 2->2
D (192424) wifi:mms: 2->2
D (192731) wifi:mms: 2->2
D (192935) temperature_sensor: range changed, change to index 4
D (193038) wifi:mms: 2->2
D (193141) wifi:mms: 2->2
D (193346) wifi:mms: 2->2
D (193550) wifi:mms: 2->2
D (193935) temperature_sensor: range changed, change to index 4
D (193960) wifi:mms: 2->2
D (194063) wifi:mms: 2->2
D (194165) wifi:mms: 2->2
D (194370) wifi:mms: 2->2
D (194472) wifi:mms: 2->2
D (194935) temperature_sensor: range changed, change to index 4
D (195087) wifi:mms: 2->2
D (195394) wifi:mms: 2->2
D (195598) wifi:mms: 2->2
D (195701) wifi:mms: 2->2
D (195935) temperature_sensor: range changed, change to index 4
D (196315) wifi:mms: 2->2
D (196622) wifi:mms: 2->2
D (196930) wifi:mms: 2->2
D (196935) temperature_sensor: range changed, change to index 4
D (197237) wifi:mms: 2->2
D (197544) wifi:mms: 2->2
D (197935) temperature_sensor: range changed, change to index 4
D (198670) wifi:mms: 2->2
D (198773) wifi:mms: 2->2
D (198935) temperature_sensor: range changed, change to index 4
D (199080) wifi:mms: 2->2
D (199387) wifi:mms: 2->2
D (199694) wifi:mms: 2->2
D (199935) temperature_sensor: range changed, change to index 4
D (200002) wifi:mms: 2->2
D (200206) wifi:mms: 2->2
D (200514) wifi:mms: 2->2
D (200616) wifi:mms: 2->2
D (200821) wifi:mms: 2->2
V (200870) app_main: Next iteration starting
...