Open BartoszKubiak opened 3 years ago
Further investigation shows that a network disconnect is not always detected on the node side. I ran many tests powering off the root and observing what happens to the mesh network. In most cases the nodes generate an MDF_EVENT_MWIFI_PARENT_DISCONNECTED event and stop retransmitting the packet, but sometimes this does not happen. Generally I observe three cases:
1) about a dozen warnings mesh: [mesh_schedule.c,3130] [WND-RX]max_wnd, followed by a disconnect
2) an endless stream of warnings mesh: [mesh_schedule.c,3130] [WND-RX]max_wnd
3) lots of [mwifi, 887]:
My problem is that I call mwifi_write() in my main application task, so a blocked send hangs my whole application. I've added a task watchdog as a temporary workaround.
In a fixed-root network, only layer-2 nodes will detect the disconnection when the root disappears. At that point esp_mesh_send() returns ESP_ERR_MESH_DISCONNECTED. If esp_mesh_send() blocks for a long time on lower-level nodes, you can call esp_mesh_send_block_time() before esp_mesh_start() to solve the problem.
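As a rough illustration of the idea above (treat a disconnect result as final instead of retrying forever), here is a minimal host-side sketch in plain C. It does not call ESP-MDF or ESP-IDF. All names in it (`send_result_t`, `send_fn_t`, `bounded_send`, `scripted_send`) are hypothetical, and `SEND_DISCONNECTED` only stands in for a result such as ESP_ERR_MESH_DISCONNECTED. In firmware, the callback would presumably wrap the real send call with a finite timeout.

```c
#include <stddef.h>

/* Hypothetical result codes; stand-ins for real esp_err_t values. */
typedef enum {
    SEND_OK = 0,
    SEND_TIMEOUT,      /* transient failure: worth another attempt */
    SEND_DISCONNECTED  /* terminal failure: parent is gone, stop retrying */
} send_result_t;

/* Hypothetical transport callback (assumption, not an ESP-MDF type). */
typedef send_result_t (*send_fn_t)(const void *data, size_t len, void *ctx);

/* Try to send at most max_attempts times. Only SEND_TIMEOUT is retried;
 * SEND_OK and SEND_DISCONNECTED end the loop immediately. */
send_result_t bounded_send(send_fn_t send, void *ctx,
                           const void *data, size_t len,
                           unsigned max_attempts)
{
    send_result_t res = SEND_TIMEOUT;
    for (unsigned i = 0; i < max_attempts; i++) {
        res = send(data, len, ctx);
        if (res != SEND_TIMEOUT) {
            return res;
        }
    }
    return res; /* still SEND_TIMEOUT: the attempt budget is exhausted */
}

/* Test double: replays a scripted list of results and counts the calls.
 * script_len must be at least 1. */
typedef struct {
    const send_result_t *script;
    size_t script_len;
    size_t calls;
} scripted_link_t;

send_result_t scripted_send(const void *data, size_t len, void *ctx)
{
    (void)data;
    (void)len;
    scripted_link_t *link = ctx;
    size_t i = link->calls++;
    /* Past the end of the script, keep returning the last entry. */
    if (i >= link->script_len) {
        i = link->script_len - 1;
    }
    return link->script[i];
}
```

With a wrapper like this, the application task gets control back after a bounded number of attempts and can decide what to do next (drop the packet, queue it, or wait for a reconnect event) rather than relying on the task watchdog.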
ESP32-WROOM-32D
mdf> version
I (2854964) [mdebug_cmd, 53]: ESP-IDF version : v4.3.1-dirty
I (2854965) [mdebug_cmd, 54]: ESP-MDF version : v1.0-48-gdf0a825
I (2854976) [mdebug_cmd, 55]: compile time : Nov 17 2021 11:56:09
I (2854977) [mdebug_cmd, 56]: free heap : 73168 Bytes
I (2854987) [mdebug_cmd, 57]: CPU cores : 2
I (2854988) [mdebug_cmd, 58]: silicon revision : 1
I (2855000) [mdebug_cmd, 64]: feature : /802.11bgn/BLE/BT/External-Flash:4 MB
mesh topology: fixed root (routerless) + 4 nodes; each node transmits short data to the root every 10 seconds, and the root broadcasts the time every 30 seconds
steps to reproduce: power on everything -> wait for the mesh to build -> power off the root
I've observed that nodes sometimes go into an infinite transmit (retry?) loop when the recipient is unreachable. I think the node starts sending a frame before the disconnect event:
W (335821) mesh: [mesh_schedule.c,3130] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:3, no_wnd_count:0, timeout_count:0
W (337023) mesh: [mesh_schedule.c,3130] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:3, no_wnd_count:0, timeout_count:1
W (338225) mesh: [mesh_schedule.c,3130] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:3, no_wnd_count:0, timeout_count:2
and so on, with timeout_count increasing. Meanwhile the node can detect the mesh disconnect and reconnect, and it receives broadcast messages from the root, but transmission is still blocked (maybe because retransmit_enable = y). This symptom spreads to child nodes and even to the root when it tries to read data directly from an affected node.
In most cases the node correctly detects the root disconnection:
W (1381445) mesh: [mesh_schedule.c,3130] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:27, no_wnd_count:0, timeout_count:0
W (1382647) mesh: [mesh_schedule.c,3130] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:27, no_wnd_count:0, timeout_count:1
W (1383849) mesh: [mesh_schedule.c,3130] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:27, no_wnd_count:0, timeout_count:2
W (1385051) mesh: [mesh_schedule.c,3130] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:27, no_wnd_count:0, timeout_count:3
W (1386253) mesh: [mesh_schedule.c,3130] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:27, no_wnd_count:0, timeout_count:4
I (1386750) [mwifi, 188]: Parent is disconnected, reason: 200
I (1386751) [MAIN, 1073]: event_loop_cb, event: 0x8
I (1386753) [MAIN, 1031]: Parent is disconnected = WIFI_REASON_BEACON_TIMEOUT
W (1387455) mesh: [mesh_schedule.c,3130] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:27, no_wnd_count:0, timeout_count:5
W (1387456) [mwifi, 707]: Node failed to send packets, dest_addr: ff:00:00:01:00:00, flag: 0x28, opt->type: 0x08, opt->len: 13, data->tos: 0, data: 0x3ffd8830, size: 49
W (1387478) [mwifi, 960]: Node failed to send packets, data_flag: 0x28, dest_mac: ff:00:00:01:00:00
W (1387539) [APP, 280]: [[[[[[ MESH DISCONNECTED ]]]]]]
I (1387960) [mwifi, 188]: Parent is disconnected, reason: 2
I (1387961) [MAIN, 1073]: event_loop_cb, event: 0x8
I (1387962) [MAIN, 962]: Parent is disconnected = WIFI_REASON_AUTH_EXPIRE
I (1389168) [mwifi, 188]: Parent is disconnected, reason: 2
I (1389169) [MAIN, 1073]: event_loop_cb, event: 0x8