Open KonssnoK opened 1 year ago
more info> i have a setup made of 4 devices, not all of them report this warning at the same time. it happens that only one device reports it and the others no. The devices are all close to each other
some more logs
I (11:58:44.201) mesh_netif: Interface WIFI
W (565327) wifi:scan number is 0
W (11:58:44.210) mesh_hand: scan_done 0 1 0
W (11:58:44.212) mesh_main: <MESH_EVENT_TODS_REACHABLE>reachable:0
W (566546) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:25
W (567747) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:26
W (568948) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:27
W (570149) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:28
W (571350) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:29
W (572551) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:30
W (573752) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:31
W (574953) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:32
W (576154) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:33
W (577355) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:34
W (578556) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:35
W (579757) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:36
W (580958) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:37
I (11:59:00.321) monitor: history_pop
W (11:59:00.321) mqtt_app: Disconnecting mqtt_app
I (11:59:00.322) mqtt_client: Client asked to disconnect
I (11:59:00.322) monitor: history_end
I (11:59:00.332) offline: object saved to PSRAM, free 2096323
W (582159) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:38
W (583360) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:39
W (584561) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:40
W (585762) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:41
W (586963) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:42
W (588164) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:43
W (589365) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:44
W (590566) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:45
W (591767) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:46
W (592968) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:47
W (594169) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:48
W (595370) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:49
W (596571) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:50
W (597772) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:51
W (598973) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:52
W (600174) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:53
W (601375) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:54
W (602576) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:55
W (603777) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:56
W (604978) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:839, xseqno:65, no_wnd_count:0, timeout_count:57
W (11:59:24.014) safety: emergency message sending not yet implemented!
@KonssnoK This log indicates that the mesh node's xon window is zero, and it requests xon window from the parent, but the parent doesn't response, so the request timeout. If the xon window is zero, the node can't send data out.
@zhangyanjiaoesp uhm... i see this happen mainly when the wifi is off, i usually don't see this with wifi on unless devices are looking for the wifi.
Shouldn't this xon window independent from the external wifi?
@KonssnoK When the wifi off, the parent node can't send data out ,then it can't update the RX queue, this will affect the packet sending of its children. Please refer to this doc: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/esp-wifi-mesh.html?highlight=mesh#upstream-flow-control
@zhangyanjiaoesp consider we see this when we set the devices to mesh_idle when switching from fixed root to dynamic root. There is no parent 🤔
@zhangyanjiaoesp consider we see this when we set the devices to mesh_idle when switching from fixed root to dynamic root. There is no parent 🤔
@KonssnoK Then this log won't impact the performances.
@zhangyanjiaoesp ok, good to know, but is it expected when devices are idle (why would you do such thing if you don't have parents nor children)?
Last week we concentrated on stability for our deployed devices, we are improving but still haven't finished. We improved the code handling the mesh/lte handover, we might need to update the example shared on github. We are still investigating crashes and devices isolating themselves from the rest of the network.
This is why we were not active on github :)
@zhangyanjiaoesp ok, good to know, but is it expected when devices are idle (why would you do such thing if you don't have parents nor children)?
Maybe we can optimize it.
Maybe we can optimize it.
Ok, we are just investigating what is going, once we have a running setup also for you should be quite easy to see such behaviors
@zhangyanjiaoesp coming back to these logs.
We are currently testing our system when the WIFI is missing -> Forced fixed root shape
What i see is the following:
W (2997278) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:2240, no_wnd_count:0, timeout_count:0
I (15:51:31.834) processor: send status... (1000ms)
W (2998479) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:2240, no_wnd_count:0, timeout_count:1
W (15:51:32.260) mqtt_client: Written 88B in 2424ms
Whenever this happens, a packet takes more than 2 seconds to be sent out -> which i expect it to be 1200 + 1200 + the actual time it takes to send out the packet
W (2725277) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:2030, no_wnd_count:0, timeout_count:0
W (15:46:59.834) monitor: 3W3P:: rule deactivated
W (2726478) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:2030, no_wnd_count:0, timeout_count:1
W (2727679) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:2030, no_wnd_count:0, timeout_count:2
E (15:47:01.444) mesh_driver: STA: Send err 0x400A ESP_ERR_MESH_TIMEOUT
W (15:47:01.451) mqtt_client: Written 88B in 3615ms
again 1200 + 1200 + 1200 + the actual time to send the data.
More logs of the same issue
W (987780) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:520, no_wnd_count:0, timeout_count:0
W (988981) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:520, no_wnd_count:0, timeout_count:1
W (15:18:02.759) mqtt_client: Written 492B in 2420ms
I (15:18:42.338) processor: send status... (60002ms)
I (15:19:43.336) processor: send status... (60998ms)
I (15:20:43.336) processor: send status... (60000ms)
W (1167780) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:660, no_wnd_count:0, timeout_count:0
W (1168981) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:660, no_wnd_count:0, timeout_count:1
W (15:21:02.782) mqtt_client: Written 492B in 2444ms
Please consider that the device is connected to the fixed parent as layer 2 and connected to MQTT,
On top, it seems that it is also cause of other errors in the wifi, like in the following log where it generates a timeout in the beacon 🤔
W (22553) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:21, no_wnd_count:0, timeout_count:6
W (23754) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:21, no_wnd_count:0, timeout_count:7
W (24955) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:21, no_wnd_count:0, timeout_count:8
E (01:00:24.200) mesh_driver: STA: Send err 0x400A ESP_ERR_MESH_TIMEOUT
I (25007) wifi:ap_probe_send over, resett wifi status to disassoc
I (25008) wifi:state: run -> init (c800)
I (25009) wifi:pm stop, total sleep time: 0 us / 15467078 us
I (25012) wifi:new:<1,0>, old:<1,1>, ap:<1,1>, sta:<1,1>, prof:1
W (01:00:24.265) mesh_main: <MESH_EVENT_PARENT_DISCONNECTED>reason: 200 WIFI_REASON_BEACON_TIMEOU
As already discussed, this log is related to the parent handling the window. What i see is that when the device shows
I (16:11:49.724) processor: send offline message...
W (4227278) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:3250, no_wnd_count:0, timeout_count:0
W (4228479) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:3250, no_wnd_count:0, timeout_count:1
W (16:12:03.151) mqtt_client: Written 726B in 2424ms
in the root node i see
I (16:11:58.186) routing_table: Routing table update
I (4417589) mesh: 4016<scan>self-organized is disabled, users shall read out scan results by themselves.
I (16:12:02.492) mesh_main: <MESH_EVENT_SCAN_DONE>number:59
I (16:12:27.002) mesh_hand: Checking wifi connectivity (count 130)[LOCAL]
Could it be that when the root is scanning the WIFI the mesh is blocked?
Is there a way to do a iterative wifi scan?
@zhangyanjiaoesp can you please look at this? thanks
@KonssnoK
Could it be that when the root is scanning the WIFI the mesh is blocked? AP is used for communication for the children STA for searching. Apparently, while searching the AP does not respond to the layer 2, which delays the MQTT messages
yes, if the STA is scanning, it will affect the packet sending.
Inside the wifi, when the station has connected to other ap or the softAP has station connected, if the device is scanning, it will back to home channel to send data between scanning consecutive channels, the default dwell home channel time is 30ms, you can increase this value by setting this parameter https://github.com/espressif/esp-idf/blob/bb8dd9d35b4961a8a7181cd856a41e79e436081e/components/esp_wifi/include/esp_wifi_types.h#L161 However, be careful not to exceed the scanning time for each channel.
I (25007) wifi:ap_probe_send over, resett wifi status to disassoc I (25008) wifi:state: run -> init (c800) I (25009) wifi:pm stop, total sleep time: 0 us / 15467078 us I (25012) wifi:new:<1,0>, old:<1,1>, ap:<1,1>, sta:<1,1>, prof:1 W (01:00:24.265) mesh_main:
reason: 200 WIFI_REASON_BEACON_TIMEOU
This log indicates that the station has not received a beacon from the AP for 6 consecutive seconds.
@zhangyanjiaoesp what is the default scan time for a channel? Does using passive scan change anything?
These are our current settings:
wifi_scan_config_t scan_config = { 0 };
scan_config.show_hidden = 1;
scan_config.scan_type = WIFI_SCAN_TYPE_PASSIVE;
Thanks
The default active scan time for a channel is 120ms, the default passive scan time for a channel is 360ms. home_chan_dwell_time
won't be affect when using passive scan.
The default active scan time for a channel is 120ms, the default passive scan time for a channel is 360ms.
home_chan_dwell_time
won't be affect when using passive scan.
so it makes sense to use active scan 🤔 Ok we'll do some tests
Hi @KonssnoK, could you please share your updates about this issue? Thanks.
hello @Sherry616, we did move to active scanning but i did not have time to check if this is still happening!
Answers checklist.
General issue report
Hello, when we switch off the WIFI in our setups (LTE device acting as fixed ROOT), we start seeing the following warnings:
What are they related to? i expect the root node to propagate window informations to the children 🤔 Could this impact performances?