espressif / esp-mesh-lite

A lite version of Wi-Fi Mesh in which each node can access the network over the IP layer.

A network failure occurs when multiple nodes are used (AEGHB-596) #72

Open yel-best opened 3 months ago

yel-best commented 3 months ago

During an actual deployment, when one of my nodes goes offline, the log below appears. I tested with 15 ESP32-S3 boards placed in different locations; initially one of them is elected as the root node. When I take the root node offline, some nodes elect a new root node. However, two or three nodes keep printing the following log and never rejoin the mesh, so they stop working until the device is restarted. This does not seem right. Could you help me find the cause?

W (409599) [vendor_ie]: Scanning in progress, please try again later 
W (411879) wifi:Haven't to connect to a suitable AP now!
yel-best commented 3 months ago

I would like to know what these options do and how they affect the network:

- [ ] Join Mesh without configured with information
- [ ] Join Mesh no matter whether the node is connected to router

I am using a setup with a router. Do I need to enable these two options, and would they improve stability? There are about 100 devices. I don't designate a root node; I want the devices to elect one automatically. Would enabling "Join Mesh no matter whether the node is connected to router" help in my case?

yel-best commented 3 months ago

I am now powering on 15 ESP32-S3 boards at the same time, using the router setup. I expected a single root node to be elected after a simultaneous power-on, but 8 of the boards all become root nodes. What is the reason?

yel-best commented 3 months ago

https://components.espressif.com/components/espressif/mesh_lite

Version 0.10.2 appears to have been released on GitHub, but the ESP Component Registry only shows version 0.10.1. When will the registry be updated to the latest version? I need the "signal strength threshold" adjustment feature; it would help me a lot.


yel-best commented 3 months ago

I would also like to ask: if I want to create two mutually isolated mesh networks in the same environment, how should I configure them? All the ESP32-S3 boards use the same router, and I gave the two networks different SSIDs, but nodes from both networks still seem able to connect to each other, presumably because both use the same password, 123456.

yel-best commented 3 months ago

Hi

I found a problem. When there are multiple devices in the network and I power off an intermediate node, the downstream node connected to it triggers WIFI_EVENT_STA_DISCONNECTED several times, then an esp-tls error appears and the device restarts. What is the cause? In this test I only watched two of the devices; the network as a whole should offer many other mesh Wi-Fi connections, but the node did not connect to any of them and restarted instead. The log is below. Please help, thanks.

I (77126) app_bee: bee mqtt msg published.
I (86396) wifi:bcn_timeout,ap_probe_send_start
W (86396) app_lvgl: lvgl_init_event_handler
I (86817) app_bee_common: System information, channel: 1, layer: 3, self mac: 68:b6:b3:4c:f2:9c, parent bssid: 48:27:e2:c8:80:d5, parent rssi: -45, free heap: 479848
I (86822) app_bee_common: {"id":"68B6B34CF29C","time":1711417509,"type":"esp-mesh-lite","rssi":"-45","status":"System information, channel: 1, layer: 3, self mac: 68:b6:b3:4c:f2:9c, parent bssid: 48:27:e2:c8:80:d5, parent rssi: -45, free heap: 479848"}
W (86844) app_bee_mqtt: Publishing to /esquel/udqmawbtts/5468ad55-dc4a-4f5a-a386-568e59e12e2b
I (88897) wifi:ap_probe_send over, resett wifi status to disassoc
I (88897) wifi:state: run -> init (c800)
I (88897) wifi:pm stop, total sleep time: 0 us / 83866503 us

I (88901) wifi:new:<1,0>, old:<1,1>, ap:<1,1>, sta:<1,1>, prof:1
W (88907) app_lvgl: lvgl_init_event_handler
W (88911) app_lvgl: WIFI_EVENT_STA_DISCONNECTED
E (88918) transport_base: poll_read select error 113, errno = Software caused connection abort, fd = 58
E (88927) mqtt_client: Poll read error: 119, aborting connection
W (88934) app_mqtt: MQTT Disconnected. Will try reconnecting in a while...
I (88941) app_bee: bee mqtt disconnected.
I (91605) [vendor_ie]: esp_mesh_lite_wifi_connect return ESP_OK 
I (91606) [ESP_Mesh_Lite_Comm]: Retry to connect to the AP
W (94433) app_lvgl: lvgl_init_event_handler
W (94433) app_lvgl: WIFI_EVENT_STA_DISCONNECTED
I (96605) [vendor_ie]: esp_mesh_lite_wifi_connect return ESP_OK 
I (96605) [ESP_Mesh_Lite_Comm]: Retry to connect to the AP
W (99433) app_lvgl: lvgl_init_event_handler
W (99433) app_lvgl: WIFI_EVENT_STA_DISCONNECTED
I (101605) [vendor_ie]: esp_mesh_lite_wifi_connect return ESP_OK 
I (101606) [ESP_Mesh_Lite_Comm]: Retry to connect to the AP
W (104433) app_lvgl: lvgl_init_event_handler
W (104433) app_lvgl: WIFI_EVENT_STA_DISCONNECTED
I (106605) [vendor_ie]: esp_mesh_lite_wifi_connect return ESP_OK 
I (106605) [ESP_Mesh_Lite_Comm]: Retry to connect to the AP
E (108948) esp-tls: [sock=58] select() timeout
E (108948) transport_base: Failed to open a new connection: 32774
E (108949) mqtt_client: Error transport connect
E (108954) app_mqtt: MQTT_EVENT_ERROR
E (108958) app_mqtt: Last error reported from esp-tls: 0x8006
E (108965) app_mqtt: Last errno string (Success)
W (108970) app_mqtt: MQTT Disconnected. Will try reconnecting in a while...
I (108978) app_bee: bee mqtt disconnected.

assert failed: tlsf_free tlsf.c:1120 (!block_is_free(block) && "block already marked as free")

Backtrace: 0x40379aba:0x3fcad0b0 0x40387789:0x3fcad0d0 0x4038e87d:0x3fcad0f0 0x4038ce11:0x3fcad210 0x4038cce6:0x3fcad230 0x4037a425:0x3fcad250 0x4038a131:0x3fcad270 0x4038a5b4:0x3fcad290
0x40379aba: panic_abort at E:/MyProject/IoT/esp/esp-idf/components/esp_system/panic.c:461

0x40387789: esp_system_abort at E:/MyProject/IoT/esp/esp-idf/components/esp_system/port/esp_system_chip.c:83

0x4038e87d: __assert_func at E:/MyProject/IoT/esp/esp-idf/components/newlib/assert.c:46 (discriminator 4)

0x4038ce11: tlsf_block_size at E:/MyProject/IoT/esp/esp-idf/components/heap/tlsf/tlsf.c:777

0x4038cce6: multi_heap_free_impl at E:/MyProject/IoT/esp/esp-idf/components/heap/multi_heap.c:222

0x4037a425: find_containing_heap at E:/MyProject/IoT/esp/esp-idf/components/heap/heap_caps.c:365
 (inlined by) heap_caps_free at E:/MyProject/IoT/esp/esp-idf/components/heap/heap_caps.c:386

0x4038a131: _xt_coproc_restorecs at E:/MyProject/IoT/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/xtensa_context.S:600

0x4038a5b4: prvProcessReceivedCommands at E:/MyProject/IoT/esp/esp-idf/components/freertos/FreeRTOS-Kernel/timers.c:888
 (inlined by) prvTimerTask at E:/MyProject/IoT/esp/esp-idf/components/freertos/FreeRTOS-Kernel/timers.c:622
tswen commented 3 months ago
> • Using ESP-IDF 5.1.3
> • Using mesh-lite 0.10.1
>
> W (409599) [vendor_ie]: Scanning in progress, please try again later
> W (411879) wifi:Haven't to connect to a suitable AP now!

Please update to the latest commit on GitHub and test again. The latest version includes a fix for this issue.

yel-best commented 3 months ago

When is the fix expected to be published here? https://components.espressif.com/components/espressif/mesh_lite

tswen commented 3 months ago

"Join Mesh no matter whether the node is connected to router": this option does not need to be selected.

"Join Mesh without configured with information": if this option is not selected, the device must be configured with network information before it can join the mesh network. If it is selected, the device can join the network even without network configuration.

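The rule for "Join Mesh without configured with information" reduces to a simple decision. The sketch below is a hypothetical model for illustration only, not code from mesh-lite: `may_join_mesh()` and its two flags are invented names, where `join_without_config` stands for the menuconfig option and `has_network_config` for whether the device has already been configured with network information.

```c
#include <stdbool.h>

/* Hypothetical model of the behaviour described above; this is NOT mesh-lite code.
 *   join_without_config: stands in for the
 *                        "Join Mesh without configured with information" option
 *   has_network_config:  whether the device has already been configured
 *                        with network information */
static bool may_join_mesh(bool join_without_config, bool has_network_config)
{
    if (join_without_config) {
        /* Option selected: join even before any network configuration exists. */
        return true;
    }
    /* Option not selected: network information must be configured first. */
    return has_network_config;
}
```

With the option off, an unconfigured device stays out of the mesh until it is configured; with it on, configuration status no longer gates joining.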
tswen commented 3 months ago

> When I power on 15 ESP32-S3 boards at the same time, 8 of them all become root nodes. What is the reason?

https://github.com/espressif/esp-mesh-lite/blob/master/components/mesh_lite/User_Guide.md#automatic-root-node-selection

tswen commented 3 months ago

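> If I want to create two mutually isolated mesh networks in the same environment, how should I configure them? All the ESP32-S3 boards use the same router and I gave them different SSIDs, but they all still seem able to connect, because the password is 123456.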

You need to set a different mesh ID for each mesh network.
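Conceptually, assuming the mesh ID is compared when a node chooses its parent, it acts as a filter: candidates advertising a different ID are skipped even when they share the router and the password. The sketch below is a hypothetical model of that idea; `candidate_t`, its fields, and `pick_parent()` are illustrative names, not the mesh-lite API, and picking the strongest RSSI is a simplification of real parent selection. How the ID is actually configured depends on the component version, so check its documentation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of mesh-ID isolation; names are illustrative, not the mesh-lite API. */
typedef struct {
    uint8_t mesh_id; /* mesh ID advertised by the candidate parent */
    int8_t  rssi;    /* received signal strength in dBm */
} candidate_t;

/* Returns the index of the strongest candidate whose mesh ID matches own_id,
 * or -1 if none matches. Candidates from another mesh are never chosen,
 * no matter how strong their signal is. */
static int pick_parent(const candidate_t *cands, size_t n, uint8_t own_id)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (cands[i].mesh_id != own_id) {
            continue; /* different mesh: not a valid parent */
        }
        if (best < 0 || cands[i].rssi > cands[best].rssi) {
            best = (int)i;
        }
    }
    return best;
}
```

For example, with candidates `{1, -40}`, `{2, -30}` and `{1, -60}`, a node in mesh 1 picks index 0 even though the mesh-2 node has the stronger signal.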

tswen commented 3 months ago

From the mesh perspective, everything looks normal: after losing its parent, the node retries the connection several times before looking for a new parent. During that period, however, your application code crashed. The assertion `block already marked as free` in `tlsf_free`, reached from the FreeRTOS timer task, indicates that the same heap block was freed twice. Please review your application code, in particular any cleanup that runs on disconnect or from a timer callback.
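One defensive pattern against this kind of double free, sketched under assumptions: if a cleanup path can run more than once (the log above shows repeated `WIFI_EVENT_STA_DISCONNECTED` and MQTT disconnect events), free the buffer and clear the pointer in the same step, so a second call passes `NULL` to `free()`, which does nothing. `app_ctx_t`, `FREE_AND_NULL`, `app_ctx_set_payload()` and `app_ctx_release()` are hypothetical names, not code from the original project. In real firmware, state touched from both an event handler and a timer callback also needs a mutex or critical section, which this sketch omits.

```c
#include <stdlib.h>
#include <string.h>

/* Free a pointer and clear it in one step; freeing NULL later is a no-op. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical application state, e.g. a pending MQTT message buffer. */
typedef struct {
    char *payload;
} app_ctx_t;

/* Cleanup that may run from several disconnect events. Because the field is
 * cleared, a second call does not free the same block twice. */
static void app_ctx_release(app_ctx_t *ctx)
{
    FREE_AND_NULL(ctx->payload);
}

/* Replace the payload with a copy of msg; returns 0 on success, -1 on OOM. */
static int app_ctx_set_payload(app_ctx_t *ctx, const char *msg)
{
    app_ctx_release(ctx); /* drop any previous buffer first */
    ctx->payload = malloc(strlen(msg) + 1);
    if (ctx->payload == NULL) {
        return -1;
    }
    strcpy(ctx->payload, msg);
    return 0;
}
```

This only helps when every release goes through the same field; a second copy of the pointer stored elsewhere can still be freed twice. ESP-IDF's heap corruption detection (heap poisoning) can help locate such cases.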

tswen commented 3 months ago

For now, you can use the component from the master branch of the GitHub repository. We will publish the new version to the component registry soon. Thank you for your support.