espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

[ESP32S3] v4.4 MESH disconnecting periodically (IDFGH-10766) #11982

Open KonssnoK opened 1 year ago

KonssnoK commented 1 year ago

General issue report

Hello @zhangyanjiaoesp,

I'm currently checking one installation, and I see that the devices are constantly disconnecting and reconnecting to the same mesh layer.

The disconnection seems to occur at mesh level every hour. The reported disconnection cause is 102 MESH_REASON_LEAF.

[27] STA: Send err 0x400A ESP_ERR_MESH_TIMEOUT

[1] <MESH_EVENT_PARENT_DISCONNECTED>reason: 102 MESH_REASON_LEAF

[3] STA: Send err 0x400B ESP_ERR_MESH_DISCONNECTED

[1] MQTT_EVENT_ERROR 32792 78 0 1 0

[1] MQTT task stopped after 1ms

[4] Triggering DYNAMIC MESH handover

[14] <MESH_EVENT_PARENT_DISCONNECTED>reason: 15 WIFI_REASON_4WAY_HAND

[13] <MESH_EVENT_PARENT_DISCONNECTED>reason: 205 WIFI_REASON_CONNECTI

[1] <MESH_EVENT_PARENT_DISCONNECTED>reason: 106 MESH_REASON_SCAN_FAI

[1] <MESH_EVENT_PARENT_CONNECTED>layer:2-->2, parent:ac:17:54:00:0a:

[1] GOT IP from mesh_sta

Do you have any idea why this error could trigger every hour?
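Before looking for an hourly timer in the stack, it may help to confirm the period from the logs themselves. The sketch below is a host-side helper, not ESP-IDF code: `count_periodic_gaps` is a hypothetical name, and it assumes the application can export the timestamps (in seconds, ascending) of each MESH_EVENT_PARENT_DISCONNECTED with reason 102. It counts how many gaps between consecutive events land near a given period.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not part of ESP-IDF.
 * ts:          disconnect timestamps in seconds, sorted ascending
 * n:           number of timestamps
 * period_s:    suspected period (3600 for "every hour")
 * tolerance_s: allowed deviation from the period
 * Returns the number of consecutive gaps within period_s +/- tolerance_s. */
size_t count_periodic_gaps(const long *ts, size_t n, long period_s, long tolerance_s)
{
    size_t hits = 0;
    for (size_t i = 1; i < n; i++) {
        long gap = ts[i] - ts[i - 1];
        if (labs(gap - period_s) <= tolerance_s) {
            hits++;
        }
    }
    return hits;
}
```

If most gaps per device match 3600 s, a timer-like cause (for example a key update interval) becomes more plausible. If the gaps are spread out and only the fleet-wide rate looks hourly, the pattern is more likely an artifact of aggregation.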

Codebase 4fc8964ec36c44cd5b959b6b30f4ffb7ccbdcf91

zhangyanjiaoesp commented 1 year ago

@KonssnoK We have checked the code and did not find any mesh-related behavior that triggers once an hour. It may be related to the application. To debug this issue we need more information about the disconnecting device and its parent device, and about what happens to the mesh network when the device periodically disconnects and reconnects.

zhangyanjiaoesp commented 1 year ago

@KonssnoK Can you check the multicast key update time on the router? Is it an hour?

KonssnoK commented 1 year ago

@zhangyanjiaoesp I think the disconnections may be related to the fact that there is no router and the installation is running on a fixed root... We are investigating (see the other issue we opened): https://github.com/espressif/esp-idf/issues/12018

zhangyanjiaoesp commented 11 months ago

@KonssnoK Do you still have this problem?

KonssnoK commented 11 months ago

I would think so, yes.

[image] A quick look at the logs shows this error is still occurring in the field. We haven't had time to work on mesh lately; we plan to get back to it in a few weeks.

KonssnoK commented 11 months ago

@zhangyanjiaoesp I have a question because I forgot...

When sending to the external network, we use:

    esp_err_t err = esp_mesh_send(NULL, &data, MESH_DATA_TODS | MESH_DATA_NONBLOCK, NULL, 0);
    if (err != ESP_OK) {
        LOGE(TAG, "STA: Send err 0x%X %s", err, esp_err_to_name(err));
    }

Sometimes, in the field, we see "STA: Send err 0x400A ESP_ERR_MESH_TIMEOUT". I think it happens on networks with poor connectivity.

We currently have ESP_ERROR_CHECK(esp_mesh_send_block_time(2000));

Is changing the send_block_time going to impact esp_mesh_send with MESH_DATA_NONBLOCK set? I remember we already discussed this and the answer was "yes", but maybe I'm wrong! :)

Thanks!

zhangyanjiaoesp commented 11 months ago

Yes, you are right.

KonssnoK commented 11 months ago

Could you tell me a bit more? Is the send_block_time used for the queues before the non-blocking part?

For example, if I have 3 seconds of block_time but I do a send with "data_nonblock", can the function block for up to 3 seconds?

zhangyanjiaoesp commented 11 months ago

@KonssnoK

When you call esp_mesh_send(), if data.tos = MESH_TOS_P2P, the transmission will block even if you set the MESH_DATA_NONBLOCK flag. If you set data.tos = MESH_TOS_DEF together with the MESH_DATA_NONBLOCK flag, the transmission will be non-blocking. MESH_TOS_P2P should be used when sending packets upward.
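The rule in this comment can be written down as a small predicate. This is only a sketch of what the comment states, not the mesh stack's actual logic: `send_may_block` is a hypothetical helper, and the enum and flag value are copied from esp_mesh.h so the snippet builds on a host (on target you would include esp_mesh.h instead). Combinations the comment does not mention are treated as blocking, which is an assumption made here to stay on the safe side.

```c
#include <stdbool.h>

/* Copied from esp_mesh.h for a host-side build. */
typedef enum {
    MESH_TOS_P2P,    /* P2P retransmission on mesh stack (default) */
    MESH_TOS_E2E,    /* E2E retransmission (unimplemented) */
    MESH_TOS_DEF,    /* no retransmission on mesh stack */
} mesh_tos_t;

#define MESH_DATA_NONBLOCK (0x10)  /* esp_mesh_send() non-block */

/* Hypothetical helper: returns true if, per the comment above,
 * esp_mesh_send() may block for this tos/flag combination. */
bool send_may_block(mesh_tos_t tos, int flag)
{
    if (tos == MESH_TOS_P2P) {
        /* Stated in the thread: blocks even with MESH_DATA_NONBLOCK set. */
        return true;
    }
    if (tos == MESH_TOS_DEF && (flag & MESH_DATA_NONBLOCK)) {
        /* Stated in the thread: non-blocking. */
        return false;
    }
    /* Not covered by the comment: assume blocking (assumption). */
    return true;
}
```

Under this reading, a send with flags MESH_DATA_TODS | MESH_DATA_NONBLOCK and data.tos left at MESH_TOS_P2P would still be subject to the send block time.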

KonssnoK commented 11 months ago

Sorry could you map your answer to these? Thanks :)

#define MESH_DATA_ENC           (0x01)  /**< data encrypted (Unimplemented) */
#define MESH_DATA_P2P           (0x02)  /**< point-to-point delivery over the mesh network */
#define MESH_DATA_FROMDS        (0x04)  /**< receive from external IP network */
#define MESH_DATA_TODS          (0x08)  /**< identify this packet is target to external IP network */
#define MESH_DATA_NONBLOCK      (0x10)  /**< esp_mesh_send() non-block */
#define MESH_DATA_DROP          (0x20)  /**< in the situation of the root having been changed, identify this packet can be dropped by new root */
#define MESH_DATA_GROUP         (0x40)  /**< identify this packet is target to a group address */

zhangyanjiaoesp commented 11 months ago

In esp-idf/components/esp_wifi/include/esp_mesh.h, the data.tos type is defined here:

/**
 * @brief For reliable transmission, mesh stack provides three type of services
 */
typedef enum {
    MESH_TOS_P2P,    /**< provide P2P (point-to-point) retransmission on mesh stack by default */
    MESH_TOS_E2E,    /**< provide E2E (end-to-end) retransmission on mesh stack (Unimplemented) */
    MESH_TOS_DEF,    /**< no retransmission on mesh stack */
} mesh_tos_t;

And the flags are defined here:

/**
 * @brief Flags bitmap for esp_mesh_send() and esp_mesh_recv()
 */
#define MESH_DATA_ENC           (0x01)  /**< data encrypted (Unimplemented) */
#define MESH_DATA_P2P           (0x02)  /**< point-to-point delivery over the mesh network */
#define MESH_DATA_FROMDS        (0x04)  /**< receive from external IP network */
#define MESH_DATA_TODS          (0x08)  /**< identify this packet is target to external IP network */
#define MESH_DATA_NONBLOCK      (0x10)  /**< esp_mesh_send() non-block */
#define MESH_DATA_DROP          (0x20)  /**< in the situation of the root having been changed, identify this packet can be dropped by new root */
#define MESH_DATA_GROUP         (0x40)  /**< identify this packet is target to a group address */
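
Since the flag argument is a bitmap and data.tos is a separate enum field, it can help to print the flags by name when logging a failed send. The following is a minimal host-side sketch: the macro values are copied from the excerpt above, while `mesh_flags_to_str` and the short names it prints are hypothetical and not part of ESP-IDF.

```c
#include <stdio.h>

#define MESH_DATA_ENC      (0x01)
#define MESH_DATA_P2P      (0x02)
#define MESH_DATA_FROMDS   (0x04)
#define MESH_DATA_TODS     (0x08)
#define MESH_DATA_NONBLOCK (0x10)
#define MESH_DATA_DROP     (0x20)
#define MESH_DATA_GROUP    (0x40)

static const struct {
    int bit;
    const char *name;
} k_flag_names[] = {
    { MESH_DATA_ENC, "ENC" },       { MESH_DATA_P2P, "P2P" },
    { MESH_DATA_FROMDS, "FROMDS" }, { MESH_DATA_TODS, "TODS" },
    { MESH_DATA_NONBLOCK, "NONBLOCK" }, { MESH_DATA_DROP, "DROP" },
    { MESH_DATA_GROUP, "GROUP" },
};

/* Hypothetical helper: writes the names of all set flags, joined by '|',
 * into out (always NUL-terminated if out_len > 0) and returns out.
 * Stops early if the buffer is too small. */
const char *mesh_flags_to_str(int flag, char *out, size_t out_len)
{
    size_t used = 0;
    if (out_len == 0) {
        return out;
    }
    out[0] = '\0';
    for (size_t i = 0; i < sizeof(k_flag_names) / sizeof(k_flag_names[0]); i++) {
        if (!(flag & k_flag_names[i].bit)) {
            continue;
        }
        int n = snprintf(out + used, out_len - used, "%s%s",
                         used ? "|" : "", k_flag_names[i].name);
        if (n < 0 || (size_t)n >= out_len - used) {
            break;  /* output truncated, stop appending */
        }
        used += (size_t)n;
    }
    return out;
}
```

For the send call discussed in this thread, MESH_DATA_TODS | MESH_DATA_NONBLOCK evaluates to 0x18 and would be rendered as TODS|NONBLOCK. Note that MESH_DATA_P2P is a flag bit and is distinct from the MESH_TOS_P2P service type.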

KonssnoK commented 9 months ago

I will create an additional issue for the 2 previous comments.

EDIT: Done, the new issue is https://github.com/espressif/esp-idf/issues/12836 , comments removed.

zhangyanjiaoesp commented 5 months ago

@KonssnoK Can this issue be closed?

KonssnoK commented 4 months ago

Hello @zhangyanjiaoesp, I still see the disconnection with reason: 102 MESH_REASON_LEAF.

More or less 500 per hour across thousands of devices. [image]

And I think this happens on networks where Wi-Fi is not available or the Wi-Fi password is wrong, meaning they work in LTE mode,