espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
12.89k stars 7.08k forks source link

[ESP32S3] v4.4 WIFI losing connectivity temporarily or permanently without apparent reason (IDFGH-12977) #13212

Open KonssnoK opened 4 months ago

KonssnoK commented 4 months ago

Answers checklist.

General issue report

@zhangyanjiaoesp

based on 27ec26d2d3f44bbde5da14c7fdfc82226d567874 To reproduce

    esp_ping_config_t ping_config = ESP_PING_DEFAULT_CONFIG();
    IP4_ADDR(&ping_config.target_addr, 8,8,8,8);          // target IP address
    ping_config.count = ESP_PING_COUNT_INFINITE;    // ping in infinite mode, esp_ping_stop can stop it
    ping_config.timeout_ms = 3000;

    /* set callback functions */
    esp_ping_callbacks_t cbs;
    cbs.on_ping_success = test_on_ping_success;
    cbs.on_ping_timeout = test_on_ping_timeout;
    cbs.on_ping_end = test_on_ping_end;
    cbs.cb_args = "foo";  // arguments that will feed to all callback functions, can be NULL

    esp_ping_handle_t ping;
    esp_ping_new_session(&ping_config, &cbs, &ping);
    esp_ping_start(ping);

#define DUMP_INTERVAL_MS (1000*60*2)
    int64_t last_dump = 0;
    while (true) {
        if((os_time_get_timestamp() - last_dump) >= DUMP_INTERVAL_MS) {
            last_dump = os_time_get_timestamp();
            esp_wifi_statis_dump(0xFFFF);
        }
        os_thread_sleep(1000);
    }

static void test_on_ping_success(esp_ping_handle_t hdl, void *args)
{
    // optionally, get callback arguments
    // const char* str = (const char*) args;
    // printf("%s\r\n", str); // "foo"
    uint8_t ttl;
    uint16_t seqno;
    uint32_t elapsed_time, recv_len;
    ip_addr_t target_addr;
    esp_ping_get_profile(hdl, ESP_PING_PROF_SEQNO, &seqno, sizeof(seqno));
    esp_ping_get_profile(hdl, ESP_PING_PROF_TTL, &ttl, sizeof(ttl));
    esp_ping_get_profile(hdl, ESP_PING_PROF_IPADDR, &target_addr, sizeof(target_addr));
    esp_ping_get_profile(hdl, ESP_PING_PROF_SIZE, &recv_len, sizeof(recv_len));
    esp_ping_get_profile(hdl, ESP_PING_PROF_TIMEGAP, &elapsed_time, sizeof(elapsed_time));
    ESP_LOGW(TAG, "%d bytes from %s icmp_seq=%d ttl=%d time=%d ms",
           recv_len, ipaddr_ntoa(&target_addr), seqno, ttl, elapsed_time);
}

static void test_on_ping_timeout(esp_ping_handle_t hdl, void *args)
{
    uint16_t seqno;
    ip_addr_t target_addr;
    esp_ping_get_profile(hdl, ESP_PING_PROF_SEQNO, &seqno, sizeof(seqno));
    esp_ping_get_profile(hdl, ESP_PING_PROF_IPADDR, &target_addr, sizeof(target_addr));
    ESP_LOGW(TAG, "From %s icmp_seq=%d timeout", ipaddr_ntoa(&target_addr), seqno);
}

static void test_on_ping_end(esp_ping_handle_t hdl, void *args)
{
    uint32_t transmitted;
    uint32_t received;
    uint32_t total_time_ms;

    esp_ping_get_profile(hdl, ESP_PING_PROF_REQUEST, &transmitted, sizeof(transmitted));
    esp_ping_get_profile(hdl, ESP_PING_PROF_REPLY, &received, sizeof(received));
    esp_ping_get_profile(hdl, ESP_PING_PROF_DURATION, &total_time_ms, sizeof(total_time_ms));
    ESP_LOGW(TAG, "%d packets transmitted, %d received, time %dms", transmitted, received, total_time_ms);
}

espressif_wifi_dump_3.txt espressif_wifi_dump.txt espressif_wifi_dump_2.txt

Other related issues with similar behavior: https://github.com/espressif/esp-idf/issues/8953 https://github.com/espressif/esp-idf/issues/10506

zhangyanjiaoesp commented 6 days ago

@KonssnoK Please use this wifi lib to see whether the 2/205 error loop can be solved, thanks. wifi_lib_s3_0627.zip (wifi firmware version: ace69bc)

KonssnoK commented 6 days ago

@zhangyanjiaoesp i confirm i'm currently not able to reproduce the 2/205 infinite loop on the pixel phone :) Did you find another error ?

I will continue to try to reproduce

KonssnoK commented 6 days ago

@zhangyanjiaoesp currently the only bug I am able to reproduce is a child device becoming stuck on W (155190) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1, no_wnd_count:0, timeout_count:123

On the root node i cannot trigger much with the new library

240627dev2_3.txt 240627dev1_3.txt 240627dev3_3.txt

for reference, it seems that it's not that easy to trigger this error on the child as it was before the 0625 library, so it could be that the library fix of 0625 is not complete

zhangyanjiaoesp commented 5 days ago

@zhangyanjiaoesp i confirm i'm currently not able to reproduce the 2/205 infinite loop on the pixel phone :) Did you find another error ?

yes, I change the wifi code. For some routers, if the STA is powered off and restarted, the router still records the connection status of the STA, so when the STA is connected again to send the auth request, the router will not reply the auth response. In this case, we first send a deauth packet before the STA sends the auth request, so that the router can clear the previous connection state, and then the STA can send the auth request to the normal connection.

zhangyanjiaoesp commented 5 days ago

@zhangyanjiaoesp currently the only bug I am able to reproduce is a child device becoming stuck on W (155190) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1, no_wnd_count:0, timeout_count:123

On the root node i cannot trigger much with the new library

  • dev3 root
  • dev1 stuck

240627dev2_3.txt 240627dev1_3.txt 240627dev3_3.txt

for reference, it seems that it's not that easy to trigger this error on the child as it was before the 0625 library, so it could be that the library fix of 0625 is not complete

OK, I will follow up to solve this problem.

KonssnoK commented 5 days ago

ok thanks, meanwhile i will continue some testing to see if something else appears.

KonssnoK commented 5 days ago

apparently i cannot continue on the other issues until this is fixed.

@zhangyanjiaoesp do you have any way to check when this issue was introduced?

I would like to understand if this was always there (and means slow rate is a trigger we didn't check) or if instead it was introduced by a commit, and we could rollback devices in the field

zhangyanjiaoesp commented 2 days ago

@KonssnoK I have added some debug log for the stuck issue, please use the new wifi lib to test, thanks. wifi_lib_s3_debug_0701.zip wifi firmware version: fbf47d3

KonssnoK commented 2 days ago

@zhangyanjiaoesp dev1 is stuck here without printing anything but >>>mesh_nwk_task_main,5572, nwk.state = 21

240701dev1_1.txt 240701dev2_1.txt 240701dev3_1.txt

more logs of dev1 once stuck

I (157009) wifi:station: 7c:df:a1:e2:7b:84 leave, AID = 1, bss_flags is 658531, bss:0x3fcb8f24
I (157009) wifi:new:<11,0>, old:<11,2>, ap:<11,2>, sta:<11,2>, prof:11
I (186309) mesh: >>>8415[nwk]pending[0], state:MESH_NWK_TIMER
>>>mesh_nwk_task_main,3768, nwk.state = 21
>>>mesh_nwk_task_main,5572, nwk.state = 21
I (216309) mesh: >>>8415[nwk]pending[0], state:MESH_NWK_TIMER
>>>mesh_nwk_task_main,3768, nwk.state = 21
>>>mesh_nwk_task_main,5572, nwk.state = 21
I (246309) mesh: >>>8415[nwk]pending[0], state:MESH_NWK_TIMER
>>>mesh_nwk_task_main,3768, nwk.state = 21
>>>mesh_nwk_task_main,5572, nwk.state = 21
I (276309) mesh: >>>8415[nwk]pending[0], state:MESH_NWK_TIMER
>>>mesh_nwk_task_main,3768, nwk.state = 21
>>>mesh_nwk_task_main,5572, nwk.state = 21
KonssnoK commented 2 days ago

one thing i don't understand is that when i press reset on dev2 or dev3 (with dev1 root), also the other 2 devices stop sending data completely for some time. Is this expected? It seems like the timeout of the device i'm resetting is blocking the parent, which for 3 devices isn't much, but if i think at 6 devices per node, means a huge lag for no reason...

zhangyanjiaoesp commented 2 days ago

@zhangyanjiaoesp dev1 is stuck here without printing anything but >>>mesh_nwk_task_main,5572, nwk.state = 21

240701dev1_1.txt 240701dev2_1.txt 240701dev3_1.txt

more logs of dev1 once stuck

I (157009) wifi:station: 7c:df:a1:e2:7b:84 leave, AID = 1, bss_flags is 658531, bss:0x3fcb8f24
I (157009) wifi:new:<11,0>, old:<11,2>, ap:<11,2>, sta:<11,2>, prof:11
I (186309) mesh: >>>8415[nwk]pending[0], state:MESH_NWK_TIMER
>>>mesh_nwk_task_main,3768, nwk.state = 21
>>>mesh_nwk_task_main,5572, nwk.state = 21
I (216309) mesh: >>>8415[nwk]pending[0], state:MESH_NWK_TIMER
>>>mesh_nwk_task_main,3768, nwk.state = 21
>>>mesh_nwk_task_main,5572, nwk.state = 21
I (246309) mesh: >>>8415[nwk]pending[0], state:MESH_NWK_TIMER
>>>mesh_nwk_task_main,3768, nwk.state = 21
>>>mesh_nwk_task_main,5572, nwk.state = 21
I (276309) mesh: >>>8415[nwk]pending[0], state:MESH_NWK_TIMER
>>>mesh_nwk_task_main,3768, nwk.state = 21
>>>mesh_nwk_task_main,5572, nwk.state = 21

You didn't reproduce the stuck issue on W (155190) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1, no_wnd_count:0, but stuck on a new case, right? It seems that the wifi task stop running.

for reference, it seems that it's not that easy to trigger this error on the child as it was before the 0625 library, so it could be that the library fix of 0625 is not complete

Can you confirm that this problem would not occurred without the change in 0625 library?

KonssnoK commented 2 days ago

Since we have more logs in this lib I tried it also with our code that uses both fixed root and dynamic one. There seem to be serious issues. 240701dev1_3_alt.txt 240701dev2_3_alt.txt

yes, i did not reproduce the W (155190) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1, no_wnd_count:0 yet, i'll resume it now

no, also before 0625 i had devices getting stuck without printing anything. The way to recover was rebooting them, but since there was no direct way to trigger it, i didn't write about it

KonssnoK commented 2 days ago

@zhangyanjiaoesp here is the infinite loop W (155190) mesh: [mesh_schedule.c,3131] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1, no_wnd_count:0 with library 0701

dev1 stuck dev2 stuck dev3 root

240701dev1_4.txt 240701dev3_4.txt 240701dev2_4.txt

>>>esp_mesh_push_to_tx_queue,741
>>>mesh_tx_task_main,252>>esp_mesh_push_to_tx_queue,745
W (211131) mesh: [mesh_schedule.c,3146] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:56, no_wnd_count:0, timeout_count:111
>>>mesh_xon_process_disconnected,2649, (1, 1, 0)
>>>mesh_xon_task_main,3084
>>>esp_mesh_send_xon,1034
>>>esp_mesh_send_xon,1228
>>>esp_mesh_push_to_tx_queue,695, tid = 6, tx_task_id = 0
>>>esp_mesh_push_to_tx_queue,730
>>>esp_mesh_push_to_tx_queue,741
esp_mesh_push_to_tx_queue,745
I (211611) mesh: >>>8415[nwk]pending[0], state:MESH_NWK_TIMER
>>>mesh_nwk_task_main,3768, nwk.state = 21
>>>esp_mesh_forward_packet,241
I (211621) mesh: >>>[TO]dst:00:00:00:00:00:00, parent:7c:df:a1:e1:6a:d9

>>>esp_mesh_process_ucast,1102
>>>esp_mesh_push_to_tx_queue,695, tid = 5, tx_task_id = 0
>>>esp_mesh_push_to_tx_queue,733
>>>esp_mesh_push_to_tx_queue,741
>>>esp_mts_tpss_mti_,2_que
e,>4me
h_>xeta_kwm_ta,k_m4i
,5572, nwk.state = 21
W (212351) mesh: [mesh_schedule.c,3146] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:56, no_wnd_count:0, timeout_count:112
>>>mesh_xon_process_disconnected,2649, (1, 1, 0)
>>>mesh_xon_task_main,3084
>>>esp_mesh_send_xon,1034
>>>esp_mesh_send_xon,1228
>>>esp_mesh_push_to_tx_queue,695, tid = 6, tx_task_id = 0
>>>esp_mesh_push_to_tx_queue,730
>>>esp_mesh_push_to_tx_queue,741
esp_mesh_push_to_tx_queue,745
W (213571) mesh: [mesh_schedule.c,3146] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:56, no_wnd_count:0, timeout_count:113
>>>mesh_xon_process_disconnected,2649, (1, 1, 0)
>>>mesh_xon_task_main,3084
>>>esp_mesh_send_xon,1034
>>>esp_mesh_send_xon,1228
>>>esp_mesh_push_to_tx_queue,695, tid = 6, tx_task_id = 0
>>>esp_mesh_push_to_tx_queue,730
>>>esp_mesh_push_to_tx_queue,741
>>>esp_mesh_push_to_tx_q>e>me7h5tx_task_main,2524
I (214641) mesh: >>>8415[nwk]pending[0], state:MESH_NWK_TIMER
>>>mesh_nwk_task_main,3768, nwk.state = 21
>>>esp_mesh_forward_packet,241
I (214641) mesh: >>>[TO]dst:00:00:00:00:00:00, parent:7c:df:a1:e1:6a:d9

>>>esp_mesh_process_ucast,1102
>>>esp_mesh_push_to_tx_queue,695, tid = 5, tx_task_id = 0
>>>esp_mesh_push_to_tx_queue,733
>>>esp_mesh_push_to_tx_queue,741
>>>>se_hesx_pask_ta_nx2q4eu
,5572, nwk.state = 21k_ma4
W (214791) mesh: [mesh_schedule.c,3146] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:56, no_wnd_count:0, timeout_count:114
>>>mesh_xon_process_disconnected,2649, (1, 1, 0)
>>>mesh_xon_task_main,3084
>>>esp_mesh_send_xon,1034
>>>esp_mesh_send_xon,1228
>>>esp_mesh_push_to_tx_queue,695, tid = 6, tx_task_id = 0
>>>esp_mesh_push_to_tx_queue,730
>>>esp_mesh_push_to_tx_queue,741
>>>mesh_tx_task_main,252>>esp_mesh_push_to_tx_queue,745
W (216011) mesh: [mesh_schedule.c,3146] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:56, no_wnd_count:0, timeout_count:115
>>>mesh_xon_process_disconnected,2649, (1, 1, 0)
>>>mesh_xon_task_main,3084
>>>esp_mesh_send_xon,1034
>>>esp_mesh_send_xon,1228
>>>esp_mesh_push_to_tx_queue,695, tid = 6, tx_task_id = 0
>>>esp_mesh_push_to_tx_queue,730
>>>esp_mesh_push_to_tx_queue,741
esp_mesh_push_to_tx_queue,745
W (217231) mesh: [mesh_schedule.c,3146] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:56, no_wnd_count:0, timeout_count:116
>>>mesh_xon_process_disconnected,2649, (1, 1, 0)
>>>mesh_xon_task_main,3084
>>>esp_mesh_send_xon,1034
>>>esp_mesh_send_xon,1228
>>>esp_mesh_push_to_tx_queue,695, tid = 6, tx_task_id = 0
>>>esp_mesh_push_to_tx_queue,730
>>>esp_mesh_push_to_tx_queue,741
esp_mesh_push_to_tx_queue,745
I (217661) mesh: >>>8415[nwk]pending[0], state:MESH_NWK_TIMER
>>>mesh_nwk_task_main,3768, nwk.state = 21
>>>esp_mesh_forward_packet,241
I (217661) mesh: >>>[TO]dst:00:00:00:00:00:00, parent:7c:df:a1:e1:6a:d9

>>>esp_mesh_process_ucast,1102
>>>esp_mesh_push_to_tx_queue,695, tid = 5, tx_task_id = 0
>>>esp_mesh_push_to_tx_queue,733
>>>esp_mesh_push_to_tx_queue,741
,5572, nwk.state = 21x234e
W (218451) mesh: [mesh_schedule.c,3146] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:56, no_wnd_count:0, timeout_count:117
>>>mesh_xon_process_disconnected,2649, (1, 1, 0)
>>>mesh_xon_task_main,3084
>>>esp_mesh_send_xon,1034
>>>esp_mesh_send_xon,1228
>>>esp_mesh_push_to_tx_queue,695, tid = 6, tx_task_id = 0
>>>esp_mesh_push_to_tx_queue,730
>>>esp_mesh_push_to_tx_queue,741
>>>mesh_tx_task_main,252>>esp_mesh_push_to_tx_queue,745
W (219671) mesh: [mesh_schedule.c,3146] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:56, no_wnd_count:0, timeout_count:118
>>>mesh_xon_process_disconnected,2649, (1, 1, 0)
>>>mesh_xon_task_main,3084
>>>esp_mesh_send_xon,1034
>>>esp_mesh_send_xon,1228
>>>esp_mesh_push_to_tx_queue,695, tid = 6, tx_task_id = 0
>>>esp_mesh_push_to_tx_queue,730
>>>esp_mesh_push_to_tx_queue,741
>>>mesh_tx_task_main,25>>
sp_mesh_push_to_tx_queue,745
I (220691) mesh: >>>8415[nwk]pending[0], state:MESH_NWK_TIMER
>>>mesh_nwk_task_main,3768, nwk.state = 21
>>>esp_mesh_forward_packet,241
I (220691) mesh: >>>[TO]dst:00:00:00:00:00:00, parent:7c:df:a1:e1:6a:d9
zhangyanjiaoesp commented 2 days ago

@KonssnoK Can you capture the wifi packets via the Wireshark?

zhangyanjiaoesp commented 2 days ago

@KonssnoK add more debug log in this wifi lib: wifi_lib_s3_0701_2.zip wifi firmware version: a93bcab

KonssnoK commented 2 days ago

@zhangyanjiaoesp extension .pcapng test1.txt

this is while dev1 dev2 are stuck

KonssnoK commented 2 days ago

@zhangyanjiaoesp once again both dev1 and dev2 are stuck with new library 0701 02

W (459370) mesh: [mesh_schedule.c,3146] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:163, no_wnd_count:0, timeout_count:228

240701dev1_5.txt 240701dev3_5.txt 240701dev2_5.txt

Wireshark file .pcapng test2.txt


>>>esp_mesh_process_ucast,1102
>>>esp_mesh_push_to_tx_queue,695, tid = 5, tx_task_id = 0
es>_tx_h_swk_maik_2a2n,5572, nwk.state = 21
W (453270) mesh: [mesh_schedule.c,3146] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:163, no_wnd_count:0, timeout_count:223
>>>mesh_xon_process_disconnected,2649, (1, 1, 0)
>>>mesh_xon_task_main,3084
>>>esp_mesh_send_xon,1034
>>>esp_mesh_send_xon,1228
>>>esp_mesh_push_to_tx_queue,695, tid = 6, tx_task_id = 0
>>>mesh_tx_task_main,2524
>>>esp_mesh_push_to_tx_queue,745
W (454490) mesh: [mesh_schedule.c,3146] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:163, no_wnd_count:0, timeout_count:224
>>>mesh_xon_process_disconnected,2649, (1, 1, 0)
>>>mesh_xon_task_main,3084
>>>esp_mesh_send_xon,1034
>>>esp_mesh_send_xon,1228
>>>esp_mesh_push_to_tx_queue,695, tid = 6, tx_task_id = 0
>>>mesh_tx_task_main,2524
>>>esp_mesh_push_to_tx_queue,745
W (455710) mesh: [mesh_schedule.c,3146] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:163, no_wnd_count:0, timeout_count:225
>>>mesh_xon_process_disconnected,2649, (1, 1, 0)
>>>mesh_xon_task_main,3084
>>>esp_mesh_send_xon,1034
>>>esp_mesh_send_xon,1228
>>>esp_mesh_push_to_tx_queue,695, tid = 6, tx_task_id = 0
>>>mesh_tx_task_main,2524
>>>esp_mesh_push_to_tx_queue,745
I (456130) mesh: >>>8415[nwk]pending[0], state:MESH_NWK_TIMER
>>>mesh_nwk_task_main,3768, nwk.state = 21
>>>esp_mesh_forward_packet,241
I (456130) mesh: >>>[TO]dst:00:00:00:00:00:00, parent:7c:df:a1:e1:6a:d9
zhangyanjiaoesp commented 1 day ago

@KonssnoK Can you print the tasks information when the stuck issue happen? you can refer the following code. To include the core ID info, you should enable the FREERTOS_VTASKLIST_INCLUDE_COREID option in menuconfig.

/* 'tasks' command prints the list of tasks and related information /

if WITH_TASKS_INFO

static int tasks_info(int argc, char *argv) { const size_t bytes_per_task = 40; / see vTaskList description / char task_list_buffer = malloc(uxTaskGetNumberOfTasks() * bytes_per_task); if (task_list_buffer == NULL) { ESP_LOGE(TAG, "failed to allocate buffer for vTaskList output"); return 1; }

fputs("Task Name\tStatus\tPrio\tHWM\tTask#", stdout);

ifdef CONFIG_FREERTOS_VTASKLIST_INCLUDE_COREID

fputs("\tAffinity", stdout);

endif

fputs("\n", stdout);
vTaskList(task_list_buffer);
fputs(task_list_buffer, stdout);
free(task_list_buffer);
return 0;

}

static void register_tasks(void) { const esp_console_cmd_t cmd = { .command = "tasks", .help = "Get information about running tasks", .hint = NULL, .func = &tasks_info, }; ESP_ERROR_CHECK( esp_console_cmd_register(&cmd) ); }

endif // WITH_TASKS_INFO

https://github.com/espressif/esp-idf/blob/a9d0f22193acdf47a5a4db36832ae7068818962b/examples/system/console/advanced/components/cmd_system/cmd_system.c#L159

Wireshark file .pcapng test2.txt

By the way, the capture is on channel 10, but the mesh network is on channel 1,( (10669) mesh: [DONE]connect to router:KI, channel:1, rssi:-37, d2:0f:7d:8a:8b:29[layer:0, assoc:0],) so there's no valid information in the capture.

KonssnoK commented 1 day ago

i will print it periodically because i have no idea on how to know when the issue occurs.

Yeah, I will redo the capturing on the right channel.

KonssnoK commented 1 day ago

Here are the logs with task status

240702dev1_1.txt 240702dev2_1.txt 240702dev3_1.txt

Currently I can-t use Wireshark dev1 stuck dev2 stuck


W (121404) ping: Task Name      Status  Prio    HWM     Task#   Affinity
W (121414) ping: main                   X       1       1292    4       0
IDLE1           R       0       788     6       1
IDLE0           R       0       784     5       0
MTXON           B       13      1072    13      -1
netif rx task   B       5       916     17      -1
check button ta B       5       956     19      -1
ipc0            B       24      832     1       0
MRX             B       13      1972    14      -1
ipc1            B       24      836     2       1
sys_evt         B       20      4208    9       0
tiT             B       18      376     8       -1
ping            B       2       564     16      -1
mqtt_task       B       5       3840    20      -1
mqtt task       B       5       1088    18      -1
MNWK            B       15      968     15      1
MTX             B       13      1340    12      -1
esp_timer       S       22      4216    3       0
wifi            B       23      3852    10      0
Tmr Svc         B       1       1400    7       0
MTXBLK          B       7       1312    11      -1
zhangyanjiaoesp commented 1 day ago

@KonssnoK please use the new wifi lib to test, and add the patch. wifi_lib_s3_0702.zip wifi firmware version: 43cf051 0001-add-debug-log-in-event-loop.zip

KonssnoK commented 1 day ago

dev2 stuck with lib 0702

240702dev3_2.txt 240702dev2_2.txt 240702dev1_2.txt


W (274340) ping: From 8.8.8.8 icmp_seq=68 timeout
W (275030) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:253, no_wnd_count:0, timeout_count:0
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
I (275240) mesh_main: Tried to publish layer:2 IP:10.0.0.3
W (276230) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:253, no_wnd_count:0, timeout_count:1
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (277430) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:253, no_wnd_count:0, timeout_count:2
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
I (277680) mesh: 5162<active>parent layer:1(node), channel:4, rssi:-3, assoc:0(cnx rssi threshold:-120)my_assoc:0
W (278630) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:253, no_wnd_count:0, timeout_count:3
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
>>>wifi_event_post,2241
I (279180) wifi:state: run -> init (6c0)
>>>ieee80211_sta_new_state,253
>>>ieee80211_sta_new_state,371
>>>cnx_sta_leave,2006
I (279190) wifi:pm stop, total sleep time: 0 us / 8461895 us

I (279190) wifi:<ba-del>idx
I (279190) wifi:new:<4,0>, old:<4,1>, ap:<4,1>, sta:<4,1>, prof:4
>>>ieee80211_sta_new_state,559, post disconnect evt
>>>wifi_event_post,2241
W (279830) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:253, no_wnd_count:0, timeout_count:4
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
zhangyanjiaoesp commented 17 hours ago

@KonssnoK please use this new patch to test, thanks. 0002-add-debug-log-in-event-loop.zip

KonssnoK commented 15 hours ago

i flashed the 3 devices aand started them, they are currently in a strange state where nobody can connect..

240703dev3_1.txt 240703dev1_1.txt 240703dev2_1.txt

dev2 reports


d,2638, (1, 1, 0)
I (156290) mesh: 1316[recv]invalid child 7c:df:a1:e1:6a:d8
I (156290) mesh: 1316[recv]invalid child 7c:df:a1:e1:6a:d8
I (156580) mesh: 1316[recv]invalid child 7c:df:a1:e1:6a:d8
I (156580) mesh: 1316[recv]invalid child 7c:df:a1:e1:6a:d8
I (156580) mesh: 1813[XON]async, from child 7c:df:a1:e1:6a:d8
W (157070) mesh: [mesh_schedule.c

dev3 is in an infinite loop


>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)

dev1 fails to send any data


E (192294) wifi_netif: TXW
W (192894) ping: From 8.8.8.8 icmp_seq=48 timeout
E (193894) ping_sock: send error=0

I try to recover them..

i think you added a log somewhere where not supposed to, maybe an ISR?

KonssnoK commented 14 hours ago

@zhangyanjiaoesp I'm not sure if the issue triggered with this code, i do have an infinite loop of

W (100970) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:3, no_wnd_count:43, timeout_count:0

but the counter resets, and no device is able to connect ever, limiter or no limiter

240703dev2_2.txt 240703dev3_2.txt 240703dev1_2.txt

KonssnoK commented 14 hours ago

i'm trying to remove the patches and go back tot a functional environment, something is off

KonssnoK commented 14 hours ago

ok i had to reboot my phone somehow to recover connectivity.

@zhangyanjiaoesp here are the logs of a stuck device

240703dev3_3.txt

KonssnoK commented 14 hours ago

more logs, dev1 without eventloop patch (i'm recompiling for this device)

dev2 in deadlock (you can see multiple task prints without any log between) dev3 in infinite >>>mesh_xon_process_disconnected,2638, (1, 1, 0) loop

240703dev3_4.txt 240703dev2_4.txt 240703dev1_4.txt

dev1 ok and now in same FW as the other 2 dev2 stuck again on the timoeut loop dev3 ok

240703dev2_5.txt 240703dev1_5.txt 240703dev3_5.txt

KonssnoK commented 13 hours ago

i see that the eventloop is not running anymore and therefore not handling events, interesting..

KonssnoK commented 12 hours ago

more logs

dev1 stuck on >>>mesh_xon_process_disconnected,2638, (1, 1, 0) dev2 in deadlock dev3 ok

240703dev1_6.txt 240703dev3_6.txt 240703dev2_6.txt

zhangyanjiaoesp commented 12 hours ago

ok i had to reboot my phone somehow to recover connectivity.

@zhangyanjiaoesp here are the logs of a stuck device

240703dev3_3.txt

@KonssnoK In this log, it seems that the esp_event_loop_run() didn't return when it handles the id=7 mesh event (MESH_EVENT_PARENT_CONNECTED). Are you using the mesh example or your own code? If it's your own code, please check why the mesh_event_handler() didn't return when handling the MESH_EVENT_PARENT_CONNECTED event. image

KonssnoK commented 12 hours ago

@zhangyanjiaoesp i'm using the espressif code with the patches i posted above, nothing else

KonssnoK commented 11 hours ago

ip_internal_3.patch ip_internal_2.patch ip_internal_1.patch ip_internal_4.patch

here are the changes to the original example (plus the library 0702)

KonssnoK commented 10 hours ago

i added a couple of prints following your example, the system gets block between these lines:

        ESP_LOGI(TAG, ">>>%s,%d\n",__func__,__LINE__);
        netif_sta = create_mesh_link_sta();
        // now we create a mesh driver and attach it to the existing netif
        mesh_netif_driver_t driver = mesh_create_if_driver(false, false);
        if (driver == NULL) {
            ESP_LOGE(TAG, "Failed to create wifi interface handle");
            return ESP_FAIL;
        }
        esp_netif_attach(netif_sta, driver);
        start_mesh_link_sta();
        // If we have a AP on NODE -> stop and remove it!
        if (netif_ap) {
            esp_netif_action_disconnected(netif_ap, NULL, 0, NULL);
            mesh_delete_if_driver(esp_netif_get_io_driver(netif_ap));
            esp_netif_destroy(netif_ap);
            netif_ap = NULL;
        }
        ESP_LOGI(TAG, ">>>%s,%d\n",__func__,__LINE__);

240703dev2_7.txt 240703dev1_7.txt 240703dev3_7.txt

KonssnoK commented 10 hours ago

added more logs, the block is coming from

esp_netif_action_connected(netif_sta, NULL, 0, NULL);

which i see calls esp_event_send_internal which calls esp_event_post_to with IP_EVENT if the DHCP is not started which i don-t see in the log.

We do have dhcp_discover(): dhcp state is DISCOVER tho, which means we reach esp_netif_dhcpc_start(esp_netif); which calls dhcp_start(struct netif *netif) which calls dhcp_discover

240703dev3_8.txt

KonssnoK commented 9 hours ago

more digging, adding more logs

dhcp_discover(): dhcp state is DISCOVER
dhcp_discover: making request
dhcp_discover: sendto(DISCOVER, IP_ADDR_BROADCAST, LWIP_IANA_PORT_DHCP_SERVER)
I (230951) esp_netif_lwip: SENDW mesh_link_sta  350 FFFFFFFFFFFF7CDFA1E27B84080045000150011200004011788C00000000FFFFFFFF00440043013C283A010106004F43B9EF00000000000000000000000000000000000000007CDFA1E27B840000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000063825363350101390205DC0C096573707265737369663D07017CDFA1E27B84370401031C06FF00000000000000000000000000000000000000000000000000000000000000000000

the block is actually on

esp_err_t esp_netif_transmit_wrap(esp_netif_t *esp_netif, void *data, size_t len, void *pbuf)
{
    ...
    ESP_LOGI(TAG,"SENDW %10s %4d %s", esp_netif->if_desc, len, buf);
    ...
    return (esp_netif->driver_transmit_wrap)(esp_netif->driver_handle, data, len, pbuf);
}

which calls esp_err_t err = esp_mesh_send(NULL, &data, MESH_DATA_TODS, NULL, 0);

which is never returning

240703dev2_9.txt

KonssnoK commented 9 hours ago

@zhangyanjiaoesp we are back to the Mesh library as source of blocking

zhangyanjiaoesp commented 9 hours ago

@KonssnoK Does calling the esp_mesh_send_block_time()API solve the problem?

KonssnoK commented 9 hours ago

@zhangyanjiaoesp not sure how this would help..

does esp_mesh_send_block_time have a default value when not called? why would this occur only on a corner case?

I will try to put it with 5seconds timeout

KonssnoK commented 9 hours ago

i think the issue might have happened and the timeout recovered the situation. Not sure if this is how the system is supposed to work tho....

I-ll try to reproduce the error again

240703dev2_10.txt


I (139081) mesh_mqtt: Other event id:7
I (139091) event: >>>esp_event_loop_run,630

W (139941) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:141, no_wnd_count:0, timeout_count:0
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
I (140891) mesh_main: Tried to publish layer:2 IP:10.0.0.3
W (141141) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:141, no_wnd_count:0, timeout_count:1
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (142341) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:141, no_wnd_count:0, timeout_count:2
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (143541) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:141, no_wnd_count:0, timeout_count:3
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
I (143901) event: >>>esp_event_post_to,848, evt base = WIFI_MESH_EVENT, evt id = 36

I (143911) event: >>>esp_event_post_to,898, send to q

I (143911) event: >>>esp_event_post_to,919

W (144741) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:141, no_wnd_count:0, timeout_count:4
E (144741) mesh_netif: Send with err code 16394 ESP_ERR_MESH_TIMEOUT
I (144741) esp_netif_lwip: SENDW mesh_link_sta   42 FFFFFFFFFFFF7CDFA1E27B84080600010800060400017CDFA1E27B840A0000030000000000000A000001
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
I (145571) wifi:state: run -> init (6c0)
>>>ieee80211_sta_new_state,253
>>>ieee80211_sta_new_state,371
>>>cnx_sta_leave,2006
I (145571) wifi:pm stop, total sleep time: 0 us / 9075657 us

I (145571) wifi:<ba-del>idx
I (145571) wifi:new:<11,0>, old:<11,2>, ap:<11,2>, sta:<11,2>,prof:11
>>>ieee80211_sta_new_state,559, post disconnect evt
>>>wifi_event_post,2241
I (145591) event: >>>esp_event_post_to,848, evt base = WIFI_EVENT, evt id = 5

I (145591) event: >>>esp_event_post_to,898, send to q

I (145601) event: >>>esp_event_post_to,919

W (145961) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:142, no_wnd_count:0, timeout_count:5
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (147161) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:142, no_wnd_count:0, timeout_count:6
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
I (147971) mesh: 5162<active>parent layer:1(node), channel:11,rssi:-8, assoc:0(cnx rssi threshold:-120)my_assoc:0
W (148361) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:142, no_wnd_count:0, timeout_count:7
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
>>>wifi_event_post,2241
I (149481) event: >>>esp_event_post_to,848, evt base = WIFI_EVENT, evt id = 1

I (149481) event: >>>esp_event_post_to,898, send to q

I (149481) event: >>>esp_event_post_to,919

W (149561) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:142, no_wnd_count:0, timeout_count:8
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (150761) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:142, no_wnd_count:0, timeout_count:9
E (150761) mesh_netif: Send with err code 16394 ESP_ERR_MESH_TIMEOUT
I (150761) esp_netif_lwip: SENDW mesh_link_sta   42 FFFFFFFFFFFF7CDFA1E27B84080600010800060400017CDFA1E27B840A0000030000000000000A000003
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (151981) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:143, no_wnd_count:0, timeout_count:10
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (153181) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:143, no_wnd_count:0, timeout_count:11
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (153761) ping: From 8.8.8.8 icmp_seq=55 timeout
W (154381) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:143, no_wnd_count:0, timeout_count:12
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (155581) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:143, no_wnd_count:0, timeout_count:13
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (156781) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:143, no_wnd_count:0, timeout_count:14
E (156781) mesh_netif: Send with err code 16394 ESP_ERR_MESH_TIMEOUT
I (156781) esp_netif_lwip: SENDW mesh_link_sta   42 FFFFFFFFFFFF7CDFA1E27B84080600010800060400017CDFA1E27B840A0000030000000000000A000001
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (158001) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:144, no_wnd_count:0, timeout_count:15
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (159201) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:144, no_wnd_count:0, timeout_count:16
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (160401) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:144, no_wnd_count:0, timeout_count:17
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (161601) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:144, no_wnd_count:0, timeout_count:18
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (162801) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:144, no_wnd_count:0, timeout_count:19
E (162801) mesh_netif: Send with err code 16394 ESP_ERR_MESH_TIMEOUT
I (162801) event: >>>esp_event_loop_run,640

I (162811) event: >>>esp_event_loop_run,575,base = WIFI_MESH_EVENT, id = 36

I (162821) event: >>>esp_event_loop_run,599

>>>mesh_wifi_event_cb,8045
I (162821) mesh: WIFI_EVENT_MESH_ROUTER, non-root
I (162831) mesh: WIFI_EVENT_MESH_ROUTER, last:WIFI_EVENT_STA_CONNECTED[STOP], reason:2
>>>mesh_wifi_event_cb,8063
>>>mesh_wifi_event_cb,8387, new_state = 44
I (162841) event: >>>esp_event_loop_run,640
KonssnoK commented 8 hours ago

i increased the timeout to 30sec

dev2 stuck in deadlock 240703dev2_11.txt

KonssnoK commented 8 hours ago

more logs, with 30seconds it-s more clear what is happening:

I (1096820) mesh_netif: >>>start_mesh_link_sta,328

dhcp_discover(): dhcp state is DISCOVER
I (1096840) esp_netif_lwip: SENDW mesh_link_sta  350 FFFFFFFFFFFF7CDFA1E16AD808004500015003A70000401175F700000000FFFFFFFF00440043013C725701010600550027D300000000000000000000000000000000000000007CDFA1E16AD80000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000063825363350101390205DC0C096573707265737369663D07017CDFA1E16AD8370401031C06FF00000000000000000000000000000000000000000000000000000000000000000000
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1098100) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:6
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
I (1098110) mesh_main: Tried to publish layer:2 IP:10.0.0.2
I (1098110) mesh_mqtt: sent publish returned msg_id=38621
I (1098210) event: >>>esp_event_post_to,848, evt base = MQTT_EVENTS, evt id = 7

I (1098210) event: >>>esp_event_post_to,881

I (1098210) event: >>>esp_event_post_to,891, send to q

I (1098210) event: >>>esp_event_post_to,919

I (1098220) event: >>>esp_event_loop_run,575,base = MQTT_EVENTS, id = 7

I (1098230) event: >>>esp_event_loop_run,599

I (1098230) mesh_mqtt: Other event id:7
I (1098240) event: >>>esp_event_loop_run,630

W (1098990) ping: From 8.8.8.8 icmp_seq=382 timeout
W (1099300) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:7
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
I (1100110) mesh_main: Tried to publish layer:2 IP:10.0.0.2
W (1100500) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:8
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1101700) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:9
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
I (1102570) wifi:bcn_timeout,ap_probe_send_start
>>>wifi_event_post,2241
I (1102570) event: >>>esp_event_post_to,848, evt base = WIFI_EVENT, evt id = 21

I (1102580) event: >>>esp_event_post_to,898, send to q

I (1102580) event: >>>esp_event_post_to,919

I (1102860) wifi:AP's beacon interval = 102400 us, DTIM period= 1
I (1102860) event: >>>esp_event_post_to,848, evt base = WIFI_MESH_EVENT, evt id = 36

I (1102860) event: >>>esp_event_post_to,898, send to q

I (1102870) event: >>>esp_event_post_to,919

W (1102900) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:10
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1104100) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:11
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1105300) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:12
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
I (1106490) mesh: 5162<active>parent layer:1(node), channel:11, rssi:1, assoc:0(cnx rssi threshold:-120)my_assoc:0
W (1106500) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:13
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1107710) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:14
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
>>>wifi_event_post,2241
I (1108000) event: >>>esp_event_post_to,848, evt base = WIFI_EVENT, evt id = 1

I (1108000) event: >>>esp_event_post_to,898, send to q

I (1108000) event: >>>esp_event_post_to,919

W (1108910) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:15
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1110110) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:16
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
I (1111120) wifi:state: run -> init (6c0)
>>>ieee80211_sta_new_state,253
>>>ieee80211_sta_new_state,371
>>>cnx_sta_leave,2006
I (1111130) wifi:pm stop, total sleep time: 0 us / 14581097 us

I (1111130) wifi:new:<11,0>, old:<11,2>, ap:<11,2>, sta:<11,2>, prof:11
>>>ieee80211_sta_new_state,559, post disconnect evt
>>>wifi_event_post,2241
I (1111140) event: >>>esp_event_post_to,848, evt base = WIFI_EVENT, evt id = 5

I (1111150) event: >>>esp_event_post_to,898, send to q

I (1111160) event: >>>esp_event_post_to,919

W (1111310) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:17
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1112510) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:18
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1113710) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:19
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1114910) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:20
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1116110) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:21
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1117310) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:22
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1118510) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:23
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1119710) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:24
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1120910) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:25
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1122110) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:26
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1123310) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:27
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1124510) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:28
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1125710) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:29
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1126910) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:30
>>>mesh_xon_process_disconnected,2638, (1, 1, 0)
W (1128110) mesh: [mesh_schedule.c,3132] [WND-RX]max_wnd:2, 1200 ms timeout, seqno:0, xseqno:1046, no_wnd_count:0, timeout_count:31
E (1128110) mesh_netif: Send with err code 16394 ESP_ERR_MESH_TIMEOUT
I (1128110) mesh_netif: >>>start_mesh_link_sta,330

I (1128120) mesh_netif: >>>mesh_netifs_start,460

I (1128120) event: >>>esp_event_post_to,848, evt base = IP_EVENT, evt id = 1

I (1128120) event: >>>esp_event_loop_run,640

I (1128130) event: >>>esp_event_post_to,898, send to q

240703dev3_12.txt

KonssnoK commented 8 hours ago

@zhangyanjiaoesp how should we tackle this ?

KonssnoK commented 7 hours ago

for reference, i tried the same patches but with our own dynamic/fixed root code and at times it's impossible to form a dynamic mesh after building a fixed one with latest library and code. This was working before, we are missing something else.