espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

[mesh] ESP32S3 v4.4 Fail to connect to WIFI when moving to dynamic root (IDFGH-12011) #13076

Open KonssnoK opened 7 months ago

KonssnoK commented 7 months ago

Hello @zhangyanjiaoesp ,

As you might remember, our system moves between fixed root and dynamic root depending on WIFI connectivity.

One thing we noticed is the following:

I (10647) mesh: [SCAN][ch:1]AP:3, other(ID:0, RD:0), MAP:0, idle:0, candidate:1, root:0, topMAP:0[c:0,i:1][d6:92:7d:c3:02:1e]router found<>
I (10649) mesh: 1330[SCAN]init rc[7c:df:a1:ff:a5:15,-38], mine:0, voter:0
I (10656) mesh: [SCAN:10/10]rc[128][7c:df:a1:ff:a5:15,-38], self[7c:df:a1:ff:a5:14,-38,reason:0,votes:1,idle][mine:1,voter:1(1.00)percent:1.00][128,1,7c:df:a1:ff:a5:15]

I (10671) mesh: [DONE]connect to router:KI, channel:1, rssi:-38, d6:92:7d:c3:02:1e[layer:0, assoc:0], my_vote_num:1/voter_num:1, rc[7c:df:a1:ff:a5:15/-38/1]

Instead, if I move to LTE (fixed root) and then switch the WIFI back on after a while, I see the following:

I (545720) mesh: [SCAN][ch:0]AP:11, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][00:00:00:00:00:00]<>
I (545721) mesh: [FAIL][31]root:0, fail:31, normal:0, <pre>backoff:0

It takes minutes for the device to reconnect to the WIFI once I have switched it back on. It usually needs to reach fail:60 before it becomes rootless and finally reconnects.
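
To put a number on "minutes", the [FAIL] lines in a captured log can be parsed for their timestamp and fail counter. The following is a minimal sketch, not part of ESP-IDF: `fail_entry_t`, `parse_fail_line` and `ms_per_fail` are hypothetical helper names, and it assumes the default ESP-IDF log format in which the number in parentheses is milliseconds since boot.

```c
#include <stdio.h>
#include <string.h>

/* One parsed "[FAIL]" line from the mesh log (hypothetical helper type). */
typedef struct {
    unsigned long ms; /* log timestamp, assumed to be milliseconds since boot */
    int fail;         /* value of the "fail:" counter */
} fail_entry_t;

/* Parse a line such as
 *   I (545721) mesh: [FAIL][31]root:0, fail:31, normal:0, <pre>backoff:0
 * Returns 1 and fills *out on success, 0 if the line is not a [FAIL] line. */
static int parse_fail_line(const char *line, fail_entry_t *out)
{
    unsigned long ms;
    int fail;
    const char *counter = strstr(line, "fail:");

    if (strstr(line, "[FAIL]") == NULL || counter == NULL) {
        return 0;
    }
    if (sscanf(line, "I (%lu)", &ms) != 1) {
        return 0;
    }
    if (sscanf(counter, "fail:%d", &fail) != 1) {
        return 0;
    }
    out->ms = ms;
    out->fail = fail;
    return 1;
}

/* Average time in milliseconds per failed scan between two [FAIL] entries.
 * Returns 0.0 if the counter did not increase between them. */
static double ms_per_fail(fail_entry_t first, fail_entry_t last)
{
    int scans = last.fail - first.fail;

    if (scans <= 0) {
        return 0.0;
    }
    return (double)(last.ms - first.ms) / scans;
}
```

Feeding the first and last [FAIL] lines of a capture to `parse_fail_line` and passing the results to `ms_per_fail` gives the average interval per failed scan, which can then be compared between the boot case and the reconnection case.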

We trigger the connection in the second scenario by simply doing:

        ESP_ERROR_CHECK_WITHOUT_ABORT(esp_mesh_set_type(MESH_IDLE));
        esp_mesh_fix_root(false);
        esp_mesh_set_router(&self->mesh_router);
        esp_mesh_set_self_organized(true, true);

I don't understand why this should trigger such a long search process compared with the search process of a device that has just woken up. Are we missing some command?

Thanks

zhangyanjiaoesp commented 6 months ago

@KonssnoK When the device has just woken up, there is no mesh network yet, so it connects to the mesh network as soon as it finds a suitable parent. Once the device has joined the mesh network, however, the parent node will sometimes disappear and recover within a short time. To preserve the original network structure as much as possible, the device generally keeps scanning for the parent node on the original channel for a period of time. Only when it has finally confirmed that the parent node can no longer be reached does it look for other suitable parents on all channels.

KonssnoK commented 6 months ago

Hmm @zhangyanjiaoesp, I think this is not what we are referring to.

We have:

  • device boots alone (no other devices to do the mesh)
  • device connects to WIFI immediately
  • device is of course ROOT
  • device loses WIFI
  • device connects to LTE -> Our netif interface is a PPP one. The STA is periodically scanning for networks!
  • 5 minutes pass
  • device (ROOT) sees again the WIFI router to which it was connected (not a mesh station, a router)
  • device (ROOT) takes minutes to reconnect to the router

KonssnoK commented 5 months ago

@zhangyanjiaoesp any thoughts?

zhangyanjiaoesp commented 5 months ago

> Hmm @zhangyanjiaoesp, I think this is not what we are referring to.
>
> We have:
>
>   • device boots alone (no other devices to do the mesh)
>   • device connects to WIFI immediately
>   • device is of course ROOT
>   • device loses WIFI
>   • device connects to LTE -> Our netif interface is a PPP one. The STA is periodically scanning for networks!
>   • 5 minutes pass
>   • device (ROOT) sees again the WIFI router to which it was connected (not a mesh station, a router)
>   • device (ROOT) takes minutes to reconnect to the router

Now I understand what you're talking about. Can you provide the logs covering the period from when the root sees the router again until the root reconnects to the router? They may help to analyze the cause.

KonssnoK commented 5 months ago

@zhangyanjiaoesp I attached a first log (esp_wifi_reconnection.txt); I will generate another with more delay.

zhangyanjiaoesp commented 5 months ago

@KonssnoK Is this phenomenon (device (ROOT) takes minutes to reconnect to the router) always present? Can you provide me with a demo for testing?

KonssnoK commented 5 months ago

> @KonssnoK Is this phenomenon (device (ROOT) takes minutes to reconnect to the router) always present? Can you provide me with a demo for testing?

Not always. What I see is that the device (which was ROOT) normally performs a full scan up to 60 times. Once it reaches 60 retries, it starts scanning WIFI normally. The problem is that these 60 scans sometimes take a very long time.

I will try to reproduce it in a small project in the next few days.

Sherry616 commented 1 month ago

Hi @KonssnoK, could you please share your updates on this issue? Should we keep this issue open? Thanks.

KonssnoK commented 1 month ago

Yes, the issue should stay open: scanning takes much longer after a disconnection than after a boot. Sorry, there are not many other updates since we are quite busy with other issues :(