espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.89k stars 7.33k forks source link

WiFi AP scan fails & disconnects on latest 5.1 release (after recent WiFi tweaks) (IDFGH-12734) #13717

Open JonathanDotCel opened 7 months ago

JonathanDotCel commented 7 months ago

Answers checklist.

IDF version.

v5.1.3-416-gd23b7a0361

Espressif SoC revision.

ESP32 S3

Operating System used.

Windows

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

PowerShell

Development Kit.

ESP32S3

Power Supply used.

External 3.3V

What is the expected behavior?

Around March 11 2024 a user could connect the ESP with their phone in AP mode and retrieve a list of APs to choose and connect to:

IDF Commit: d22c95a - Merge branch 'refactor/protocomm_public_hdr_v5.1' into 'release/v5.1' LIB commit: 0a25c10 - fix(wifi): fix pmk invalid lead same ssid wpa and wpa2 connect fail issue(0016c4d3)

What is the actual behavior?

Using the following code: esp_wifi_scan_start(NULL, true); esp_wifi_scan_get_ap_records(&number, apRecords); esp_wifi_scan_get_ap_num(apCount);

IDF Commit: 2541101 - fix(wifi): fix some esp32c6 wifi bugs (65149d2e) ~ LIB Commit: c50813b - feat(esp_wifi): Refactor and improve FTM code (c5dfcde7)

Steps to reproduce.

Debug Logs.

Silent fail.
- no logs
- no stack trace
- no reboot
- no memory issues

More Information.

Due to the the wifi using a precompiled library in the LIB submodule, I can't do a git bisect to find the exact code change. Also, due to API changes, I can't revert the LIB alone to test. Setting the home channel dwell value makes no difference. Changing the min/max channel scan time values makes no difference. Rolling back to the mentioned commit + submodule commits fixes the issue immediately.

Xiehanxin commented 7 months ago

hi @JonathanDotCel , it might because that when you use _esp_wifi_scan_get_aprecords, it will clean the storage after getting record, then _esp_wifi_scan_get_apnum returns no result. You can try use _esp_wifi_scan_get_apnum firstly and then use _esp_wifi_scan_get_aprecords

JonathanDotCel commented 7 months ago

Thanks, that fixed it.

Sidenote:

examples/wifi/scan/main/scan.c will need to be updated

The other examples seem unaffected.

J

chipweinberger commented 7 months ago

is there a reason you clear it?

this breaks my code too.

if there no good reason, the old behavior is better

JonathanDotCel commented 7 months ago

is there a reason you clear it?

this breaks my code too.

if there no good reason, the old behavior is better

I see where they're coming from, it kinda makes sense

if ( count > 0 ){ GetTheListAndClearIt(); }

It's quite clean, and less error prone -- but only so long as the documentation and examples support the change.

filzek commented 7 months ago

so, from

ESP_ERROR_CHECK(esp_wifi_scan_start(NULL, true));
ESP_ERROR_CHECK(esp_wifi_scan_get_ap_records(&ap_info_cmd_number, ap_info_cmd));
ESP_ERROR_CHECK(esp_wifi_scan_get_ap_num(&ap_info_count));

to

ESP_ERROR_CHECK(esp_wifi_scan_start(NULL, true));
ESP_ERROR_CHECK(esp_wifi_scan_get_ap_num(&ap_info_count));
ESP_ERROR_CHECK(esp_wifi_scan_get_ap_records(&ap_info_cmd_number, ap_info_cmd));

????
AxelLin commented 7 months ago

hi @JonathanDotCel , it might because that when you use _esp_wifi_scan_get_aprecords, it will clean the storage after getting record, then _esp_wifi_scan_get_apnum returns no result. You can try use _esp_wifi_scan_get_apnum firstly and then use _esp_wifi_scan_get_aprecords

@Xiehanxin

This is a breaking change and it causes regressions as you can see several reports here. So would you fix it or would you document it?

chipweinberger commented 7 months ago

yes, this is the kind of change that makes people scared to upgrade, especially for a point release

should be in a v5.1.3 migration guide at the least

AxelLin commented 7 months ago

See 26a2f687e7b57fed00b562f899aed8ff0919eca3, this also causes regression in the example code. My question is why the fix is in the example code rather than prevent this regression by revert the breaking change. (Fix the example code means everyone that copy the code from original example code needs update application code) And even the commit log of the fix in the example code does not explain why it stop working and need code change.

chipweinberger commented 6 months ago

maybe we should rename it to esp_wifi_scan_get_and_clear_ap_records().

This way users will have a compiler error and know to fix their code

@igrr I know you like to weigh in on public facing API decisions

Thanks for all your great work on this library

pyjamasam commented 1 month ago

This change exists in IDF version 4.4.8 as well. The fix to ensure you get the count before getting the records fixes things.