Open TD-er opened 9 months ago
I'm looking into this code a bit more and there are lots of really strange things here.
For example:
void * WiFiScanClass::_getScanInfoByIndex(int i)
{
if(!WiFiScanClass::_scanResult || (size_t) i >= WiFiScanClass::_scanCount) {
return 0;
}
return reinterpret_cast<wifi_ap_record_t*>(WiFiScanClass::_scanResult) + i;
}
A very tricky way to not check for i < 0
.
One of the return statements in WiFiScanClass::scanNetworks(...)
:
if(WiFiGenericClass::waitStatusBits(WIFI_SCAN_DONE_BIT, 10000)){
return (int16_t) WiFiScanClass::_scanCount;
}
So this clearly shows it should be considered a signed int, as does int16_t WiFiScanClass::scanComplete()
I can't find what has been changed recently, so no idea why it is now completely unusable where it was running fine on previous ESP-IDF5.1 builds I made. The code I mentioned here doesn't seem to have changed in years.
Edit:
Forgot to mention, but it might also be a memory leak, as there is no check to see if the array _scanResult
wasn't already allocated.
Not sure if it can be used on array allocations, but isn't the simplest fix to just do new (std::nothrow)
here?
@SuGlider Can you please take a look as you are now investigation WiFi changes? Thanks
@TD-er - Please provide a sketch that I can use to reproduce this issue.
It is not clear to me how and why you need to use _scanDone()
in your sketch.
This fuction will be called from the WiFi Event Callback only when it gets ARDUINO_EVENT_WIFI_SCAN_DONE
.
Therefore, the number _scanResult
is positive or zero. Never negative.
It is being called when the scan is done. I can't not call it.
It's called from:
esp_err_t WiFiGenericClass::_eventCallback(arduino_event_t *event)
with event ID ARDUINO_EVENT_WIFI_SCAN_DONE
Sorry, you replied fast... :-) -- I was editing my reply - including exactly what you wrote. Anyway I can't reproduce the issue.
I need some sketch that I can use to reproduce this issue and find out what could be its cause.
This fuction will be called from the WiFi Event Callback only when it gets
ARDUINO_EVENT_WIFI_SCAN_DONE
. Therefore, the number_scanResult
is positive or zero. Never negative.
But what if the scan is called again while busy or failed? Then it is assigned a negative value, stored in an unsigned int, thus the argument for the new call can request a really large array. On the ESP32-C2 this is very likely to fail due to its more limited amount of memory.
I have to think about some short sketch to reproduce, as you probably don't think "the entire ESPEasy project" is "a short sketch" ;)
In my opinion, _scanDone()
should never be called by the user side (sketch). It is meant to be called only when the WiFi Event Handler needs to fill up the list of scanned SSIDs.
IF the sketch calls it, the code must make sure that the previous array was deleted, freeing the memory, and that a previous scan process has succeeded.
I don't call it, it is called from the Arduino callback function.
Only thing I did was register events so I can act on those events too.
But _scanDone()
is not something I call.
Can you point me out the code in the ESPEasy project where WiFi Scan takes place?
Reading the ESPEasy Project code, I see that the issue seems to be related to Async WiFi Scan...
Can you point me out the code in the ESPEasy project where WiFi Scan takes place?
It happened while working on the PR for IDF5.1 So this is on this branch: https://github.com/TD-er/ESPEasy/blob/2dbb2f770b208ee623a04ced081f8d366e91f2ba/src/src/ESPEasyCore/ESPEasyWifi.cpp#L982
Reading the ESPEasy Project code, I see that the issue seems to be related to Async WiFi Scan...
Ehh wow, you read through all that WiFi code in under 6 minutes? Impressive!
Edit: Now you'll understand why I am so keen on having a major rewrite/restructure on the Arduino WiFi/Eth/Network code base. So I can get rid of this extremely complex legacy of 5+ years of working around very odd WiFi/Network issues and start over making a lean and clean network interfacing class for ESPEasy.
I just thought of this recent change I made after reporting this issue:
This disconnect code is now wrapped in #if ESP_IDF_VERSION_MAJOR<5
... #endif
but was present when I ran into these issues.
Perhaps this also caused a new call to scan before the previous scan was done?
Or maybe the disconnect also changed something causing the scan to fail -> scancount being -1 or -2 => malloc fail.
Ok, I've read some other pieces of code here and there, but it is not clear why the issue happens. It seems to depend on the sequence of how the WiFi Scan functions are called. All I can say is that I see no issue in the Arduino Core WiFi Scan code. It follows the IDF model.
I may try to help it by describing how WiFi Async Scan should work:
1- scanNetworks(false, ....)
will return immediatly with code -1 (WIFI_SCAN_RUNNING).
2- Whenever the IDF WiFi scan is done, the event loop will fill up the array with the list of SSIDs using _scanDone()
3- The code can check if the Scaning is done by getting the status from scanComplete()
4- The possible results from scanNetworks(false, ....)
or scanComplete()
are:
5- From one WiFi Scan to another it is necessary to release the array memory by calling WiFi.scanDelete();
, otherwise the sketch wil leak memory and fail after a while.
But as I mentioned before, the _scanCount
is unsigned, so if you consider negative values, the new wifi_ap_record_t[WiFiScanClass::_scanCount];
call attempts to allocate something very large.
See:
https://github.com/espressif/arduino-esp32/blob/b811ea40875bbfbf8bd62b4038879ac7be2c2e2b/libraries/WiFi/src/WiFiScan.cpp#L109-L121
N.B. This also doesn't check for the current value of the pointer, so it might be possible this also introduces a memory leak.
I can't see a reason for making _scanCount
a negative value (or a very high unsigned value).
There are only two places when scanCount
is changed to a value different than zero, both in _scanDone()
in the line:
esp_wifi_scan_get_ap_num(&(WiFiScanClass::_scanCount));
which will change scanCount
to the number of APs found in last scan, since it is called only after completing a WiFi Scan.
Also in _scanDone()
in the line:
if(!WiFiScanClass::_scanResult || esp_wifi_scan_get_ap_records(&(WiFiScanClass::_scanCount), (wifi_ap_record_t*)_scanResult) != ESP_OK) {
Here is it used as Input and Output.
As Inpout, it defines the Maximum number of entries in the array.
As Output, it will return the actual number of Networks scanned by IDF.
In other words, void WiFiScanClass::_scanDone()
shall only be called after the WiFi Event ARDUINO_EVENT_WIFI_SCAN_DONE
Otherwise the result is unpredictable.
Any scketch that wants to scan WiFi shall folow the right sequence as in the Arduino WiFi example (although it doesn't use the Asynch mode).
A valid code should be like this:
if (WiFi.scanComplete() > 0) _scanDone();
Yep, but like I said, I don't call this _scanDone()
function myself.
But having this right before the actual call to this function is of course already an extra guard.
I need a sketch that can reproduce the issue. At this point I can't dig in. Something I noticed by analyzing the logs is that from => 0x420073ee: WiFiScanClass::_scanDone() at C:/Users/gijsn/.platformio/packages/framework-arduinoespressif32/libraries/WiFi/src/WiFiScan.cpp:113
The Line 113 is not in the _scanDone() code. Maybe PlatformIO has some different core code for it? Maybe if you try to build it and test it using Arduino IDE using Core 3.0.0-alpha3 (latest with IDF 5.1), the error will be gone.
I will try to make a small sketch for it.
The Line 113 is not in the _scanDone() code. Maybe PlatformIO has some different core code for it?
Ehhm are we looking at the same code? This is the master branch of arduino-esp32:
I see... My bad. I don't remember where I looked this line 113.
I'll wait for the sketch that demonstrates the issue. Thanks!
I have created an example that uses Async WiFi Scanning.
I've done a change to the scanComplete()
function.
Do you want to try this change to check if it may fix ESPEasy Project?
https://github.com/espressif/arduino-esp32/pull/8981/files#diff-bace60178f098d80f1b271ad8ffff33f935644dfad66894478bf3a5969dc7b57
in WiFiScan.cpp
file
The issue fixed in scanComplete()
is related to timeout in Async Mode.
It returned an error code, but it should wait a bit more instead.
I think that this change may solve some issue related to ESPEasyCore/ESPEasyWiFi_ProcessEvent.cpp::processScanDone()
Had a few personal issues the past few days, but will now try to see if I can still reproduce and hopefully no longer reproduce when applying your suggestions.
I have created an example that uses Async WiFi Scanning. I've done a change to the
scanComplete()
function.Do you want to try this change to check if it may fix ESPEasy Project? https://github.com/espressif/arduino-esp32/pull/8981/files#diff-bace60178f098d80f1b271ad8ffff33f935644dfad66894478bf3a5969dc7b57 in
WiFiScan.cpp
file
Tried it and it still crashing.
09.060 : (96084) Info : firstLoopConnectionsEstablished
36.764 : (92720) Info : WiFi : Start network scan all channels
43.511 : (84460) Info : WiFi : Scan finished, found: 15
43.520 : (83596) Info : WiFi : Best AP candidate: Lurch4 3C:37:12:AB:0B:F8 Ch:1 (-31dBm)WPA2/PSK (bgn)
43.525 : (83780) Info : WiFi : Disconnect after scan
43.527 : (83780) Info : WiFi : WifiDisconnect()
Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
Core 0 register dump:
MEPC : 0x4004b292 RA : 0x420f5338 SP : 0x3fcb6950 GP : 0x3fca8ed0
=> 0x4004b292: ?? ??:0
=> 0x420f5338: ieee80211_output_do at ??:?
TP : 0x3fc8d838 T0 : 0x403855ac T1 : 0x4038585a T2 : 0x144f7821
=> 0x403855ac: vPortEnterCritical at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:323
=> 0x4038585a: xQueueGiveFromISR at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/FreeRTOS-Kernel/queue.c:1370
S0/FP : 0x3fce0000 S1 : 0x3fce0000 A0 : 0x00000000 A1 : 0x3fcc3b4a
A2 : 0x3fcb697c A3 : 0x3fcb6c04 A4 : 0x3fcb3000 A5 : 0x00000000
A6 : 0x00000004 A7 : 0x00000001 S2 : 0x3fce0000 S3 : 0x3fcb0de4
S4 : 0x0000002a S5 : 0x3fcc3b4a S6 : 0x00000000 S7 : 0x00000000
S8 : 0x3fcbbe74 S9 : 0x00000001 S10 : 0x00000000 S11 : 0x00000000
T3 : 0x0000000a T4 : 0x00000000 T5 : 0x00000000 T6 : 0x00000006
MSTATUS : 0x00001881 MTVEC : 0x40380001 MCAUSE : 0x00000005 MTVAL : 0x00000018
=> 0x40380001: _vector_table at ??:?
MHARTID : 0x00000000
Stack memory:
3fcb6950: 0x3fcb3130 0x0000003c 0x3fcc3b38 0xffffffff 0x00000000 0x3fce0000 0x3fce0000 0x420f531c
=> 0x420f531c: ieee80211_output_do at ??:?
3fcb6970: 0x00000000 0x00000001 0x3fcbbe74 0x00000000 0x3fcbbda0 0x3fcb6b38 0x3c1259bc 0x3c1259b4
3fcb6990: 0x3fcbbda0 0x3fcb6b38 0x3c1259bc 0x3fcbbe74 0x3fcbbd9c 0x3fcc3b38 0x3fcbbcf4 0x420ade78
=> 0x420ade78: low_level_output at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_netif/lwip/netif/wlanif.c:90
3fcb69b0: 0x0000001c 0x3fcbbd9c 0x0000000e 0x4209b78c 0x3fcbbd9c 0x3fcc3b38 0x00000006 0x420a9e9e
=> 0x4209b78c: mem_malloc at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/mem.c:209
=> 0x420a9e9e: ethernet_output at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/netif/ethernet.c:319
3fcb69d0: 0x3fcbbda0 0x3fcb6b38 0x3c1259bc 0x3fcbbe74 0x3fcbbd9c 0x3fcc3b38 0x3fcc3b58 0x420a3a98
=> 0x420a3a98: etharp_raw at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/ipv4/etharp.c:1177
3fcb69f0: 0x00000000 0x00000000 0x00000011 0x00000000 0x3fcb3000 0x00000002 0x3fcaf8fc 0x3fcb6b38
3fcb6a10: 0x3fcaf92c 0x3fcbbd9c 0x3fcc6100 0x420a3df8 0x420ac57a 0x420ac5f6 0x3fcb6aa0 0x3fca8ed0
=> 0x420a3df8: etharp_query at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/ipv4/etharp.c:995
=> 0x420ac57a: lwip_standard_chksum at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/inet_chksum.c:150
=> 0x420ac5f6: inet_cksum_pseudo_base at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/inet_chksum.c:269
3fcb6a30: 0x3fcb3000 0x3fcbbda0 0x3fcbbd9c 0x3fcb6b38 0x3fcc6100 0x3fcb6b38 0x3fcbbd9c 0x420a3fe2
=> 0x420a3fe2: etharp_output at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/ipv4/etharp.c:890
3fcb6a50: 0x00000000 0x00000001 0x0001cd58 0x00000000 0x3fcc6100 0x00010c0f 0x3fcc6140 0x420a4f8e
=> 0x420a4f8e: ip4_output_if_opt_src at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/ipv4/ip4.c:1057
3fcb6a70: 0x0000ffff 0x0000013c 0x00000000 0x00000043 0x00000000 0x00000000 0x00000000 0x00000043
3fcb6a90: 0x3fcbbd9c 0x3fcbbda0 0x3fcc6100 0x3fcb6b38 0x3fcc6100 0x3fcc3bdc 0x3fcc6154 0x420a4fec
=> 0x420a4fec: ip4_output_if_src at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/ipv4/ip4.c:876
3fcb6ab0: 0x00000000 0x3fcc3bdc 0x3fcc6154 0x420ac6de 0x3fcc6100 0x3fcc3bdc 0x3fcc6154 0x420a1d24
=> 0x420ac6de: ip_chksum_pseudo at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/inet_chksum.c:392
=> 0x420a1d24: udp_sendto_if_src at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/udp.c:904 (discriminator 4)
3fcb6ad0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000004 0x3fcc615c 0x00000043 0x3fcc6100
3fcb6af0: 0x3fcb6b38 0x3fcbbd9c 0x3fcc3bdc 0x420a1d9a 0x3fcc1fd4 0x3fcbbd9c 0x3fcb3000 0x420a22ce
=> 0x420a1d9a: udp_sendto_if at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/udp.c:703
=> 0x420a22ce: dhcp_create_msg at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/ipv4/dhcp.c:2013
3fcb6b10: 0x00000004 0x3fcc615c 0x00000005 0x3fcc624c 0x3fcc6100 0x3fcbbd9c 0x3fcc1fd4 0x420a3486
=> 0x420a3486: dhcp_release_and_stop at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/ipv4/dhcp.c:1458
3fcb6b30: 0x00000004 0x00120003 0x010aa8c0 0x00000000 0x00000000 0x00000000 0x00000000 0x0000a900
3fcb6b50: 0x00000004 0x00000003 0x3fcc4eb4 0x00000002 0x3fcb3000 0x3fcbbd9c 0x3fcbbcf4 0x420ad0aa
=> 0x420ad0aa: esp_netif_down_api at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_netif/lwip/esp_netif_lwip.c:1655
3fcb6b70: 0x00000004 0x00000003 0x3fcc4eb4 0x00000002 0x3fcb3000 0xffffffff 0x3fcba55c 0x420ac96e
=> 0x420ac96e: esp_netif_api_cb at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_netif/lwip/esp_netif_lwip.c:229
3fcb6b90: 0x3fcb3000 0xffffffff 0x3fcba520 0x4209a55c 0x00000000 0x00000000 0x00000000 0x3fcba520
=> 0x4209a55c: tcpip_thread_handle_msg at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/api/tcpip.c:203
(inlined by) tcpip_thread at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/api/tcpip.c:148
3fcb6bb0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcb6bd0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
3fcb6bf0: 0xa5a5a5a5 0xbaad5678 0x00000160 0xabba1234 0x00000154 0x3fcb6a20 0x0000aa5e 0x3fcae4f8
3fcb6c10: 0x3fcae4f8 0x3fcb6c04 0x3fcae4f0 0x00000007 0x3fcb47f8 0x3fcb47f8 0x3fcb6c04 0x00000000
3fcb6c30: 0x00000012 0x3fcb61f4 0x00546974 0x00000000 0x00000000 0x00000000 0x00000000 0x3fcb6bf0
3fcb6c50: 0x00000012 0x00000001 0x00000000 0x00000000 0x00000000 0x3fcb399c 0x3fcb3a04 0x3fcb3a6c
3fcb6c70: 0x00000000 0x00000000 0x00000001 0x00000000 0x00000000 0x00000000 0x420f14c6 0x00000000
=> 0x420f14c6: _cleanup_r at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/riscv32-esp-elf/src/newlib/newlib/libc/stdio/findfp.c:229
3fcb6c90: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcb6cb0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcb6cd0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcb6cf0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcb6d10: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcb6d30: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
@TD-er - This seems to be a new issue. The messages may not come from WiFi layer anymore (as in the original post). It could be something related to an empty IP address or something else.
I'd say that it would need a new investigation and debugging. But it's hard to tell that it is related to a bug in the Arduino Core layer.
It would be better if you could isolate the issue and create a small sketch that can reproduce it.
Board
Any
Device Description
Not related to device, but crash happens on ESP32-C2 and -C3
Hardware Configuration
Not HW related
Version
latest master (checkout manually)
IDE Name
PlatformIO
Operating System
Windows 11
Flash frequency
40MHz
PSRAM enabled
no
Upload speed
115200
Description
I get crashes related to the new-operator when processing results from a WiFi scan It seems to be caused by the code in this function: https://github.com/espressif/arduino-esp32/blob/b811ea40875bbfbf8bd62b4038879ac7be2c2e2b/libraries/WiFi/src/WiFiScan.cpp#L109-L121
As can be seen, there is no check for negative scan results as the type of
_scanCount
is anuint16_t
. However the result of a scan can be negative, so maybe there is some conversion somewhere to this unsigned value and thus resulting in an attempt to allocate 65k elements ofwifi_ap_record_t
, which does fail at least on a C2.Sketch
Debug Message
Other Steps to Reproduce
No response
I have checked existing issues, online documentation and the Troubleshooting Guide