Closed jamesi8086 closed 4 years ago
On a related note, I'm noticing that this crash happens when we try to restart advertising after the client connection-failure underflow. It does not always happen, suspect there could be some invalid pointer there. I've been leaving the device running 24/7, allow the glitch to happen, and see if it could remain stable even with the underflow glitch.
Update: Ok I don't think this crash is related to the m_connectedCount underflow, seems closer to start/stop advertising repeatedly over long periods of time.
Heres the stack trace:
assertion "head != NULL" failed: file "/home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c", line 214, function: multi_heap_free
0x40092678: invoke_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c line 155 0x400928a9: abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c line 170 0x400efaaf: __assert_func at ../../../.././newlib/libc/stdlib/assert.c line 63 0x40092305: multi_heap_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c line 214 0x40085862: heap_caps_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c line 268 0x40085e1d: _free_r at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/syscalls.c line 42 0x40174399: operator delete(void) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/del_op.cc line 46 0x40174721: operator delete[](void) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/del_opv.cc line 32 0x400d42b5: BLEAdvertising::start() at C:\Users\nexuslord\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.4\libraries\BLE\src\BLEAdvertising.cpp line 227 0x400d23e5: _OT_ProtocolV2::advertising_start() at C:\Users\NEXUSL~1\AppData\Local\Temp\arduino_build_532939\sketch\opentracev2.cpp line 185 0x400d1e6a: loop() at E:\Dropbox\Git Workspace\tracestick\firmware_alpha\alpha/alpha.ino line 88 0x400dfce1: loopTask(void*) at C:\Users\nexuslord\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.4\cores\esp32\main.cpp line 19 0x4008edb5: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c line 143
bump, this is an active bug
@me-no-dev
@yongkimleng can you please enable the core debug to "Debug" and do that same thing again? Unfortunately the person who wrote and maintained the BLE lib in Arduino is no longer around. I had a look at the code, but the flow of events is not very clear to me. Debug will help track what is going on. I imagine that a server event should not be received for a client, it could be an IDF issue, or it could be some miss-configuration in the code.
Unfortunately the person who wrote and maintained the BLE lib in Arduino is no longer around.
Unfortunately that part of code has been wrote by me. It was intended to support multiple connections.
@chegewara any clues on a fix?
I will look into it.
@yongkimleng can you please enable the core debug to "Debug" and do that same thing again? Unfortunately the person who wrote and maintained the BLE lib in Arduino is no longer around. I had a look at the code, but the flow of events is not very clear to me. Debug will help track what is going on. I imagine that a server event should not be received for a client, it could be an IDF issue, or it could be some miss-configuration in the code.
Thats what I did when I took the console output, but did not see any additional output even after enabling debug in the IDE. I can again later tomorrow.
@yongkimleng can I bother you to try a small change? https://github.com/espressif/arduino-esp32/blob/master/libraries/BLE/src/BLEServer.cpp#L167-L176
change to:
case ESP_GATTS_DISCONNECT_EVT: {
if (m_pServerCallbacks != nullptr) { // If we have callbacks, call now.
m_pServerCallbacks->onDisconnect(this);
}
if(m_connId == ESP_GATT_IF_NONE){
return;
}
removePeerDevice(param->disconnect.conn_id, false);
m_connectedCount--; // Decrement the number of connected devices count.
break;
} // ESP_GATTS_DISCONNECT_EVT
sorry wrong lines linked: https://github.com/espressif/arduino-esp32/blob/master/libraries/BLE/src/BLEServer.cpp#L204-L212
Applied this fix, leaving it to run overnight
Hmm nope the problem still occurs
I think that ESP_GATTS_CONNECT_EVT isn't even triggered in such a case, but ESP_GATTS_DISCONNECT_EVT is. But looking at the code, I'm not sure where I could add any connected-flag checks.
BLEClient waiting for disconnected Device disconnected from BLE Devices connected: 0 Devices connected: 0 Devices connected: 0 Devices connected: 0 Devices connected: 0 Devices connected: 0 5e:fe:cb:54:a4:ee rssi: -59 Connecting to ble device Device disconnected from BLE Client connection failed Devices connected: 65535 Devices connected: 65535 Devices connected: 65535 Devices connected: 65535 Devices connected: 65535 Devices connected: 65535 5e:fe:cb:54:a4:ee rssi: -65 Connecting to ble device Device connected to BLE Getting service
Ok I modified removePeerDevice to return true if element is found and removed, so that we can at least do a check. Testing now.
// only decrement if connection is found in map and removed
// sometimes this event triggers w/o a valid connection
if(removePeerDevice(param->disconnect.conn_id, false)) {
m_connectedCount--; // Decrement the number of connected devices count.
}
If there is no ESP_GATTS_CONNECT_EVT
then m_connId = ESP_GATT_IF_NONE;
https://github.com/espressif/arduino-esp32/blob/master/libraries/BLE/src/BLEServer.cpp#L32
In that case this should cause return before counter is decreased:
if(m_connId == ESP_GATT_IF_NONE){
return;
}
Please make sure that code is using library you are changing and testing code, maybe add additional log to print m_connId?
I'm not sure on the underlying code, why are we looking at BLEServer? Does every BLEClient create a new instance of BLEServer for every peer?
Yeah its rebuilding when I change the library code, patched a BLEClient destructor bug yesterday as commented in the other thread about it
I tried using log_e or log_v but Verbose mode locks up my M5StickC on boot, and Debug mode does not get me any output from the lower level logs.
I notice that m_connId
is set in ESP_GATTS_CONNECT_EVT but its never cleared. So your check would never return ESP_GATT_IF_NONE after the first connection is established.
For each of my devices which has both BLEServer and BLEClients,
Yes, it is possible if you are reconnecting to the same BLEClient without creating new instance. As far as i remember when BLEClient is disconnected its instance should be destroyed and free memory.
I'm not sure on the underlying code, why are we looking at BLEServer? Does every BLEClient create a new instance of BLEServer for every peer?
because you are having problem with m_connectedCount
which exists only in BLEServer to track how many clients are currently connected.
Each of my devices function both as a server and client (for p2p). I do recreate BLEClients per connection attempt.
Oddly, the BLEClient's connection/disconnection does trigger the event which occurs in the BLEServer on its own device.
To help with such situations there is implemented set of custom gap, gatts and gattc callbacks. Those callbacks are receiving raw evens straight from bluedroid task and can be used to debug or to add special functions not implemented in library: https://github.com/nkolban/esp32-snippets/issues/711#issuecomment-436278247
You can filter gatts anc gattc to see only disconnect events that should help you more.
I have had similar issues here. I think I have found the root cause - the gattc_if attribute is not stored properly upon client registration. I'm working on a fix that seems to solve it for good.
On further investigation, it appears that when a client gets a CONNECT or DISCONNECT event, then the callback BLEDevice::gattClientEventHandler gets triggered for every client even if that client was not being (dis)connected at the time.
For example, if I had two clients running, one with esp_gatt_if=4 and the other with esp_gatt_if=5, then if client 4 got disconnected, I would get two disconnect events, one for esp_gatt_if=4 and another for esp_gatt_if=5.
Shouldn't the callback only be called once, for the specific client whose connection status changed?
I have worked around it by matching on conn_id in the BLEClient callback handler, though I suspect that's not the full solution - perhaps the ESP SDK might be buggy..?
[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
I did not have time to further improve this, is anyone aware of newer fixes?
[STALE_CLR] This issue has been removed from the stale queue. Please ensure activity to keep it openin the future.
[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
[STALE_DEL] This stale issue has been automatically closed. Thank you for your contributions.
Hardware:
Board: M5Stick-C Core Installation version: 1.0.4 IDE name: Arduino IDE Flash Frequency: 40Mhz PSRAM enabled: no Upload Speed: 1500000 Computer OS: Win 10
Description:
I have an initialized BLE Server and BLE Client on the same device. BLE Client connects and disconnects repeatedly to another device e.g. my android phone.
When the Client fails to connect, a ESP_GATTS_DISCONNECT_EVT is somehow triggered, decrementing m_connectedCount. This should not happen, as m_connectedCount was not incremented.
Symptom is that m_connectedCount is now 1 less than expected, and potentially underflowing to 65535.
Reproducibility happens rarely, probably only during the boundary where the phone changes its MAC address and the ESP32 is just about to read it. We can increase the chance of this happening by turning off the radio at the right time, during the connecting phase.
Sketch: (leave the backquotes for code formatting)
Debug Messages:
Recompiled with debug level: Debug, but I don't see any debug messages.
rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT) configsip: 188777542, SPIWP:0xee clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00 mode:DIO, clock div:1 load:0x3fff0018,len:4 load:0x3fff001c,len:1044 load:0x40078000,len:8896 load:0x40080400,len:5816 entry 0x400806ac ;⸮⸮U⸮⸮⸮2-hal-cpu.c:178] setCpuFrequencyMhz(): PLL: 320 / 4 = 80 Mhz, APB: 80000000 Hz M5StickC initializing...OK Devices connected: 0 Devices connected: 0 Devices connected: 0 ... Devices connected: 0 Devices connected: 0 Devices connected: 0 5a:62:51:40:dd:99 rssi: -65 Device connected to BLE BLE central Send: {"id":"8Vej+n4NAutyZlS1ItKDL//RcfqWP/Tq/T/BBBUOsmAF0U+TGBqd2xcMhpfcSOyN1cSGN3znSGguodP+NQ==","v":2,"o":"SG_MOH","mc":"TraceStick V0.1","rs":-65} BLE central Recv: {"mp":"Pixel 3a XL","v":2,"id":"SElnQHsPCfc/TGSQ6kR0sdTRS07g1KABGxRwcBzAjB6C884CIVB89SIdiwiDMBCbL9wCEKMaDaXOQQKMvg==","o":"SG_MOH"} Device disconnected from BLE Devices connected: 0 Devices connected: 0 ... Devices connected: 0 Devices connected: 0 Devices connected: 0 47:aa:ac:6c:32:ce rssi: -55 Device disconnected from BLE Client connection failed Devices connected: 65535 Devices connected: 65535 ... Devices connected: 65535 Devices connected: 65535 Devices connected: 65535 7f:46:54:44:b6:57 rssi: -71 Device disconnected from BLE Client connection failed Devices connected: 65534 Devices connected: 65534 Devices connected: 65534