espressif / esp-nimble

A fork of NimBLE stack, for use with ESP32 and ESP-IDF
Apache License 2.0
76 stars 50 forks source link

ESP32/nimble forgets BLE pairing connection to Android client (nasty 'fix' attached) #16

Open geeksville opened 3 years ago

geeksville commented 3 years ago

Hi,

I recently noticed one of my nodes was discarding formerly working ble bonded connections to my Android phone. This failure occurred after the following error message/clue was emitted in the logs "failed to configure restored IRK". Searching for that string and then adding some debugging led me to the following lines in ble_hs_resolv_list_add() as the culprit.

    if (ble_hs_is_on_resolv_list(ident_addr, addr_type)) {
        return BLE_HS_EINVAL;
    }

Note: for my application I teardown and then restart the BLE stack fairly often, because my device enters light-sleep and I want to turn off bluetooth during that period. This code and the nimble fixes in https://github.com/espressif/esp-idf/issues/5530 had been working quite nicely for over a month with a couple thousand shipped devices on my open-source project (www.meshtastic.org). This failure started to occur for only one of my personal test devices.

Not knowing much at all about the innards of this stack my nasty hack in this case is to replace "if there is already an entry found for addr X, fail" with "if there is already an entry found for addr X, replace it"

I have no idea if this hack is a bad idea, but I thought I'd include it here for comment.

(cc @bfriedkin because it sounds like you are doing similar things to me, so if you encounter this bug it might be relevant for you also)

My private debug output listed below:
    BLE task running
    registered service 0x1800 with handle=1
    registered service 0x1801 with handle=6
    registered service 6ba1b218-15a8-461f-9fa8-5dcae273eafd with handle=10
    registered service cb0b9a0b-a84c-4c0d-bdbb-442e3144ee30 with handle=18
    failed to configure restored IRK
    BLE advertisting type=0, Private=0, Device Address: 39:24:62:ab:dd:df
geeksville commented 3 years ago

cc @bfriedkin - just in case you also encounter this.

bfriedkin commented 3 years ago

@geeksville -

Thank you for keeping me in the loop. I haven't seen what you describe (most of my testing has been on iPhone), but it is good to know about this nonetheless.

Regards, Brian

prasad-alatkar commented 3 years ago

@geeksville Can you please share for which Android device did you test ? If possible can you please share any simple steps to follow which can help me reproduce the issue ? From your description, it looks to be kind of very interesting scenario. The reason ble_hs_is_on_resolv_list returns BLE_HS_EINVAL because ble_hs_resolv_list_add is called for peer already existing in Resolving list i.e. IRK already present with our device. However if I am right with my first cut analysis regarding the issue then we may have to look for better fix than commenting out that section of code :) Can you please provide me debug logs for the issue with attached NimBLE submodule patch : NimBLE_forgets_bonded_peers_check_IRK_changed.txt ? You may need to apply the patch at $IDF_PATH/components/bt/host/nimble/nimble. You may need to enable debug logs from menuconfig i.e. make menuconfig --> Component Config --> Log output --> Default log verbosity to Debug.

geeksville commented 3 years ago

Hi Prasad,

I suspect it might be related to my particular use case. For our device we frequently have the esp32 enter light-sleep (after a few minutes with no comms needed to phone). So to prepare for that we fully (?) teardown the Nimble BLE stack - so that BLE is off. And then when the device comes back we start nimble back up. On some devices (the oldest ones?) this problem started to happen. Alas, I'm moving to Taiwan soon and my computers are now mostly packed away. Once my mini lab is going again (Approx Oct 5th) I'll happily try to collect the information you requested (and try to isolate the problem a bit more).

On Tue, Sep 22, 2020 at 9:21 AM Prasad Alatkar notifications@github.com wrote:

@geeksville https://github.com/geeksville Can you please share for which Android device did you test ? If possible can you please share any simple steps to follow which can help me reproduce the issue ? From your description, it looks to be kind of very interesting scenario. The reason ble_hs_is_on_resolv_list returns BLE_HS_EINVAL because ble_hs_resolv_list_add is called for peer already existing in Resolving list i.e. IRK already present with our device. However if I am right with my first cut analysis regarding the issue then we may have to look for better fix than commenting out that section of code :) Can you please provide me debug logs for the issue with attached NimBLE submodule patch : NimBLE_forgets_bonded_peers_check_IRK_changed.txt https://github.com/espressif/esp-nimble/files/5262732/NimBLE_forgets_bonded_peers_check_IRK_changed.txt ? You may need to apply the patch at $IDF_PATH/components/bt/host/nimble/nimble. You may need to enable debug logs from menuconfig i.e. make menuconfig --> Component Config --> Log output --> Default log verbosity to Debug.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/espressif/esp-nimble/issues/16#issuecomment-696829179, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABXB2JJAKNJZSXRAQOVJFTSHDFHNANCNFSM4RDJFSFQ .

prasad-alatkar commented 3 years ago

Thanks @geeksville , meanwhile I will try to catch the issue myself.

prasad-alatkar commented 3 years ago

Hi @geeksville I tried to reproduce the issue with some forced setup. Though I was able to get the error prints (failed to configure restored IRK), I did not see any issue with re-bonding with earlier bonded peers. Just to give a little background, we use Host based privacy and not Controller based privacy. Now for controller based privacy, when host stack syncs with controller (init-deinit of BLE stack results in multiple syncs ), the IRKs are supposed to be restored in Controller. Now as we use Host based privacy, the above debug/error prints just indicate that the IRKs for the peer are already present in Resolving list so ble_hs_resolv_list_add returns failure. IMO, this should not have any material impact on re-bonding with peers. If possible, kindly share debug logs for the issue, this will help me root-cause the issue if any.

sicanins commented 3 years ago

Hi, not sure if this helps. I was also trying to stop and restart the stack and came across these error messages. They dont do any harm as expected. But as I did not know this, i tried to get rid of them, adding a ble_hs_resolv_list_clear_all(); after shutting it down. This cleared the error message. But I had to find out, that it left the persistent key storage in some invalid state that blocked any further connections. Didnt dig deep, just deleted the store completely and it worked again. But I guess that there might be some check needed to detect a corrupted key storage.

Just deleting all peers with ble_store_util_delete_peer() did not resolve the problem!

Hope this helps, Simon

prasad-alatkar commented 3 years ago

Hi @sicanins apologies for very delayed response, as I understand from your reply keeping these error messages as it is (i.e. ignoring them) do no harm, that is what I had mentioned in above reply too.

@geeksville , Can you please take a look at my above reply ? If you still face the issue, please try to provide some logs as mentioned earlier. Thanks.

h2zero commented 3 years ago

@prasad-alatkar I have found this issue as well.

Steps to reproduce:

@geeksville's patch does indeed correct this issue, which seems to be caused by a mismatch of addresses after the above procedure.

ChrisEAlfred commented 2 years ago

It has been several months. Has this been fixed yet? Can we have an updated bleprph example which handles RPA? RPA seems to be mandatory on iOS (I think).

VNovytskyi commented 1 year ago

Hello! I also have the same error. I added the following code to the NimBLEServer.cpp: image It fixes the problem, but you must enter the PIN code twice. If someone knows how to fix it, please tell me.