espressif / esp-nimble

A fork of NimBLE stack, for use with ESP32 and ESP-IDF
Apache License 2.0
81 stars 51 forks source link

Nimble Hangs when connecting to android 5 (lollipop) BLE 4.0 #13

Open mitchjs opened 4 years ago

mitchjs commented 4 years ago

ESP-IDF - master

when i connect to an android tablet (samsumg tab 7 (2016)) which is BLE 4.0 it connects then hangs when getting LE Read Remote Used Features LOG:

ble_hs_hci_cmd_send: ogf=0x08 ocf=0x0016 len=2 0x16 0x20 0x02 0x00 0x00 Command Status: status=0 cmd_pkts=5 ocf=0x16 ogf=0x8

oddly if i have LOG LEVEL to DEBUG, it will work occasionally but at INFO, never

when it does work:

Command Status: status=0 cmd_pkts=5 ocf=0x16 ogf=0x8 LE Remote Used Features. FAIL (status=26)

prasad-alatkar commented 4 years ago

Hi @mitchjs Can you please provide complete logs from boot up for both INFO and DEBUG prints ? Is it possible for you to share Android side logs as well ?

mitchjs commented 4 years ago

i have to collect the from boot to fail.. (but keep in mind, it connects perfectly everytime to my iphone ios13) its just this older BLE 4.0 device

the log in android testing app, nRFConnect only say GATT_ERROR (0x86) 133 and this is SO common with android

its clear the android doesn respond to the BLE REMOTE FEATURES REQ (for one isnt that a BLE 4.1 thing??)

prasad-alatkar commented 4 years ago

Yes, Error status = 26 translates to BLE_ERR_UNSUPP_REM_FEATURE: Unsupported Remote Feature / Unsupported LMP Feature. If not possible to share log from boot up, please share it from connection request. It would help to understand the exact failure scenario.

mitchjs commented 4 years ago

its like the android doesnt respond to the request, and nimble doesnt time out and move on

using this demo: https://github.com/espressif/esp-idf/tree/master/examples/bluetooth/nimble/bleprph

when DEBUG LOG level is on... it works MOST of time (i did it 5 times, failed 2) stuck at

ble_hs_hci_cmd_send: ogf=0x08 ocf=0x0016 len=2
0x16 0x20 0x02 0x00 0x00
Command Status: status=0 cmd_pkts=5 ocf=0x16 ogf=0x8

attached is log of a successful connection

to me since it sometimes(mostly) works with DEBUG log level on, its a TIMING or THREAD issue

log #1 success log #2 disconnect, then tried to reconnect...it hung

ESP32-NimBLE-log #1.txt ESP32-NimBLE-log #2.txt

i tried it with LOG INFO too, i got it to connect successfully a few times... but still at times hangs up...(way too many) its prob an android 5.0 issue BLE has always been issue on android but maybe something can be done so nimble doesnt hang up

prasad-alatkar commented 4 years ago

From 'ESP32-NimBLE-log 2.text', the disconnect is happening because of TIMEOUT Disconnection Complete: status=0 handle=0 reason=19.
After successfully sending command (BLE_HCI_OCF_LE_RD_REM_FEAT), it waits for ACK from peer for BLE_HCI_CMD_TIMEOUT_MS (2 seconds). In case of working log, peer responds with feature not supported, LE Remote Used Features. FAIL (status=26). Even if we do not get response, there should not be any issue, as it may only result in host stack reset, which will trigger bleprph_on_reset callback in example. By "hang", you mean the NimBLE stack is stuck at somewhere for long duration, right ?

I will try to reproduce the above mentioned issue at my end first and will update you accordingly.

mitchjs commented 4 years ago

Yes hang is it sits there long time... then it does the disconnect

h2zero commented 4 years ago

I don't have an android device to reproduce this, but I have had a look at the related code and stumbled upon an upstream PR #738 that may have an effect.

mitchjs commented 4 years ago

no go... i hacked that change in... in nimble 1.3 the did a lot of nice refactoring... hopefully ESP-IDF version gets updated

prasad-alatkar commented 4 years ago

Hi @h2zero @mitchjs the corresponding fix is already present in esp-nimble commit: c6fa32fc

Yes hang is it sits there long time... then it does the disconnect

If it disconnects with reason code = 531, it translates to remote/peer user terminated connection (BLE_ERR_REM_USER_CONN_TERM). I believe if it does disconnect, it may not be appropriate to call it as 'hung'. I was under impression that it gets hung to not come out of that state, like 'Denial of Service'. However I may need to reproduce the issue first, the only problem is I don't have Android device with above mentioned version with me as of now, will update you as soon as I have some observations to share.

mitchjs commented 4 years ago

well keep in mind it works sometimes... its for sure an issue with android 5 i just purchased an samsung tab A 8 (2019 model) it runs android 9 BLE5.0

and it works perfectly EVERYTIME

where the older samsung tab a 7 (2016 model) android 5 BLE4.0 fails 90% of the time

h2zero commented 4 years ago

the corresponding fix is already present in esp-nimble commit: c6fa32f

@prasad-alatkar thanks for pointing that out, seems you did the same thing only differently.

Anecdotal note about this issue; using an old iphone 5, which I believe is also BLE 4.0 I have not reproduced the problem. Seems to be an android issue.

mitchjs commented 4 years ago

here is with log level at DEBUG (cause it works sometimes) (if at INFO, never works) (i added some extra log stuff)

FAIL:

D NimBLEServerCallbacks: "onConnect(): Default" ble_hs_hci_cmd_send: ogf=0x08 ocf=0x0016 len=2 0x16 0x20 0x02 0x00 0x00 ble_hs_hci_rx_evt()):hci_ev[0]=0x0f - hci_ev[2]=0x00 ble_hs_hci_cmd_tx()): opcode=2016 Command Status: status=0 cmd_pkts=5 ocf=0x16 ogf=0x8

SUCCESS:

D NimBLEServerCallbacks: "onConnect(): Default" ble_hs_hci_cmd_send: ogf=0x08 ocf=0x0016 len=2 0x16 0x20 0x02 0x00 0x00 ble_hs_hci_rx_evt()):hci_ev[0]=0x0f - hci_ev[2]=0x00 ble_hs_hci_cmd_tx()): opcode=2016 Command Status: status=0 cmd_pkts=5 ocf=0x16 ogf=0x8 ble_hs_hci_rx_evt()):hci_ev[0]=0x3e - hci_ev[2]=0x04 LE Remote Used Features. FAIL (status=26) ble_hs_hci_evt_acl_process(): conn_handle=0 pb=2 len=11 data=0x07 0x00 0x04 0x00 0x10 0x01 0x00 0xff 0xff 0x00 0x28 ...

h2zero commented 4 years ago

@prasad-alatkar @mitchjs,

I was able to reproduce this today.

The cause seems to be the controller commit: a45ff23f. I reverted to the last commit before it: 9ee3b17 and the issue was resolved.

This is obviously something that would need to be brought up to the controller devs.

In the mean time, we can look for a workaround. I know already that commenting out this line eliminates the problem, but that's not a great solution.

We also know that turning on debugging reduces the error frequency, so it seems the controller is overloaded at this point in the code. Maybe a delay or some other measure that will allow the controller time to finish it's tasks before calling the line above?

mitchjs commented 4 years ago

nice, can i ask how you got it to reproduce?

h2zero commented 4 years ago

Funny enough it was Arduino that tipped me off. I was testing my library against the latest arduino-esp32 master with my old iPhone 5 and saw it hanging while connecting. When I reverted it back to the latest release (v1.0.4) the problem was resolved. So using the commit ID's that arduino-esp32 is on I started going through the controller revisions and found the offender. Turns out it's not just an android problem after all.

mitchjs commented 4 years ago

wow... we really have no idea what is in any of those lib commits other than the limited comment... how can i test that lib, i cant drop it in... its so old and much smaller it will be missing code... older idf? 3.3?

h2zero commented 4 years ago

Yep, the big black box lol, sadly not much that can be done with that other than break out HxD or Ghidra and have some fun.

IDF 3.3 is somewhat of an option, if you use the correct commit, but it will not have NimBLE so you'll have to use my backport.

The best option is to workaround the issue, the question is how? The other best option is to fix the controller, but that may or may not get backported, so I have to assume not.

h2zero commented 4 years ago

It seems, for me at least (with iOS), the problem is resolved in commit 5a2416f, however @mitchjs is still seeing this issue on android.

prasad-alatkar commented 4 years ago

@mitchjs Do you see this issue with Bluedroid as well ? Or is it something very specific to NimBLE ? Can you also try with updated esp-idf, because recently I could see few fixes in controller lib.

mitchjs commented 4 years ago

no bluedroid didnt have this issue

i havent tryied it in a while, i got a newer tablet (android 10 i think) and i got no issues i guess i can try and let you know with lastest master idf

mitchjs commented 4 years ago

just did a fresh pull of ESP-IDF master, and built my quick test of nimble

FAILS (hangs) on Android 5

WORKS on Android 10

so no change

mitch