InfiniTimeOrg / InfiniTime

Firmware for Pinetime smartwatch written in C++ and based on FreeRTOS
GNU General Public License v3.0
2.71k stars 926 forks source link

BLE connectivity reliability analysis #207

Closed vbelloir closed 3 years ago

vbelloir commented 3 years ago

I create this issue to sumarize my work concerning the analysis of PineTime BLE connectivity.

DFU over BLE

Nordic Semiconductor provides a desktop tool, called nrfutil, that allows to generate dfu package and to perform DFU over several links (BLE, UART, USB...). That's pretty cool, but it only works with modem DFU protocol, based on Google Protobuf. It doesn't work with Legacy DFU, which is used by PineTime. Old nrfutil version ( under 0.5.2) can generate Legacy DFU package, but cannot perform DFU over BLE. Adafruit nrfutil version used in PineTime mcuboot bootloader repo is based on that version. So, I used dfu.py python script.

I create a small bash script that 10 DFU OTA, and create a log file with the following format: try 8 result 100.0% in 6 minutes and 36 seconds

File should rename in dfu-test.sh, but github doesnt allow this extension. dfu-test.txt

I used this script several times on my PimeTime sealed watch, to update it from 0.12.1 to 0.12.1.

Here is the lastest output file :

try 1 result 55.7% in 0
try 2 result 100.0% in 6 minutes and 38 seconds
try 3 result 100.0% in 6 minutes and 34 seconds
try 4 result 100.0% in 8 minutes and 34 seconds
try 5 result  in 0
try 6 result 100.0% in 7 minutes and 4 seconds
try 7 result 31.2% in 0
try 8 result 100.0% in 6 minutes and 36 seconds
try 9 result  in 0
try 10 result 19.8% in 0

As we can see:

When OTA succeed, I started another one pressing a key. When it failed, I was forced to reset the watch before pressing key.

Script should propably be improved!

Crash with connexion/disconnexion

Purpose of this test is to evaluate the ability of Pinetime (on Infinitime) to perform BLE connexion/deconnexion.

This test is realize with the following bash script (should be rename gatt-test.sh) gatt-test.txt

It is based on gattool, provided by bluez, and allow to:

It can be called like this: ./gatt-test.sh -m D9:DD:1F:7E:64:12 -t 50 -i 120 -m: define device mac address
-t: define number of loop iteration
-i: define interval in second between iteration

It generates a report file called gatt_YYYMMDD_HHMM.log, containing:

Infinitime fw verson:
0.12.42
mac address: FF:AE:EC:DE:38:D3
tries: 50
interval: 120 s
attempt 1 batterry level 46
attempt 2 batterry level 46
attempt 3 batterry level 46
....

if an error occurs, it appears in log file:

attempt 3 batterry level 0 error 16
attempt 4 batterry level 0 error 111

where 16 and 111 are error codes. During my tests, I saw those errosrs:

I used the test on a PR release 0.12.42 and get this log result: gatt_20210213_1840.log

No error in 50 connexions/deconnexions.

PetersOtto commented 3 years ago

As I got my sealed PineTime, I updated it from 0.7 to 0.9 OTA with gadgetbridge. I got the an error at updating. At the next try I touched / scroll on screen every 15% update progress. Since I do so, I never had an error at updating.

Perhaps just luck :)

vbelloir commented 3 years ago

Tonight, I updated first post with a procedure to test connexion/disconnexion.

@JF002 I also performed a test on DFU on your release 0.12.1-testble on my devkit. To be sure that my test is representative, i performed 10 DFU update from 0.12.1 to 0.12.1, and then, made de same with 0.12.1-testble to 0.12.1-testble.

Result for 0.12.1 is there test_20210213_2102.log 6/10

Result for 0.12.1-testble is there test_20210213_2219.log 9/10
The failed attempt may be an error from myself, I'm not sure.

JF002 commented 3 years ago

Thanks @vbelloir for opening this issue, we'll try to gather all the info about these ble issues here :+1: And also thanks for this test script! It's probably faster than manually connecting and disconnecting from the watch using a smartphone!

These last few days, I did many tests :

Aaaand... nothing. I couldn't reproduce the issue, saw nothing wrong in the logs,...

Then, the other day, I tried something else : connect the phone to the watch, disconnect the phone from wifi/data so that it does not receive any notification and go to sleep. In the morning : the watch was disconnected and couldn't reconnect (the ble logo was still displayed even if it was not connected).

I could reproduce that on my devboard connected to a logic analyzer and noticed that the low-level ble task (ble_ll) just stopped running. With further analysis, I discovered that the RADIO IRQ was not triggered at all! That reminded me of an discussion on the developer mailing list of apache mynewt/nimble I had a few month ago (http://mail-archives.apache.org/mod_mbox/mynewt-dev/202006.mbox/%3cCADAkoF5Os4x9A=q7UevuZ-vXJty365wa=oezRqPGvwGfD=t67g@mail.gmail.com%3e) where Andrzej told me that the way the interrupts were handled in InfiniTime might not be good (PRIMASK vs BASEPRI). At that point, I changed 2 macros to use PRIMASK instead of BASEPRI and it improved the ble connection.

I had a look at the code and found out that 2 other macro were still using primask. I also changed them and re-did my the test : no connection drop over more than 10h! I also run ~10 OTA without any issue.

I'll continue the testing today and will probably release this fix!

JF002 commented 3 years ago

InfiniTime 0.13.0 released :)

Avamander commented 3 years ago

Related issue: https://github.com/JF002/Pinetime/issues/70

ashkitten commented 3 years ago

infinitime 0.13.0 helps a lot with this issue for me. the pinetime now never shows itself as connected when it's not. however, after a while (a day or so) gadgetbridge disconnects and will not reconnect unless i reboot the watch, which requires me to remove the device from gadgetbridce and re-add it if i chose to pair it.

i'm on android 10 (omnirom, moto one hyper) with gadgetbridge 0.53.0 from f-droid.

petterhs commented 3 years ago

I'm having same issues as ashkitten

JF002 commented 3 years ago

I did the same observations as @ashkitten and @petterhs : BLE connectivity has improved in 0.13, and never shows connected when it's not. But I also have to reset the watch ~ once a day. Most of the time, when I wake up, my pinetime is disconnected from the phone and cannot reconnect. I noticed that it doesn't advertise anymore.

Once it's rebooted, Gadgetbridge reconnects automatically, it doesn't need to remove/pair with the watch again, though.

stephanlachnit commented 3 years ago

Just for reference, I have the same problem. For me resetting the watch is mot enough, I even need to reboot my phone to connect it again. Also, on the first connect to Gadgetbridge, the time isn't set, I need to reconnect it for that.

davorvr commented 3 years ago

I have a similar problem with disconnects on 0.15, but reconnecting in GadgetBridge works fine. As people have previously mentioned, the disconnects don't start to happen until after a day or so has passed with me too. Connection is very reliable within the first 12-24 hours, and then starts to break.

pfeerick commented 3 years ago

For completeness, I experience this also, with no advertising or bluetooth connection at around 20h09m or a bit longer with 0.16 dev build. Upon reset (>8 second button long press) it immediately starts advertising, and Gadgetbridge will reconnect automatically if set to do so. As there was some mention on discord it seemed more stable if the pinetime was left in the charge cradle, I've just reset one and will edit this comment in 24-48 hours with the results, but I don't have high hopes.

Edit: I forgot to update this comment when I said I would, but having the watch on charge did not effect this at all. It still dropped off after ~20 hours.

muppeth commented 3 years ago

Not going to be very original ,but perhaps adding that when using combination pinephone+phosh(danctnix and amazfish) getting ton of connection issues, where I need to re-pair the watch few times a day. Not sure who's to blame (amazfish or infinitime) yet though, but in case anyone wants me to try things let me know.

vbelloir commented 3 years ago

With XA2, SFOS, Amazfish, INfinitime, I sometimes need to reboot the wtach to get connexion back.

Man-with-Arrow commented 3 years ago

I'm also experiencing this behavior (watch eventually disconnects, requires reboot and re-pair via Gadgetbridge) on 1.0. In Gadgetbridge settings, enabling "Reconnect automatically" seems to help a bit but does not negate the issue.

JF002 commented 3 years ago

Thanks for all these feedbacks :+1: Someone will definitely have to analyze deep into the BLE integration in InfiniTime and NimBLE to find out why the connection is not as stable as it should be...

blutter commented 3 years ago

Just reading about everyone's experience with rebooting the watch to reconnect to their phones... I have found a way that occasionally works to reconnect my Pinetime/phone without rebooting either.

I have found that I can sometimes reconnect Gadgetbridge by starting the nRFConnect app, turning location off then back on, scanning in nRFConnect, then connecting in nRFConnect. Gadgetbridge will connect once the watch is connected in nRFConnect.

Wondering if this works for anyone else.

I'm using a build based in 1.2.0 (commit 883700f)

Avamander commented 3 years ago

I've mentioned this in the chat as well, but those instructions sound to me like you've just actually simply restarted your phone's Bluetooth.

Location can also keep Bluetooth silently enabled if you've enabled "Enchanced location" or whatever it was, it uses BLE beacons for positioning. There are some chipsets out there have WiFi and BT tied together, meaning that you'd have to turn both off for it to actually turn off, so when you tried a bunch of stuff that might've done it as well.

JF002 commented 3 years ago

If someone wants to help analyzing these issues, I wrote an article a while ago that describes my BLE analysis setup (BLE sniffer, HCI logs, logic analyzer,...).

It's not easy to pinpoint exactly what's going on when the connection drops as this issue occurs after 20-24h on my setup.

ajostergaard commented 3 years ago

Somebody somewhere mentioned that they use https://github.com/atc1441/ATCwatch on their PineTime and Bluetooth is rock solid.

Any value in somebody who understands these things taking a closer look at how they did it?

I'd look myself but I'm a little less hardcore than you hardware rockstars.

JF002 commented 3 years ago

I think ATCWatch uses the NRF Softdevice as BLE stack. It's the BLE stack developed by NRF (manufacturer of the NRF52) which is only available as a binary blob (closed source).

InfiniTime, on the other hand, is based on NimBLE, which is an open source BLE stack that supports the NRF52 MCU.

So, yes, having a look could have been a good idea, but without access to the source of its BLE stack, I'm not sure we'll be able to find anything relevant...


I let my devkit run for the whole day and night yesterday, with a BLE sniffer running just next to it:

Next step is trying to analyze the RADIO IRQ and see if it stops at some point...

vbelloir commented 3 years ago

Really difficult to investigate here, because there are lots of configurations and parameter available. For example, I'm using Infintime 1.3.0, with a Sony XA2 phone, running SFOS 4.1, and amazfish app as companion app. My phone is in my pocket, connected to the watch, and realy often (every half hour?), watch receives a notification that amazfish is now connected, which could mean that connection was broken?

who is responsible? Pinteime HW? Infinitime? Sony XA2 HW? SFOS BLE stack? amazfish?

Avamander commented 3 years ago

who is responsible? Pinteime HW? Infinitime? Sony XA2 HW? SFOS BLE stack? amazfish?

I'd say yes.

Next step is trying to analyze the RADIO IRQ and see if it stops at some point...

@JF002 Have you considered implementing secure BLE bonding? It's very likely that different rules apply to secure long-term connections and reconnection.

JF002 commented 3 years ago

who is responsible? Pinteime HW? Infinitime? Sony XA2 HW? SFOS BLE stack? amazfish?

I'd say yes.

:D I have no better answer for you, unfortunately... One of these, some, all? I don't know :/

@vbelloir On my Android phone (Huwaei P10, I think), connection is quite stable with Gadgetbridge, my "only" issue is that all BLE connectivity stops after ~24h. Or maybe I also have this issue but don't notice it because gadgetbridge does not send notification on connection... The issue you're describing is a bit different : it disconnects/reconnects every now and then That's probably another BLE related issue someone will have to investigate. Maybe try to build InfiniTime with the monitor mode enable (see this article and see what the HCI logs show (you need a JLink debugger for the RTT).

@JF002 Have you considered implementing secure BLE bonding? It's very likely that different rules apply to secure long-term connections and reconnection.

Probably in the future, yes, but I would have to learn how bonding and security work on BLE first ;) But I thought it would be a better idea to try to fix current issue before adding more complexity. Maybe I'm wrong and these features could improve the reliability? I don't know :/


For science, I've just integrated nimble 1.4 master branch into InfiniTime, I'll run it until tomorrow to see what happens. If anyone wants to try it, here's a DFU. NOTE Flash this file only on a devkit and if you have a SWD debugger on hands, it's absolutely not tested! NOTE The BLE address is hard-coded to 00:00:00:00:00:08 because the address detection on the RADIO failed for some reason. pinetime-mcuboot-app-dfu-1.3.99.zip

In the meantime, I've just sent a mail on the mynewt mailing list to see if anyone can help on this topic : http://mail-archives.apache.org/mod_mbox/mynewt-dev/202107.mbox/browser

Avamander commented 3 years ago

@JF002

Probably in the future, yes, but I would have to learn how bonding and security work on BLE first ;)

There's a few links here https://wiki.pine64.org/wiki/PineTime_Bluetooth but maybe you'll find some more cool documentation about it.

JF002 commented 3 years ago

Nimble 1.4 does not fix my issue : ble is not working after 19h :/

JF002 commented 3 years ago

New test last night : enable the logging feature of NimBLE : there was no error at all. It just logged that advertising was started and then stopped after 3 minutes, even if no data was transmitted in the air...

Now probing 3 radio signals as mentioned here. To be continued tomorrow :)

Crsarmv7l commented 3 years ago

Really difficult to investigate here, because there are lots of configurations and parameter available. For example, I'm using Infintime 1.3.0, with a Sony XA2 phone, running SFOS 4.1, and amazfish app as companion app. My phone is in my pocket, connected to the watch, and realy often (every half hour?), watch receives a notification that amazfish is now connected, which could mean that connection was broken?

who is responsible? Pinteime HW? Infinitime? Sony XA2 HW? SFOS BLE stack? amazfish?

Late response, but I also have the exact connect/disconnect issue you have with Amazfish, Infinitime 1.3 and Sailfish 3.4 on an fxtec pro1. I have tried using mcetool to utilize early suspend (not late which shuts off a bunch of stuff for battery saving) and while it would connect more, I still had frequent drops.

Seeing as JF002 and others have less frequent disconnects, but still have them, it appears the BLE issue is either exacerbated on Amazfish, or we also have an issue with Amazfish/SFOS that is coming into play. I would imagine Adam Piggz wouldn't be interested in investigating a potential Amazfish issue until Infinitime's BLE is fully stable.

JF002 commented 3 years ago

InfiniTime 1.5 and 1.6 brought many improvements the BLE connectivity. While I doubt those changes fix all connectivity issues, they certainly fix most of them. I'll close this issue, but feel free to open a new one if you encounter another BLE issue on InfiniTime 1.6 (and if it doesn't exist yet).

Thanks again to everyone who help debugging those issues!

tterrag1098 commented 3 years ago

That's great news! Thanks so much to all the team and especially @danielgjackson for all the long hours debugging this ridiculously tricky bug. Sometimes these obscure issues end up being obvious once you find them, but not the case here, I can see why this was missed for so long! I'll definitely be wearing my watch more regularly now that I don't have to fiddle with the bluetooth every morning :smile: