adafruit / Adafruit_BluefruitLE_nRF51

Arduino library for nRF51822-based Adafruit Bluefruit LE modules

ble.Reset causes NVM corruption #35

Open microbuilder opened 7 years ago

microbuilder commented 7 years ago

Calling ble.Reset() can sometimes cause the NVM section to become corrupted, which seems to be a timing issue combined with the bootloader checking NVM at startup.

For steps to reproduce, see: https://forums.adafruit.com/viewtopic.php?f=53&t=116188&p=585162

steinerlein commented 7 years ago

Just for my understanding - when I set Services and Characteristics they are stored in NVM, right? I have observed that they get lost from time to time (if I power cycle often) but wasn't able to reproduce it reliably. Could this bug be the reason for that?

microbuilder commented 7 years ago

They are stored in NVM, yes, and I believe this issue is related. There seems to be a timing or other issue in the bootloader at startup, where we have some checks on NVM memory to see if there was a firmware update (in which case the NVM will be cleared if the data structure changed), etc. Something is apparently going into the weeds on some specific edge case, but we're still trying to track down exactly what and why.

jmcintyre commented 7 years ago

Does this issue only affect the M0, or also the 32u4 and nRF52 versions of the Feather?

microbuilder commented 7 years ago

It will affect both the M0 and 32u4 since it is inside the Bluefruit firmware. The nRF52 is a totally different project with a completely different programming model, and isn't affected by this issue.

eratical commented 7 years ago

We use the module as a HID device to send an NFC serial via Bluetooth to a connected device. It first runs as expected, but after several power cycles of the Bluefruit module, it stops sending data. The connected device still says it is connected. On Android and iOS the software keyboard normally disappears while a HID device is connected; after the error happens, the software keyboard stays on screen even while it's connected to the Bluefruit module.

When the error occurs, I can remove (forget) the Bluefruit device in the Android/iOS settings and pair it again. It will then function until the very next power cycle.

Module Info: BLEFRIEND32 nRF51822 QFACA10 B30C7F1FEEAA85D4 0.7.7 0.7.7 Dec 13 2016 S110 8.0.0, 0.2

ScottMit commented 7 years ago

I think I am seeing this error. Can someone confirm if this is still an issue? I have a classroom of Feather Bluefruit 32u4 devices that we are using for HID. Every time we upload new code to the Feather we have to forget and reconnect to the device. Doesn't matter whether this is our code or the examples from the Adafruit library. We have FACTORYRESET_ENABLE set to 0 so I don't think that is the problem.

steinerlein commented 7 years ago

Dear support, can you give us an estimate as to when you expect this issue to be addressed? This is a bug that renders the awesome hardware useless in my case.. Thanks!

microbuilder commented 7 years ago

There is no ETA we can give a solid promise of respecting, I'm afraid (as a company policy, not just in this case). We're aware of the issue, though, and it's high up on the ToDo list, and the main firmware author will be back from vacation next week to try to address this. We'll post here when there is an update.

steinerlein commented 7 years ago

Okay, thanks for the heads up!

hathach commented 7 years ago

sorry for the long wait, I just got back home. I will try to resolve it asap :)

jmcintyre commented 7 years ago

Is there any update to the status of this issue? Apologies for pushing on this, but I will have to make some decisions in the next couple days based on this issue, so if it is going to be a while until a fix is developed, I might as well make those choices now. Thanks.

hathach commented 7 years ago

Sorry, I am still working on this :( . I can hardly reproduce the issue, but I will try to push out some beta firmware for you to test with :D

jmcintyre commented 7 years ago

I'd be happy to send you hardware that does it every time. I don't think github has private messaging, but your email support team (Nick) has my emails from May under the title "Possible failing boards." Feel free to email me directly.

hathach commented 7 years ago

thanks, it is quite troublesome since I am in South East Asia :D

steinerlein commented 7 years ago

@hathach Have you been able to make any progress? I'm sure there are a few of us willing to test any beta firmware you might have available!

hathach commented 7 years ago

Hi @steinerlein

Sorry for the huge delay, I just pushed 0.8.0 as a beta release. Can you try it to see if that works for you? If possible, please post your serial output as well.

steinerlein commented 7 years ago

@hathach thank you so much, I will try it in a couple of hours. Do you propose a testing procedure?

hathach commented 7 years ago

to be honest, I couldn't reproduce it except for maybe only one occasion. I have tried with Micro, 32u4 and M0 Bluefruit boards. But we did enhance the mutex around NVM and added some safeguards, so hopefully this will work as expected.

steinerlein commented 7 years ago

@hathach While trying to reliably reproduce the NVM issue (which I wasn't able to do) I did completely brick the Bluefruit module. I had this happen before, but wasn't able to document it. On previous occasions I was able to bring it back to life by going into DFU mode and reflashing the firmware, but it won't even go into DFU mode right now. Not sure what to make of it, I will update as it evolves..

The beta did work fine while it was running. I reverted to 0.7.7 to try and provoke an NVM corruption, I guess that worked?

hathach commented 7 years ago

@steinerlein that is even scarier :( . Could you please try FRST first, then try to put it into DFU mode with the DFU pin. It is best to flash the M0 with a blinky sketch so the M0 won't try to communicate with the nRF51 chip. I hope we don't fix an issue by introducing a more serious one :D . But that shouldn't be the case :D

steinerlein commented 7 years ago

@hathach two questions: What is FRST? Do you mean blinky sketch during the DFU update?

I don't think you introduced a new issue, I bricked it with 0.7.7

hathach commented 7 years ago

ah, sorry. FRST is factory reset. It is labeled simply Reset on the bottom of the board (next to SWDIO, SWCLK), and it will reset corrupted data.

steinerlein commented 7 years ago

@hathach I got it back, and running on 0.7.7 I cannot get it to corrupt any saved characteristics. I will continue tomorrow and see if it holds everything over night and also test with 0.8

hathach commented 7 years ago

Great, it sounds promising already :)

steinerlein commented 7 years ago

I have tried provoking the NVM corruption that is the topic of this issue with both firmwares, 0.7.7 and 0.8. I was not able to reliably produce any errors, either by power cycling or by calling ATZ repeatedly. For this reason I can neither confirm nor deny that version 0.8 resolves any issues, but so far it hasn't introduced any new ones. I'd be interested to hear about testing results from anyone else?
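
Editor's sketch: for anyone repeating the power-cycle or ATZ test, the comparison can be automated on the host side. The Python below is a minimal sketch, not part of the Adafruit library. It assumes the AT+GATTLIST reply has been captured from the serial monitor as text, with comma-separated KEY=VALUE fields per line and a closing OK, and that replies may carry the "<-" prefix the example sketch prints. The function names parse_gattlist and lost_after_reset are hypothetical.

```python
from __future__ import annotations


def parse_gattlist(response: str) -> list[dict[str, str]]:
    """Turn the text printed after AT+GATTLIST into a list of {key: value} entries."""
    entries = []
    for raw in response.splitlines():
        line = raw.strip()
        if line.startswith("<-"):  # reply prefix printed by the example sketch
            line = line[2:].strip()
        if not line or line in ("OK", "ERROR"):
            continue  # skip blank lines and the status line
        fields = {}
        for pair in line.split(","):
            # Assumption: values themselves contain no commas.
            key, _, value = pair.partition("=")
            fields[key.strip()] = value.strip()
        entries.append(fields)
    return entries


def lost_after_reset(before: str, after: str) -> list[dict[str, str]]:
    """Return the entries listed before the reset that are missing afterwards."""
    survivors = parse_gattlist(after)
    return [entry for entry in parse_gattlist(before) if entry not in survivors]
```

Capture one AT+GATTLIST reply before the reset and one after it, then pass both to lost_after_reset. An empty result means every service and characteristic survived that cycle; any returned entry is one that was lost or changed.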

hathach commented 7 years ago

that is weird :( . We will keep 0.8.0 as a beta for a while. If you don't mind, please keep using 0.8.0 to see if you run into any issues with NVM. If it lasts for quite some time in real-world usage, then it would be safe to release it.

hathach commented 7 years ago

@jmcintyre could you please try 0.8.0 to see if it fixes your issue?

jmcintyre commented 7 years ago

I will try to get to it next week. We've had to switch over to the NRF52, which has taken up some time, and I'm about to head out for vacation.

hathach commented 7 years ago

Thanks for the reply, it would be great if you could confirm the fix. nRF52 is a great platform as well, since it can act in the central role. If you need any help with the transition, just post another question on our support forum.

jmcintyre commented 7 years ago

Sorry for the delay. I installed and ran through my example code from here: https://forums.adafruit.com/viewtopic.php?f=53&t=116188&p=583011#p582221

The issue still exists, but it acts differently. Instead of responding with a bunch of bad values after the ATZ call, it now returns empty:

Adafruit Bluefruit AT Command Example
-------------------------------------
Initialising the Bluefruit LE module: OK!
Performing a factory reset: 
AT+FACTORYRESET

<- OK
ATE=0

<- OK
Requesting Bluefruit info:
----------------
BLESPIFRIEND
nRF51822 QFACA10
DF06F0DA565E57B1
0.8.0
0.8.0
Jun 22 2017
S110 8.0.0, 0.2
----------------
AT > AT+GATTADDSERVICE=UUID128=24-91-4D-0B-2E-F9-4D-5B-9F-29-E9-1C-9B-7B-13-3B

<- 1
OK
AT > AT+GATTADDCHAR=UUID=0x0002, PROPERTIES=0x10, MIN_LEN=2, MAX_LEN=2, VALUE=0, DESCRIPTION=cached_tracker_x

<- 1
OK
AT > AT+GATTLIST

<- ID=01,UUID=0x4D0B,UUID128=24-91-4D-0B-2E-F9-4D-5B-9F-29-E9-1C-9B-7B-13-3B
  ID=01,UUID=0x0002,PROPERTIES=0x10,MIN_LEN=2,MAX_LEN=2,DATATYPE=0,DESCRIPTION=cached_tracker_x,VALUE=0
OK
AT > ATZ

<- OK
AT > AT+GATTLIST

<- OK

hathach commented 7 years ago

that is too bad. We now enforce the CRC checksum, so it looks like the contents are being corrupted somehow and then reset. Thank you very much for testing. We will look further into this issue.
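
Editor's sketch: the behaviour described here (a stored record that fails its checksum is discarded and replaced with an empty one) can be illustrated in a few lines. This is a generic Python sketch of a CRC-guarded record, not the Bluefruit firmware's code. The thread does not state the firmware's record layout or CRC variant, so the length-plus-CRC-32 header and the names pack_record and load_record are assumptions made for illustration.

```python
import struct
import zlib

# Hypothetical header: 16-bit payload length followed by a 32-bit CRC, little-endian.
HEADER = "<HI"
HEADER_SIZE = struct.calcsize(HEADER)


def pack_record(payload: bytes) -> bytes:
    """Prefix the payload with its length and CRC-32 so corruption is detectable."""
    return struct.pack(HEADER, len(payload), zlib.crc32(payload)) + payload


def load_record(blob: bytes, default: bytes = b"") -> bytes:
    """Return the stored payload, or `default` if the record fails validation."""
    if len(blob) < HEADER_SIZE:
        return default  # too short to even hold a header
    length, crc = struct.unpack_from(HEADER, blob)
    payload = blob[HEADER_SIZE:HEADER_SIZE + length]
    if len(payload) != length or zlib.crc32(payload) != crc:
        return default  # truncated or corrupted: fall back to the empty record
    return payload
```

With a guard like this, a damaged record reads back as empty instead of as garbage values, which would match the change from bad values to an empty AT+GATTLIST reply reported above. It detects the corruption but does not explain what is damaging the data.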

jmcintyre commented 7 years ago

I should probably note that this was on an M0. I don't have a 32u4 to test with.