NordicSemiconductor / IOS-DFU-Library

OTA DFU Library for Mac and iOS, compatible with nRF5x SoCs
http://www.nordicsemi.com
BSD 3-Clause "New" or "Revised" License

DFU frequently initiated on wrong/multiple device(s) #532

Open jmaha opened 8 months ago

jmaha commented 8 months ago

DFU Bootloader version (please complete the following information):

Device information (please complete the following information):

Describe the bug

For several years, across multiple iOS versions and versions of this library, we've had an issue where the IOS-DFU-Library initiates DFU on the wrong device, and sometimes on multiple devices. Note that we see the same behavior with the nRF Connect app on iOS/iPadOS devices. When we initiate DFU on one device, other nearby devices sometimes also reboot into their bootloaders in DFU mode, rather than just the target device.

This is causing problems with our products in production, as our customers use multiple products based on the nRF52833 simultaneously. We recently had a report where a customer tried to initiate a firmware update on one product, and another nearby product that was in use (a different product, but running the same bootloader and SoftDevice version) rebooted into its bootloader, interrupting its operation.

Our code starting DFU is as follows, being called on only a single CBPeripheral.

    private func runDfu(peripheral: CBPeripheral,
                        device: Device,
                        advertisingName: String,
                        firmware: DFUFirmware) -> DFUServiceController?  {

        let initiator = DFUServiceInitiator().with(firmware: firmware)

        initiator.delegate = device
        initiator.progressDelegate = device

        // Forward log messages from the Nordic DFU library to the device
        initiator.logger = device

        // Set the DFU bootloader advertising name
        initiator.alternativeAdvertisingName = advertisingName

        return initiator.start(target: peripheral)
    }

Note that this problem is described in another ticket here, but the offered resolution doesn't apply -- the DFUSelectorDelegate is not involved in the initial selection of the device to update, but rather is used to attempt to identify the device after it's rebooted into its bootloader.

The initial request to start the DFU process appears to be getting sent to the wrong device/devices intermittently (e.g. sometimes the intended target device will correctly reboot and its DFU starts, but we'll also observe other nearby devices reboot simultaneously, though the DFU process is not performed on those devices and they time out back to their application after a while).

We have tested this extensively to understand the issue, and have narrowed it down to a problem with the Nordic DFU library. We've been able to consistently observe this happening, but not reproduce it in a controlled manner.

Logs

We don't have any relevant logs for this issue.

philips77 commented 8 months ago

Hi, it's hard for me to believe that this could happen. You'd be the first person reporting it. If it really is an issue, it looks like an iOS bug/implementation detail.

As you know, iOS does not expose MAC addresses to the app. Instead, they are mapped to a UUID. I always assumed that inside iOS there's a table where UUIDs are translated to MAC addresses and the other way around.

In order not to touch CBPeripheral's or CBCentralManager's delegates, this library creates its own CBCentralManager and retrieves a peripheral by UUID. This should return the same device reference which was passed to initiator.start(target:), and usually does.

If the issue you describe is real, that means that this mapping is based on some other mechanism and it's possible to get a different device. Perhaps the hash of both MAC addresses is the same, or something like that. But that could fail in so many other cases as well, e.g. state restoration, etc.

I would suggest 2 things:

  1. Make sure that the device you're passing to the DFU service is in fact 100% correct. You don't have to disconnect if you're already connected. This is the first thing I would look into. In some apps it's easy to select the wrong device due to a flickering list, etc. Also, this point is easy to check.
  2. You may try using version 4.5.0 of this library with the deprecated methods. This is the last version that actually used the CBPeripheral instance given to the initiator: https://github.com/NordicSemiconductor/IOS-DFU-Library/blob/394c41f146edc726a8ffccacbab33ddda06cf17b/iOSDFULibrary/Classes/Implementation/DFUServiceInitiator.swift#L243-L247 Remember to start DFU using .start(), not .start(targetWithIdentifier:). Note that the old version does not allow setting delegate queues and takes control of the manager and peripheral delegates, so you need to restore them on success or error. You may also try modifying the latest version to restore the old functionality.

jmaha commented 8 months ago

Hi, the ticket I linked from someone else does describe this same situation (DFU starting on the wrong device). The offered answer jumps to the MAC-vs-UUID issue, but glosses over the fact that the wrong device never should have rebooted in the first place.

I've seen this happen personally dozens of times over the last couple years. As I mentioned, it happens with our app, and it also happens with your nRF Connect app (multiple nearby devices reboot into their bootloaders when starting DFU on a device). We've dug into it extensively and are 100% sure the device we're passing to the DFU service is correct.

Our analysis indicated it might be related to the internal retry mechanism inside Nordic's DFU library. Our engineer spent several days testing, and he speculated that potentially the retry was somehow triggering on the wrong device.

philips77 commented 8 months ago

OK, this is interesting. I'll check the retry mechanism.

philips77 commented 8 months ago

I have a few questions:

  1. Are all your devices based on nRF SDK 17.1 or at least use Secure DFU?
  2. As I understand, you are saying that the library connects to a wrong device even for the initial connection (before sending the "jump to bootloader" command), right? Does that mean that your app isn't connected to the DFU target before starting DFU? It just scans and the user selects the device, right? Or despite being connected, when you initiate DFU it still selects a different device?
  3. What do they advertise when not connected in normal mode? Are the MAC addresses random?
  4. Are you using a custom DfuPeripheralSelector?

My way of thinking is the following.

  1. User selects the correct device in your app,
  2. DFU process is started,
  3. Connection fails for some reason,
  4. Library tries to reconnect and connects to a different device.

The problem is that I can't see any retry mechanism that could work like this. It either reconnects to the same CBPeripheral instance, or scans using DfuPeripheralSelector.
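
To make the second branch (scanning with a selector) concrete, here is a minimal sketch of a strict name-matching rule that a custom selector could apply. The function `shouldSelect` and its parameters are hypothetical and use plain Swift types; this is not the library's selector API or its default behavior, only an illustration of rejecting any scanned device whose advertised name differs from the expected bootloader name.

```swift
import Foundation

/// Hypothetical selection rule (not part of the DFU library):
/// accept a scanned device only when its advertised local name
/// exactly matches the bootloader name we expect.
func shouldSelect(advertisedName: String?, expectedName: String?) -> Bool {
    // Without both names there is nothing to compare, so reject.
    guard let advertised = advertisedName, let expected = expectedName else {
        return false
    }
    return advertised == expected
}
```

A rule like this trades convenience for safety: a device that omits its name from the advertising packet is never selected, so the update fails rather than landing on a neighbor.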

jmaha commented 7 months ago

  1. Are all your devices based on nRF SDK 17.1 or at least use Secure DFU?

All three of our products are running nRF SDK 17.1.0 and using Secure DFU. Each of the three products has its own firmware key to ensure it accepts the correct firmware.

  2. As I understand, you are saying that the library connects to a wrong device even for the initial connection (before sending the "jump to bootloader" command), right? Does that mean that your app isn't connected to the DFU target before starting DFU? It just scans and the user selects the device, right? Or despite being connected, when you initiate DFU it still selects a different device?

Our app actually connects to all our nearby devices prior to starting DFU--the user cannot initiate DFU for a device until our app has established a connection to it. However, when I’ve seen this happen with the nRF Connect app, I am explicitly connected only to the intended device, and yet I’ll still see nearby devices reboot into their bootloaders when starting DFU on the target device.
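
Because the app already holds a connection to every nearby device, one way to collect the logs this issue currently lacks would be to record which peripherals drop their connection right after DFU is started. The sketch below is only an illustration: `DfuAudit` and `recordDisconnect(of:)` are hypothetical names, and it assumes the app feeds in the `identifier` of each peripheral from its own disconnect callback.

```swift
import Foundation

/// Hypothetical helper: tracks disconnects seen while DFU is starting
/// and keeps the identifiers that do not belong to the intended target.
struct DfuAudit {
    let target: UUID
    private(set) var unexpected: [UUID] = []

    init(target: UUID) {
        self.target = target
    }

    /// Call with the identifier of every peripheral that disconnects
    /// shortly after DFU was initiated on `target`.
    mutating func recordDisconnect(of identifier: UUID) {
        if identifier != target {
            unexpected.append(identifier)
        }
    }
}
```

Any entry in `unexpected` would point at a device that left application mode without being asked to, which could then be correlated with the library's log output.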

  3. What do they advertise when not connected in normal mode? Are the MAC addresses random?

In normal mode, we're sending out a connectable advertisement that includes the Nordic DFU service using its standard UUID, and custom manufacturing data.

All MAC addresses are unique and derived in firmware from the products' unique serial numbers (stored in the nRF's UICR). As we also support DFU from Android, the bootloader uses a MAC address that's one greater than the application's MAC address per Nordic's Android DFU library example. Based on how our MAC addresses are derived, there won't be collisions between bootloader and application MAC addresses, even if their serial numbers are sequential.
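
The "application address plus one" scheme and the collision claim can be checked with a small sketch. It assumes the address is treated as a single 48-bit integer that wraps at 2^48; the real bootloader may increment differently (for example, only the least significant byte), and the function names and sample addresses here are hypothetical.

```swift
import Foundation

/// Bootloader address under the assumed scheme: application address + 1,
/// wrapped to 48 bits.
func bootloaderAddress(for appAddress: UInt64) -> UInt64 {
    return (appAddress &+ 1) & 0xFFFF_FFFF_FFFF
}

/// Returns the application addresses whose bootloader address is equal to
/// the application address of some device in the same list.
func collisions(in appAddresses: [UInt64]) -> [UInt64] {
    let apps = Set(appAddresses)
    return appAddresses.filter { apps.contains(bootloaderAddress(for: $0)) }
}
```

Running `collisions(in:)` over the full list of addresses derived from serial numbers would confirm that no device's bootloader address coincides with another device's application address.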

  4. Are you using a custom DfuPeripheralSelector?

No, we are using the default selector.

jmaha commented 4 months ago

Just checking in -- do you need any additional details on this issue? Another data point if it's helpful - I observed this just yesterday using the latest version of the nRF Connect app. Here was my situation:

Here's what I observed:

  1. I opened nRF Connect and connected to Device 2. Because I did not want to disturb Device 1's test, I verified it was the correct device by retrieving its serial number. I then initiated DFU with the new firmware version.
  2. Based on the nRF PPK2 current plot, I realized that Device 1 had rebooted, and that nRF Connect was updating the firmware on Device 1 instead of Device 2.
  3. nRF Connect reported DFU was complete.
  4. I verified that Device 1 was operating again, now with the new firmware, and Device 2 was still in the bootloader awaiting update, advertising itself as "Dfu66933".
  5. In nRF Connect I connected to the Dfu66933 device and initiated DFU. The update completed successfully and Device 2 came back online.