espressif / esp-nimble

A fork of NimBLE stack, for use with ESP32 and ESP-IDF
Apache License 2.0
76 stars 50 forks source link

Enabling RPA causes bonding failure #8

Closed duncairn closed 4 years ago

duncairn commented 4 years ago

I am trying to bond en ESP32-based project with an iPhone SE (but this affects any iOS device). Since there is no user input available I specify Just Works, turn on MiTM, etc. It works fine initially - the phone will pair and bond, then on next connect it won't prompt for pairing. I can restart the ESP32 and it is still all fine. However, once the phone decides to change its public address, any connection results in nothing being transferred. The phone must be told to forget about the ESP and go through the pairing/bonding process again.

I gather that Resolvable Private Addresses are the way around this, and that ESP Nimble 1.2 apparently has this functionality, but when I enable it with:

ble_hs_pvcy_rpa_config(1);

pairing/bonding doesn't take place and, typically, if a characteristic is accessed the connection drops. Since the RPA enabling function wasn't too obvious, I am wondering if this is indeed meant to work or if it's not yet ready.

I've attached two logs which show an overview of where it goes wrong. The 'NimBLE' entries I added to trace what storage calls were being made, but left them in since they provide useful anchors. Cutting to the chase, the log where it goes wrong says: handle=0 our_ota_addr_type=1 our_ota_addr=57:48:b1:82:9f:a8 our_id_addr_type=0 our_id_addr=4c:11:ae:c9:53:de peer_ota_addr_type=1 peer_ota_addr=4f:9d:c9:74:79:0b peer_id_addr_type=1 peer_id_addr=4f:9d:c9:74:79:0b conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=0 authenticated=0 bonded=0

where the good (non-RPA) log says: handle=0 our_ota_addr_type=0 our_ota_addr=4c:11:ae:c9:53:de our_id_addr_type=0 our_id_addr=4c:11:ae:c9:53:de peer_ota_addr_type=1 peer_ota_addr=4f:9d:c9:74:79:0b peer_id_addr_type=1 peer_id_addr=4f:9d:c9:74:79:0b conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=0 authenticated=0 bonded=0

The ESP Nimble version is 1.2 but I patched it with the NimBLE_CCCD_overflow_loop_fix.

RPA_disabled.txt RPA_enabled.txt

(Forgot to say this is using the example bleprph application. The only mod I made to that was to add the RPA enable line, and a command to report the store peer count.)

h2zero commented 4 years ago

I just posted a PR that directly fixes this, checkout #7.

prasad-alatkar commented 4 years ago

Since the RPA enabling function wasn't too obvious

Hi @duncairn Thank you for raising the issue, I can understand that the documentation needs to be improved for this. However we have tried to summarize the functionality in README of bleprph example. Just to be on the same page, I believe you might have already followed below steps while enabling RPA:

  1. own_addr_type needs to be set to BLE_ADDR_RANDOM.
  2. ble_hs_pvcy_rpa_config() should be called in after BLE Host stack starts and NimBLE host and ESP controller are synced. I would recommend to call this function in bleprph_on_sync i.e. sync callback.

Can you please provide sdkconfig as well ? It would be really helpful to get DEBUG prints in the log (menuconfig --> Component config --> Log output --> Default Log verbosity to Debug). Please try make erase_flash before running the app once.

duncairn commented 4 years ago

Thanks for getting back to me quickly on this, @prasad-alatkar , and I apologize for not digesting the readme properly - I had mistakenly looked at the readme from the previous version, hence why I missed the instructions.

I made the changes you suggested in bleprph_on_sync():

      own_addr_type = BLE_ADDR_RANDOM;

    ble_hs_pvcy_rpa_config(1);

    /* Figure out address to use while advertising (no privacy for now) */
//    rc = ble_hs_id_infer_auto(0, &own_addr_type);
//    if (rc != 0) {
//        MODLOG_DFLT(ERROR, "error determining address type; rc=%d\n", rc);
//        return;
//    }

Unfortunately, that was a small improvement. The phone would connect and pair OK first time now, but on disconnect and reconnect the ESP would show up as a different device, leaving the phone trying to connect to a device that was no longer there. The log for this is:

reconnect_new_address.txt

I left the previous additional log entries in place for reference, and at suitable pauses (such as when waiting for the phone to accept the user's pair command) grabbed a store peer count as a marker.

I then made the mods that @h2zero had made in checkout #7 (thanks - well timed!). Now I could disconnect and reconnect, and still access characteristics. Great! The log for this:

reconnect_h2zero_mods.txt

I then rebooted the phone to force an address change, and on trying to reconnect it failed with no access to characteristics (here is the continued log from the previous - the ESP wasn't restarted during this):

phone_reboot.txt

Summarising, the own_addr_type fixed the first connect problem, but would cause the ESP to appear as a different device on disconnect/reconnect, @h2zero 's mods fixed that and got me back to where I was originally without RPA but now with RPA. So it's an improvement but still not working.

For completeness, as requested: sdkconfig.txt

h2zero commented 4 years ago

The on_sync() function I used while testing was this:

static void
bleprph_on_sync(void)
{
    int rc;
    ble_hs_pvcy_rpa_config(true);
    rc = ble_hs_util_ensure_addr(1);
    assert(rc == 0);

    /* Figure out address to use while advertising (no privacy for now) */
    rc = ble_hs_id_infer_auto(1, &own_addr_type);
    if (rc != 0) {
        MODLOG_DFLT(ERROR, "error determining address type; rc=%d\n", rc);
        return;
    }

    /* Printing ADDR */
    uint8_t addr_val[6] = {0};
    rc = ble_hs_id_copy_addr(own_addr_type, addr_val, NULL);

    MODLOG_DFLT(INFO, "Device Address: ");
    print_addr(addr_val);
    MODLOG_DFLT(INFO, "\n");
    /* Begin advertising. */
    bleprph_advertise();
}

Comapre it to yours and give it a try, I had no issues this way.

prasad-alatkar commented 4 years ago

@duncairn After taking a look at your logs, I was also a bit confused as why it works for me and not for you. Nevertheless I think I have found the issue, and again I need to be blamed for the poor documentation for this. The issue you are facing is the phone device is not able to resolve the RPA of ESP32. Now if everything else is good, it can only happen when the phone device does not have IRK of our device. This IRK is distributed during key exchange (after pairing). To distribute IRK we have to mention in our host configuration (ble_hs_cfg) that IRK (BLE_SM_PAIR_KEY_DIST_ID) needs to be distributed. So I am attaching patch based on bleprph to try out the RPA feature, please let me know if it resolves your issue. Again, sorry for missing out on mentioning it in my earlier comment. NimBLE_RPA_Example_use_bleprph.txt

@h2zero Thank you for taking time out and contributing to esp-nimble, I had taken a very quick look at your PR. I will update there my observations, I just think the check if (ble_host_rpa_enabled()) should not be removed, as it can affect Non-RPA cases. Again I will take a closer look by today EOD.

duncairn commented 4 years ago

Thanks for trying again, @h2zero .

The on_sync() function I used while testing was this:

Unfortunately, this didn't work at all since it would fail when trying to advertise (rc = 530). I tracked it down to this line:

`rc = ble_hs_id_infer_auto(1, &own_addr_type);

The '1' was the cause. Changing back to 0 gets it advertising again, but of course then we're back to the start.

@prasad-alatkar - I applied your patch (not your problem that I failed to read the instructions at the start!) but it seemed to make no difference. That is, disconnect/reconnect work fine, but phone reboot and change of address fails. However, I am doing this using the patch that @h2zero provided in #7, so will revert to the pre-patch version and see if it works in that.

Do you need logs for this?

prasad-alatkar commented 4 years ago

Hi @duncairn by change of address, Do you mean 'The phone (scanning app) observes same address of ESP32 (paired previously) ' ? If that is the case then we are good. I mean, you can observe the new address from another phone device (scanning app). Apart from that you can configure RPA timeout (time after which new RPA address is assigned) from menuconfig, by default it is set to 900 seconds.

duncairn commented 4 years ago

Ah, I'm afraid I might be using the wrong terms and confusing things. By 'change of address' I mean the phone's change so the ESP doesn't recognize it as an already-paired central. iOS devices change their public address periodically (not sure how often but I read somewhere it could be every 15 minutes), so to force that to happen when I need it to I just reboot the phone.

I am fairly sure the phone still recognizes the ESP since it doesn't ask to pair with it, so whether or not the ESP is changing public address, that part of it isn't a problem for me. The problem is purely that the ESP doesn't recognize the phone after the change of phone public address.

Regarding my previous comment, I reverted @h2zero 's #7 patch and retained your most recent patch (of the bleprph example). This time there was no problem with the phone disconnecting and reconnecting, so the issues that enabling RPA provoked (multiple ESP devices appearing, etc) would seem to be down to my changes to main.c. However, the problem of the ESP not recognizing the phone after the phone has been rebooted (i.e. the phone has changed public address, and the reason for enabling RPA) is still there.

Hope I haven't confused things further!

h2zero commented 4 years ago

So there's a couple things going on here it seems, could you post your on_sync() and app_main() from your file, or the whole file? There are a few things that need to be set correctly for random addresses to work correctly with bonding etc.

Also, just a heads up, you can just turn bluetooth off and on on your phone to change the address, no need to reboot.

duncairn commented 4 years ago

Thanks for the tip, @h2zero - I'm an Android fella so iOS subtleties escape me :)

main.c

h2zero commented 4 years ago

For your reference when looking at your logs, this is how mine appear when correct (I do have my patch applied here though). This is with RPA enabled on the esp32. Only difference with it disabled is the our_ota_addr_type and our_ota_addr are the same as the our_id_type/addr.

Initial connection:

LE connection complete. handle=0 role=1 paddrtype=1 addr=61.1c.b8.fd.c4.b5 local_rpa=0.0.0.0.0.0 peer_rpa=0.0.0.0.0.0 itvl=24 latency=0 spvn_tmo=72 mca=1
0xb5 0xc4 0xfd 0xb8 0x1c 0x61 connection established; status=0 Revised our id addr:
0x8f 0x6e 0x93 0x0a 0xdc 0x76
handle=0 our_ota_addr_type=1 our_ota_addr=76:dc:0a:93:6e:8f our_id_addr_type=0 our_id_addr=30:ae:a4:cc:24:26 peer_ota_addr_type=1 peer_ota_addr=61:1c:b8:fd:c4:b5 peer_id_addr_type=1 peer_id_addr=61:1c:b8:fd:c4:b5 conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=0 authenticated=0 bonded=0

Encrypted/bonded:

LE LTK Req. handle=0 rand=00 encdiv=0
ble_hs_hci_cmd_send: ogf=0x08 ocf=0x001a len=18
0x1a 0x20 0x12 0x00 0x00 0xda 0xaf 0x8d 0x30 0xb9 0xf7 0x2a 0x57 0xba 0xee 0xcd 0x73 0x82 0x99 0xac 0xe4
Command complete: cmd_pkts=5 ogf=0x8 ocf=0x1a status=0 handle=0
Encrypt change: status=0 handle=0 state=1
host tx hci data; handle=0 length=21
ble_hs_hci_acl_tx(): 0x00 0x00 0x15 0x00 0x11 0x00 0x06 0x00 0x08 0xef 0x8d 0xe2 0x16 0x4f 0xec 0x43 0x0d 0xbf 0x5b 0xdd 0x34 0xc0 0x53 0x1e 0xb8
Revised our id addr:
0x8f 0x6e 0x93 0x0a 0xdc 0x76
host tx hci data; handle=0 length=12
ble_hs_hci_acl_tx(): 0x00 0x00 0x0c 0x00 0x08 0x00 0x06 0x00 0x09 0x00 0x26 0x24 0xcc 0xa4 0xae 0x30
Number of Completed Packets: num_handles=1
handle:0 pkts:1
Number of Completed Packets: num_handles=1
handle:0 pkts:1
ble_hs_hci_evt_acl_process(): conn_handle=0 pb=2 len=21 data=0x11 0x00 0x06 0x00 0x08 0x9e 0xb4 0x85 0x05 0x6a 0x41 0xd3 0x62 0xde 0x00 0xf9 0xfe 0x35 0xee 0xbc 0xf5
rxed sm command: id info; conn=0 irk=0x9e 0xb4 0x85 0x05 0x6a 0x41 0xd3 0x62 0xde 0x00 0xf9 0xfe 0x35 0xee 0xbc 0xf5
rx_key_flags=0x08
ble_hs_hci_evt_acl_process(): conn_handle=0 pb=2 len=12 data=0x08 0x00 0x06 0x00 0x09 0x00 0x39 0x15 0x7b 0x45 0x5f 0xe0
rxed sm command: id addr info; conn=0 addr_type=0 addr=e0:5f:45:7b:15:39
rx_key_flags=0x00
encryption change event; status=0 Revised our id addr:
0x8f 0x6e 0x93 0x0a 0xdc 0x76
handle=0 our_ota_addr_type=1 our_ota_addr=76:dc:0a:93:6e:8f our_id_addr_type=0 our_id_addr=30:ae:a4:cc:24:26 peer_ota_addr_type=1 peer_ota_addr=61:1c:b8:fd:c4:b5 peer_id_addr_type=1 peer_id_addr=61:1c:b8:fd:c4:b5 conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=1 authenticated=1 bonded=1

Disconnect:

disconnect; reason=531 handle=0 our_ota_addr_type=1 our_ota_addr=76:dc:0a:93:6e:8f our_id_addr_type=0 our_id_addr=30:ae:a4:cc:24:26 peer_ota_addr_type=1 peer_ota_addr=61:1c:b8:fd:c4:b5 peer_id_addr_type=0 peer_id_addr=e0:5f:45:7b:15:39 conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=1 authenticated=1 bonded=1

Notice the change of peer_id_addr and peer_id_type.

Reconnect:

LE connection complete. handle=0 role=1 paddrtype=1 addr=4d.6c.15.67.f2.c8 local_rpa=0.0.0.0.0.0 peer_rpa=0.0.0.0.0.0 itvl=24 latency=0 spvn_tmo=72 mca=1
0xc8 0xf2 0x67 0x15 0x6c 0x4d connection established; status=0 Revised our id addr:
0x8f 0x6e 0x93 0x0a 0xdc 0x76
handle=0 our_ota_addr_type=1 our_ota_addr=76:dc:0a:93:6e:8f our_id_addr_type=0 our_id_addr=30:ae:a4:cc:24:26 peer_ota_addr_type=1 peer_ota_addr=4d:6c:15:67:f2:c8 peer_id_addr_type=0 peer_id_addr=e0:5f:45:7b:15:39 conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=0 authenticated=0 bonded=0

ble_hs_hci_cmd_send: ogf=0x08 ocf=0x0016 len=2
0x16 0x20 0x02 0x00 0x00
Command Status: status=0 cmd_pkts=5 ocf=0x16 ogf=0x8
LE Remote Used Features. handle=0 feat=dd 00 00 00 00 00 00 00
LE LTK Req. handle=0 rand=00 encdiv=0
Revised our id addr:
0x8f 0x6e 0x93 0x0a 0xdc 0x76
looking up our sec; peer_addr_type=0 peer_addr=0x39 0x15 0x7b 0x45 0x5f 0xe0  ediv=0x00 rand=0x0
ble_hs_hci_cmd_send: ogf=0x08 ocf=0x001a len=18
0x1a 0x20 0x12 0x00 0x00 0xda 0xaf 0x8d 0x30 0xb9 0xf7 0x2a 0x57 0xba 0xee 0xcd 0x73 0x82 0x99 0xac 0xe4
Command complete: cmd_pkts=5 ogf=0x8 ocf=0x1a status=0 handle=0
Encrypt change: status=0 handle=0 state=1
encryption change event; status=0 Revised our id addr:
0x8f 0x6e 0x93 0x0a 0xdc 0x76
handle=0 our_ota_addr_type=1 our_ota_addr=76:dc:0a:93:6e:8f our_id_addr_type=0 our_id_addr=30:ae:a4:cc:24:26 peer_ota_addr_type=1 peer_ota_addr=4d:6c:15:67:f2:c8 peer_id_addr_type=0 peer_id_addr=e0:5f:45:7b:15:39 conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=1 authenticated=1 bonded=1

For completeness here is the code for app_main() and on_sync() I used:

static void
bleprph_on_sync(void)
{
    int rc;
ble_hs_pvcy_rpa_config(true);
    rc = ble_hs_util_ensure_addr(1);
    assert(rc == 0);

    /* Figure out address to use while advertising (no privacy for now) */
    rc = ble_hs_id_infer_auto(1, &own_addr_type);
    if (rc != 0) {
        MODLOG_DFLT(ERROR, "error determining address type; rc=%d\n", rc);
        return;
    }

    /* Printing ADDR */
    uint8_t addr_val[6] = {0};
    rc = ble_hs_id_copy_addr(own_addr_type, addr_val, NULL);

    MODLOG_DFLT(INFO, "Device Address: ");
    print_addr(addr_val);
    MODLOG_DFLT(INFO, "\n");
    /* Begin advertising. */
    bleprph_advertise();
}

void bleprph_host_task(void *param)
{
    ESP_LOGI(tag, "BLE Host Task Started");
    /* This function will return only when nimble_port_stop() is executed */
    nimble_port_run();

    nimble_port_freertos_deinit();
}

void
app_main(void)
{
    int rc;

    /* Initialize NVS — it is used to store PHY calibration data */
    esp_err_t ret = nvs_flash_init();
    if (ret == ESP_ERR_NVS_NO_FREE_PAGES || ret == ESP_ERR_NVS_NEW_VERSION_FOUND) {
        ESP_ERROR_CHECK(nvs_flash_erase());
        ret = nvs_flash_init();
    }
    ESP_ERROR_CHECK(ret);

    ESP_ERROR_CHECK(esp_nimble_hci_and_controller_init());

    nimble_port_init();
    /* Initialize the NimBLE host configuration. */
    ble_hs_cfg.reset_cb = bleprph_on_reset;
    ble_hs_cfg.sync_cb = bleprph_on_sync;
    ble_hs_cfg.gatts_register_cb = gatt_svr_register_cb;
    ble_hs_cfg.store_status_cb = ble_store_util_status_rr;

    ble_hs_cfg.sm_io_cap = BLE_HS_IO_DISPLAY_ONLY;
//#ifdef CONFIG_EXAMPLE_BONDING
    ble_hs_cfg.sm_bonding = 1;
//#endif
//#ifdef CONFIG_EXAMPLE_MITM
    ble_hs_cfg.sm_mitm = 1;
//#endif
//#ifdef CONFIG_EXAMPLE_USE_SC
    ble_hs_cfg.sm_sc = 1;
//#else
//    ble_hs_cfg.sm_sc = 0;
//#ifdef CONFIG_EXAMPLE_BONDING
    ble_hs_cfg.sm_our_key_dist = 3;
    ble_hs_cfg.sm_their_key_dist = 3;
//#endif
//#endif

    rc = gatt_svr_init();
    assert(rc == 0);

    /* Set the default device name. */
    rc = ble_svc_gap_device_name_set("nimble-bleprph");
    assert(rc == 0);

    /* XXX Need to have template for store */
    ble_store_config_init();

    nimble_port_freertos_init(bleprph_host_task);

 //   ble_hs_pvcy_rpa_config(true);
    /* Initialize command line interface to accept input from user */
    rc = scli_init();
    if (rc != ESP_OK) {
        ESP_LOGE(tag, "scli_init() failed");
    }
}

One last suggestion is to use make (or idf.py if on IDF 4+) erase_flash before flashing your device. The old bond data can mess with things I have found. Also delete the bonds from your phone before trying again.

Edit: correct log, fix bad edit 😄

h2zero commented 4 years ago

@duncairn I just tested your file with and without my patch and it works as expected. With RPA on and no patch reconnection and encryption work, with RPA off, when the phone address changes reconnection and encryption do not, must re-pair. With the patch all is well either way.

I suspect my note above about erase_flash and clearing the bonds from your phone will solve your issue.

duncairn commented 4 years ago

@h2zero - thanks for persevering with this. I am not seeing what you do, however, even after erasing the flash and clearing the phone pairing info (which I do before each series of tests anyway). Perhaps I should be more explicit and say that after the phone change of address, it will connect with the ESP and apparently resume, but it is not able to transfer data.

When it first bonds, and after a simple reconnect, I can access the characteristic that sends a random number back (that is, 5c3a659e-897e-45e1-b016-007107c96df6). When I try this after the address change, I can seem to access the characteristic but nothing is read. Can you confirm that when you try this you really receive a new random number?

h2zero commented 4 years ago

I will try again and see if I can repro what you're seeing, can you confirm for me that you are using the file as posted and are not patched?

Edit: To answer your question though, yes I can read a new value each time from the characteristic. I'm going to wait for the esp to change it's address and see what happens then. So far 3 address changes with my phone and no problems. What app are you using?

duncairn commented 4 years ago

I can't run the exact file you posted because I get error enabling advertisement; rc=530 if the first parameter to ble_hs_id_infer_auto() is 1. However, apart from that I get the same result whichever file version I use. I wonder if the advertisement is the problem here - if you don't get that error then I shouldn't.

The app I tend to use is BLE Scanner by bluepixel, but I get the same with BlueSee.

[Thinks...] It has to be the own_addr_type variable that is being set inappropriately since it works (that is, the advertisement goes OK) if I explicitly set it as per @prasad-alatkar 's suggestion to BLE_ADDR_RANDOM. I'll dig into this...

h2zero commented 4 years ago

I’ve been using the file you posted actually. Just so we’re on the same page. Try using nRF connect for your phone app. That’s what I’ve been testing with.

duncairn commented 4 years ago

nRF suggests you're using an Android phone? I use that on my Android phone but haven't tested RPA with it yet - the reason for needing RPA was because iOS would constantly pop up the pairing dialog whereas Android, with nRF, didn't. It seemed that nRF (or Android) would just go through the motions of re-bonding without saying anything if there was a problem, whereas iOS wouldn't. Since I was using Android I was oblivious to the issue until an iOS developer complained...

duncairn commented 4 years ago

Just tried with nRF on Android and there was no issue. In fact, I had to closely study the logs to check that the phone had in fact changed its address! On connect after a reboot (I did the BT disable thing but wanted to make sure) I could see that the bonding flag in nRF flicked off and back on, so if there was any problem it was covered up.

h2zero commented 4 years ago

@duncairn Just for a complete answer to your question, after 4 address changes on the esp32 and approx. 8 changes on the phone I am still able to connect and read that characteristic with no trouble.

Good news is I found the issue you have with ble_hs_id_infer_auto(1, &own_addr_type); not that it's a big issue but FYI ble_hs_pvcy_rpa_config(1); needs to be called before ble_hs_id_infer_auto(1, &own_addr_type);. Good for me to know as well. But you should be fine just using own_addr_type = BLE_OWN_ADDR_RANDOM;.

static void
bleprph_on_sync(void)
{
    int rc;
    rc = ble_hs_util_ensure_addr(1);
        assert(rc == 0);

    rc = ble_hs_pvcy_rpa_config(1);
    if (rc != 0) {
        ESP_LOGI(tag, "Error configuring privacy (RPA)");
    }

    /* Configure RPA */
    //own_addr_type = BLE_ADDR_RANDOM;
     /* Figure out address to use while advertising (no privacy for now) */
    rc = ble_hs_id_infer_auto(1, &own_addr_type);
    if (rc != 0) {
        MODLOG_DFLT(ERROR, "error determining address type; rc=%d\n", rc);
        return;
    } ....

I am using an iPhone SE as you are, nRF connect has an iOS app as well.

Edit: Just a thought but what IDF are you using? I'm testing on the master branch. Also this phone is on older firmware iOS Version 11.4.1. Since you're not an apple guy here's a tip as well; never update an iPhone after 2 years of age, unless you HAVE TO 😆.

duncairn commented 4 years ago

OK! Sorry for making assumptions, and thanks for pointing out nRF is on the playstore - wish I'd found that to start with ;) Strangely, I can't get it to pair with the ESP, but I might be mis-using it since the user interface is very different to the Android version.

I changed bleprph_on_synch() to exactly as your post there (a cut'n'paste), but still get the advertisement enabling error, so there is clearly something amiss. I think a re-port of nimble 1.2 into esp-idf might be in order too at this point.

h2zero commented 4 years ago

Lol no need to be sorry, most people here use android phones, natural assumption.

It should just popup a message asking to pair when you try to read the characteristic.

That error could likely be due to using an older commit of esp-nimble, I would suggest updating to current if needed and try again.

duncairn commented 4 years ago

Sorry for the delay in getting back to you on this - a lot of water has passed under the bridge since!

First, I was using esp-idf 4.1 but updated to include nimble 1.2. So I tried esp-idf 4.2 (dev) and that didn't show the advertising error when own_addr_type was 3. The nimble code was exactly the same, the example app also exactly the same, so the implication is that something outside of nimble would be affecting it. (You don't want to know about the trouble that enabling debug-level logging caused on 4.2!)

But... still there was the fail on address change. So I tried updating to esp-idf 4.2 (release) but couldn't because my setup wouldn't install the new requirement of gdbgui (it would fail on cffi). This is using msys and make (i.e. the legacy development environment), and previously I'd failed to get the cmake environment to install properly. Maybe that had something to do with things.

To make sure it's not my setup, I ran up a brand new fully-patched W7 in a VM and installed 4.2 (release) and the cmake stuff in there. That also failed(!) but eventually I worked through all the workarounds and had a pristine unadulterated 4.2 (release) compiling and programming the ESP.

Still, I get the fail on address change! I can't see how it can be working OK for you when it doesn't for me even with a totally clean system. There has to be something else involved which is screwing things. Maybe the phone? This is a iPhone SE running 13.3.1 and the only stuff I've added has been the bluetooth scanners.

h2zero commented 4 years ago

Wow, good effort on your part 👍. Still very strange that you are having such difficulties. I have been using the master branch but 4.2 (release) seems to be on-par with that as far as NimBLE is concerned.

Would you be able to post another log? One that shows a connect->bond->disconnect->change address->reconnect? I'm quite curious to see what you do.

duncairn commented 4 years ago

No problem! Attached.

This is using an Adafruit ESP32 Feather, just to make sure it's not something about our hardware setup that's causing this. I'll try the master branch later today - any straw is worth clutching right now :)

h2zero commented 4 years ago

Well I can see in your log you are not storing the IRK for either your device or the phone, that's where you need:

+#ifdef CONFIG_EXAMPLE_BONDING
+    ble_hs_cfg.sm_our_key_dist = BLE_SM_PAIR_KEY_DIST_ENC | BLE_SM_PAIR_KEY_DIST_ID;
+    /* Refer components/nimble/nimble/nimble/host/include/host/ble_sm.h for
+     * more information */
+    ble_hs_cfg.sm_their_key_dist = BLE_SM_PAIR_KEY_DIST_ENC | BLE_SM_PAIR_KEY_DIST_ID;
 #endif

from the patch file above that @prasad-alatkar posted.

Try that and post another log, there may still be local id address issues after but lets see before we jump in.

prasad-alatkar commented 4 years ago

@h2zero thank you for your time and efforts. Just want to touch upon few points here.

but FYI ble_hs_pvcy_rpa_config(1); needs to be called before ble_hs_id_infer_auto(1, &own_addr_type);.

Ideally, we should not call ble_hs_id_infer_auto when using Host based privacy (RPA), the first parameter(privacy) passed is for Controller based privacy (default for NimBLE). You can actually dig into this API to understand what own_addr_type is getting set to when first parameter is 1 :smiley: . If you see my shared patch, I have actually removed ble_hs_id_infer_auto. The reason being, as I have mentioned earlier, we have implemented Host based privacy for RPA. Now if it was controller based RPA then setting addr_type to BLE_OWN_ADDR_RPA_PUBLIC_DEFAULT would have made sense, but for our case we have to use BLE_OWN_ADDR_RANDOM. I understand this doesn't look too obvious but this is how we can achieve RPA/privacy without controller's role (or exclusively in host without controller knowing).

So to sum it all, just setting addr_type to BLE_OWN_ADDR_RANDOM before advertising/scanning and calling ble_hs_pvcy_rpa_config(1) should suffice.

However, the problem of the ESP not recognizing the phone after the phone has been rebooted (i.e. the phone has changed public address, and the reason for enabling RPA) is still there.

Can you please provide DEBUG log when you see this issue ? Now, if I understand clearly, if the phone has not changed its IRK (identity key to resolve RPA) then even if it changes the address (RPA) the RPA should be resolvable by our ESP device as it has the IRK which is sufficient to resolve identity of the peer device. I have verified the working behavior with Android phone.

duncairn commented 4 years ago

However, the problem of the ESP not recognizing the phone after the phone has been rebooted >(i.e. the phone has changed public address, and the reason for enabling RPA) is still there.

Can you please provide DEBUG log when you see this issue ?

This was resolved by your patch posted early on. I still seemed to have the issue becuase the patch was applied incorrectly, so part of it was inoperative. Entirely my fault, and I'm very sorry for making that mistake and dragging this on. I appreciate the patience exhibited by yourself and @h2zero .

That said, do you want logs now it works? I thought probably not, but if you want them for comparison I can run them off :)

prasad-alatkar commented 4 years ago

Hi @duncairn happy that the issue is resolved now.

making that mistake and dragging this on.

I am quite sure this thread might help few users while implementing RPA feature :smiley:

h2zero commented 4 years ago

Ideally, we should not call ble_hs_id_infer_auto when using Host based privacy (RPA), the first parameter(privacy) passed is for Controller based privacy (default for NimBLE). You can actually dig into this API to understand what own_addr_type is getting set to when first parameter is 1 😃 . If you see my shared patch, I have actually removed ble_hs_id_infer_auto. The reason being, as I have mentioned earlier, we have implemented Host based privacy for RPA. Now if it was controller based RPA then setting addr_type to BLE_OWN_ADDR_RPA_PUBLIC_DEFAULT would have made sense, but for our case we have to use BLE_OWN_ADDR_RANDOM. I understand this doesn't look too obvious but this is how we can achieve RPA/privacy without controller's role (or exclusively in host without controller knowing).

@prasad-alatkar you are quite right here, sorry for confusing this. I had that in my code while I was testing things and it was working so I shared it here in-case it helped.

@duncairn Glad to hear you got it working, definitely a good thread for others in the future.

prasad-alatkar commented 4 years ago

@duncairn Please help to close the issue if resolved.

duncairn commented 4 years ago

My apologies @prasad-alatkar - being new to the github way I didn't realise it would be up to me, but it is logical that it should. Done ;)

veneno-529 commented 1 year ago

For your reference when looking at your logs, this is how mine appear when correct (I do have my patch applied here though). This is with RPA enabled on the esp32. Only difference with it disabled is the our_ota_addr_type and our_ota_addr are the same as the our_id_type/addr.

Initial connection:

LE connection complete. handle=0 role=1 paddrtype=1 addr=61.1c.b8.fd.c4.b5 local_rpa=0.0.0.0.0.0 peer_rpa=0.0.0.0.0.0 itvl=24 latency=0 spvn_tmo=72 mca=1
0xb5 0xc4 0xfd 0xb8 0x1c 0x61 connection established; status=0 Revised our id addr:
0x8f 0x6e 0x93 0x0a 0xdc 0x76
handle=0 our_ota_addr_type=1 our_ota_addr=76:dc:0a:93:6e:8f our_id_addr_type=0 our_id_addr=30:ae:a4:cc:24:26 peer_ota_addr_type=1 peer_ota_addr=61:1c:b8:fd:c4:b5 peer_id_addr_type=1 peer_id_addr=61:1c:b8:fd:c4:b5 conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=0 authenticated=0 bonded=0

Encrypted/bonded:

LE LTK Req. handle=0 rand=00 encdiv=0
ble_hs_hci_cmd_send: ogf=0x08 ocf=0x001a len=18
0x1a 0x20 0x12 0x00 0x00 0xda 0xaf 0x8d 0x30 0xb9 0xf7 0x2a 0x57 0xba 0xee 0xcd 0x73 0x82 0x99 0xac 0xe4
Command complete: cmd_pkts=5 ogf=0x8 ocf=0x1a status=0 handle=0
Encrypt change: status=0 handle=0 state=1
host tx hci data; handle=0 length=21
ble_hs_hci_acl_tx(): 0x00 0x00 0x15 0x00 0x11 0x00 0x06 0x00 0x08 0xef 0x8d 0xe2 0x16 0x4f 0xec 0x43 0x0d 0xbf 0x5b 0xdd 0x34 0xc0 0x53 0x1e 0xb8
Revised our id addr:
0x8f 0x6e 0x93 0x0a 0xdc 0x76
host tx hci data; handle=0 length=12
ble_hs_hci_acl_tx(): 0x00 0x00 0x0c 0x00 0x08 0x00 0x06 0x00 0x09 0x00 0x26 0x24 0xcc 0xa4 0xae 0x30
Number of Completed Packets: num_handles=1
handle:0 pkts:1
Number of Completed Packets: num_handles=1
handle:0 pkts:1
ble_hs_hci_evt_acl_process(): conn_handle=0 pb=2 len=21 data=0x11 0x00 0x06 0x00 0x08 0x9e 0xb4 0x85 0x05 0x6a 0x41 0xd3 0x62 0xde 0x00 0xf9 0xfe 0x35 0xee 0xbc 0xf5
rxed sm command: id info; conn=0 irk=0x9e 0xb4 0x85 0x05 0x6a 0x41 0xd3 0x62 0xde 0x00 0xf9 0xfe 0x35 0xee 0xbc 0xf5
rx_key_flags=0x08
ble_hs_hci_evt_acl_process(): conn_handle=0 pb=2 len=12 data=0x08 0x00 0x06 0x00 0x09 0x00 0x39 0x15 0x7b 0x45 0x5f 0xe0
rxed sm command: id addr info; conn=0 addr_type=0 addr=e0:5f:45:7b:15:39
rx_key_flags=0x00
encryption change event; status=0 Revised our id addr:
0x8f 0x6e 0x93 0x0a 0xdc 0x76
handle=0 our_ota_addr_type=1 our_ota_addr=76:dc:0a:93:6e:8f our_id_addr_type=0 our_id_addr=30:ae:a4:cc:24:26 peer_ota_addr_type=1 peer_ota_addr=61:1c:b8:fd:c4:b5 peer_id_addr_type=1 peer_id_addr=61:1c:b8:fd:c4:b5 conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=1 authenticated=1 bonded=1

Disconnect:

disconnect; reason=531 handle=0 our_ota_addr_type=1 our_ota_addr=76:dc:0a:93:6e:8f our_id_addr_type=0 our_id_addr=30:ae:a4:cc:24:26 peer_ota_addr_type=1 peer_ota_addr=61:1c:b8:fd:c4:b5 peer_id_addr_type=0 peer_id_addr=e0:5f:45:7b:15:39 conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=1 authenticated=1 bonded=1

Notice the change of peer_id_addr and peer_id_type.

Reconnect:

LE connection complete. handle=0 role=1 paddrtype=1 addr=4d.6c.15.67.f2.c8 local_rpa=0.0.0.0.0.0 peer_rpa=0.0.0.0.0.0 itvl=24 latency=0 spvn_tmo=72 mca=1
0xc8 0xf2 0x67 0x15 0x6c 0x4d connection established; status=0 Revised our id addr:
0x8f 0x6e 0x93 0x0a 0xdc 0x76
handle=0 our_ota_addr_type=1 our_ota_addr=76:dc:0a:93:6e:8f our_id_addr_type=0 our_id_addr=30:ae:a4:cc:24:26 peer_ota_addr_type=1 peer_ota_addr=4d:6c:15:67:f2:c8 peer_id_addr_type=0 peer_id_addr=e0:5f:45:7b:15:39 conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=0 authenticated=0 bonded=0

ble_hs_hci_cmd_send: ogf=0x08 ocf=0x0016 len=2
0x16 0x20 0x02 0x00 0x00
Command Status: status=0 cmd_pkts=5 ocf=0x16 ogf=0x8
LE Remote Used Features. handle=0 feat=dd 00 00 00 00 00 00 00
LE LTK Req. handle=0 rand=00 encdiv=0
Revised our id addr:
0x8f 0x6e 0x93 0x0a 0xdc 0x76
looking up our sec; peer_addr_type=0 peer_addr=0x39 0x15 0x7b 0x45 0x5f 0xe0  ediv=0x00 rand=0x0
ble_hs_hci_cmd_send: ogf=0x08 ocf=0x001a len=18
0x1a 0x20 0x12 0x00 0x00 0xda 0xaf 0x8d 0x30 0xb9 0xf7 0x2a 0x57 0xba 0xee 0xcd 0x73 0x82 0x99 0xac 0xe4
Command complete: cmd_pkts=5 ogf=0x8 ocf=0x1a status=0 handle=0
Encrypt change: status=0 handle=0 state=1
encryption change event; status=0 Revised our id addr:
0x8f 0x6e 0x93 0x0a 0xdc 0x76
handle=0 our_ota_addr_type=1 our_ota_addr=76:dc:0a:93:6e:8f our_id_addr_type=0 our_id_addr=30:ae:a4:cc:24:26 peer_ota_addr_type=1 peer_ota_addr=4d:6c:15:67:f2:c8 peer_id_addr_type=0 peer_id_addr=e0:5f:45:7b:15:39 conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=1 authenticated=1 bonded=1

For completeness here is the code for app_main() and on_sync() I used:

static void
bleprph_on_sync(void)
{
    int rc;
ble_hs_pvcy_rpa_config(true);
    rc = ble_hs_util_ensure_addr(1);
    assert(rc == 0);

    /* Figure out address to use while advertising (no privacy for now) */
    rc = ble_hs_id_infer_auto(1, &own_addr_type);
    if (rc != 0) {
        MODLOG_DFLT(ERROR, "error determining address type; rc=%d\n", rc);
        return;
    }

    /* Printing ADDR */
    uint8_t addr_val[6] = {0};
    rc = ble_hs_id_copy_addr(own_addr_type, addr_val, NULL);

    MODLOG_DFLT(INFO, "Device Address: ");
    print_addr(addr_val);
    MODLOG_DFLT(INFO, "\n");
    /* Begin advertising. */
    bleprph_advertise();
}

void bleprph_host_task(void *param)
{
    ESP_LOGI(tag, "BLE Host Task Started");
    /* This function will return only when nimble_port_stop() is executed */
    nimble_port_run();

    nimble_port_freertos_deinit();
}

void
app_main(void)
{
    int rc;

    /* Initialize NVS — it is used to store PHY calibration data */
    esp_err_t ret = nvs_flash_init();
    if (ret == ESP_ERR_NVS_NO_FREE_PAGES || ret == ESP_ERR_NVS_NEW_VERSION_FOUND) {
        ESP_ERROR_CHECK(nvs_flash_erase());
        ret = nvs_flash_init();
    }
    ESP_ERROR_CHECK(ret);

    ESP_ERROR_CHECK(esp_nimble_hci_and_controller_init());

    nimble_port_init();
    /* Initialize the NimBLE host configuration. */
    ble_hs_cfg.reset_cb = bleprph_on_reset;
    ble_hs_cfg.sync_cb = bleprph_on_sync;
    ble_hs_cfg.gatts_register_cb = gatt_svr_register_cb;
    ble_hs_cfg.store_status_cb = ble_store_util_status_rr;

    ble_hs_cfg.sm_io_cap = BLE_HS_IO_DISPLAY_ONLY;
//#ifdef CONFIG_EXAMPLE_BONDING
    ble_hs_cfg.sm_bonding = 1;
//#endif
//#ifdef CONFIG_EXAMPLE_MITM
    ble_hs_cfg.sm_mitm = 1;
//#endif
//#ifdef CONFIG_EXAMPLE_USE_SC
    ble_hs_cfg.sm_sc = 1;
//#else
//    ble_hs_cfg.sm_sc = 0;
//#ifdef CONFIG_EXAMPLE_BONDING
    ble_hs_cfg.sm_our_key_dist = 3;
    ble_hs_cfg.sm_their_key_dist = 3;
//#endif
//#endif

    rc = gatt_svr_init();
    assert(rc == 0);

    /* Set the default device name. */
    rc = ble_svc_gap_device_name_set("nimble-bleprph");
    assert(rc == 0);

    /* XXX Need to have template for store */
    ble_store_config_init();

    nimble_port_freertos_init(bleprph_host_task);

 //   ble_hs_pvcy_rpa_config(true);
    /* Initialize command line interface to accept input from user */
    rc = scli_init();
    if (rc != ESP_OK) {
        ESP_LOGE(tag, "scli_init() failed");
    }
}

One last suggestion is to use make (or idf.py if on IDF 4+) erase_flash before flashing your device. The old bond data can mess with things I have found. Also delete the bonds from your phone before trying again.

Edit: correct log, fix bad edit 😄

Hi i tried this solution it still doesn't work for me. I'm using nimble v1.3.0